NVIDIA has released an AI Blueprint for video search and summarisation, enabling companies to develop their own visual AI agents that analyse image and video data across a range of use cases.
NVIDIA's AI Blueprint harnesses vision language models (VLMs), a class of generative AI models that combine computer vision and language understanding.
The blueprint can be customised across industries and provides a full suite of optimised software for building and deploying generative AI-powered agents that can ingest and understand massive volumes of live video streams or data archives.
Accenture and Dell Technologies will be among the companies using the NVIDIA AI Blueprint for video search and summarisation to build virtual AI agents that can assist with productivity and safety in factories, warehouses, shops, airports, traffic intersections and other places.
The NVIDIA AI Blueprint for video search and summarisation can be configured with NVIDIA NIM microservices for VLMs and AI models.
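NIM microservices typically expose an OpenAI-compatible chat-completions API, so one plausible way to drive a VLM from the blueprint is to package sampled video frames and a summarisation prompt into a single request. The sketch below is illustrative only: the model name, the frame-sampling step and the exact request shape are assumptions, not confirmed blueprint internals.

```python
import base64

def build_summary_request(frame_bytes: list[bytes],
                          model: str = "example/vlm-model") -> dict:
    """Assemble a chat-completions-style payload asking a VLM to
    summarise the supplied video frames (base64-encoded inline).
    The model name is a placeholder, not a real NIM endpoint."""
    content = [{"type": "text",
                "text": "Summarise the key events in these video frames."}]
    for frame in frame_bytes:
        b64 = base64.b64encode(frame).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    return {"model": model,
            "messages": [{"role": "user", "content": content}]}

# Stand-in bytes in place of a real decoded JPEG frame.
payload = build_summary_request([b"\xff\xd8\xff"])
print(payload["model"], len(payload["messages"][0]["content"]))
# → example/vlm-model 2
```

In a real deployment the payload would be POSTed to the microservice's chat-completions endpoint over HTTP; batching frames into one request lets the VLM reason across the clip rather than frame by frame.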
Adopting NVIDIA’s blueprint could save developers months of effort investigating and optimising AI models for smart city applications.
Deployed on NVIDIA GPUs, these agents can accelerate the process of combing through video archives to identify key moments.
Visual AI agents could also be used to aid customers with visual impairments, generate recaps of sporting events or label visual datasets to train other AI models.
The NVIDIA AI Blueprint may also be used to identify traffic collisions or generate reports to aid emergency response efforts.
The latest blueprint joins a collection of NVIDIA AI Blueprints that make it easy to create AI-powered digital avatars, build virtual assistants for personalised customer service or extract enterprise insights from PDF data.