NVIDIA introduces Dynamo, open-source inference software designed to accelerate and scale AI reasoning models efficiently and cost-effectively.
Dynamo's main technical components are disaggregated serving (running the prefill and decode phases of inference on separate GPUs), a GPU resource planner, a smart router, the NIXL communication library, and a KV cache manager.
Dynamo raises inference throughput, enabling AI service providers to serve more requests per GPU, reduce response times, and lower operational costs.
Because Dynamo is open source, enterprises and researchers can adapt it to optimize AI model serving across disaggregated environments and keep pace with growing inference demand.
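
To make the smart router and KV cache ideas above more concrete, here is a minimal Python sketch of cache-aware routing: a request goes to the worker that already holds the longest matching token prefix, with ties broken by lower load. This is a toy illustration under stated assumptions, not Dynamo's API or algorithm; `Worker`, `route`, `best_overlap`, and `shared_prefix_len` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical stand-in for one GPU worker; not a Dynamo type."""
    name: str
    active_requests: int = 0
    # Token sequences whose KV cache this worker is assumed to still hold.
    cached_prefixes: list = field(default_factory=list)


def shared_prefix_len(a, b):
    """Count the leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def best_overlap(worker, tokens):
    """Longest prefix of `tokens` already present in this worker's cache."""
    return max(
        (shared_prefix_len(p, tokens) for p in worker.cached_prefixes),
        default=0,
    )


def route(workers, tokens):
    """Pick the worker with the most reusable cache; break ties by lower load."""
    chosen = max(
        workers,
        key=lambda w: (best_overlap(w, tokens), -w.active_requests),
    )
    # Record the assignment so later requests can reuse this prefix.
    chosen.active_requests += 1
    chosen.cached_prefixes.append(tuple(tokens))
    return chosen


if __name__ == "__main__":
    pool = [Worker("gpu0"), Worker("gpu1")]
    system_prompt = [1, 2, 3, 4]  # token IDs shared by many requests
    for suffix in ([10], [11], [12]):
        print(route(pool, system_prompt + suffix).name)
```

Requests that share a system prompt end up on the same worker, so the prompt's KV cache is computed once and reused. A production router would also need to weigh cache eviction, memory pressure, and queue depth, none of which this sketch models.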