<ul><li>Apple has collaborated with NVIDIA to speed up inference for large language models (LLMs).</li><li>Apple integrated Recurrent Drafter (ReDrafter), its speculative decoding technique, into NVIDIA's TensorRT-LLM framework, yielding a 2.7x increase in tokens generated per second.</li><li>The faster generation reduces user-perceived latency, GPU usage, and power consumption.</li><li>Developers running production LLM applications on NVIDIA GPUs can benefit from the faster token generation.</li></ul>

Apple Teams Up With NVIDIA to Speed Up AI Language Models
