Apple has collaborated with NVIDIA to enhance the performance of large language models (LLMs) for AI applications.
Apple integrated its Recurrent Drafter (ReDrafter) speculative decoding technique into NVIDIA's TensorRT-LLM framework, yielding a 2.7x increase in tokens generated per second.
The faster generation lowers user-perceived latency while requiring fewer GPUs and less power.
Developers running production LLM applications on NVIDIA GPUs can use ReDrafter through TensorRT-LLM to get faster token generation.
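
To make the 2.7x figure concrete, the short sketch below converts a throughput speedup into decoding time for a single response. Only the 2.7x factor comes from the text above; the baseline rate of 20 tokens per second and the 540-token response length are hypothetical values chosen for illustration, and the calculation covers steady-state decoding only, ignoring prompt processing and time to first token.

```python
def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to decode num_tokens at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


SPEEDUP = 2.7            # reported increase in tokens generated per second
BASELINE_TPS = 20.0      # hypothetical baseline decoding rate (assumption)
RESPONSE_TOKENS = 540    # hypothetical response length (assumption)

baseline_s = generation_time_s(RESPONSE_TOKENS, BASELINE_TPS)
accelerated_s = generation_time_s(RESPONSE_TOKENS, BASELINE_TPS * SPEEDUP)

# A k-times throughput gain cuts decoding time by a fraction 1 - 1/k.
time_reduction = 1.0 - accelerated_s / baseline_s

print(f"baseline:    {baseline_s:.1f} s")
print(f"accelerated: {accelerated_s:.1f} s")
print(f"reduction:   {time_reduction:.0%}")
```

Under these assumed numbers, decoding time falls from about 27 seconds to about 10 seconds, a reduction of roughly 63 percent. Actual gains depend on the model, hardware, and decoding settings.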