Nvidia's GPUs have once again outperformed the competition in the latest MLPerf benchmark results, particularly excelling at pretraining the Llama 3.1 405B large language model.
The MLPerf competition, which aims to bring order to the chaos of AI performance claims, includes six industry-relevant machine learning benchmarks, ranging from content recommendation to graph node classification.
The large language model pretraining task, the most resource-intensive of the benchmarks, was updated to use Meta's Llama 3.1 405B, a much larger model than GPT-3, reflecting the industry's trend toward ever-larger models.
Nvidia's Blackwell GPUs topped every benchmark with the fastest training times, while AMD's latest GPU, the Instinct MI325X, performed comparably to Nvidia's H200 on the LLM fine-tuning benchmark.
Network efficiency plays a crucial role in training such large models across many GPUs; for the LLM fine-tuning benchmark, Nvidia submitted a system of 512 B200 GPUs.
Nvidia's GB200 NVL72 package, which links Grace CPUs and Blackwell GPUs, contributed to efficient scaling, with performance on the pretraining benchmark scaling close to linearly as more GPUs were added.
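To make "close to linear scaling" concrete, here is a minimal sketch of how scaling efficiency is typically quantified: the speedup actually achieved divided by the ideal speedup from adding GPUs. The GPU counts and throughput figures below are hypothetical illustrations, not numbers from the MLPerf submissions.

```python
# Minimal sketch of how near-linear scaling is usually quantified.
# All GPU counts and throughput figures are hypothetical, not MLPerf data.

def scaling_efficiency(base_gpus, base_throughput, gpus, throughput):
    """Ratio of actual speedup to ideal (linear) speedup."""
    actual_speedup = throughput / base_throughput
    ideal_speedup = gpus / base_gpus
    return actual_speedup / ideal_speedup

# Hypothetical example: doubling GPUs from 512 to 1024 nearly doubles
# training throughput (tokens per second).
base_gpus, base_tps = 512, 1.00e6
gpus, tps = 1024, 1.94e6

eff = scaling_efficiency(base_gpus, base_tps, gpus, tps)
print(f"Scaling efficiency: {eff:.0%}")  # ~97%, i.e. close to linear
```

An efficiency near 100 percent means each additional GPU contributes almost its full share of extra throughput, which is what makes very large training runs economical.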
Lenovo was the only submitter to include a power measurement alongside its benchmark results, a reminder of the importance of power efficiency in AI training tasks.
As concerns about AI's energy use grow, more companies are being urged to disclose power measurements in future MLPerf rounds so that results can be compared and efficiency improved.
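As a rough illustration of why such disclosures matter, the sketch below shows how a reported average power draw and a time-to-train result combine into energy consumed per training run. The power and time figures are entirely hypothetical, not taken from any MLPerf submission.

```python
# Hypothetical illustration of why power disclosures matter:
# energy per training run = average power draw x training time.
# Neither figure below comes from an actual MLPerf submission.

avg_power_kw = 600.0        # assumed average system power draw, in kilowatts
training_time_hours = 27.3  # assumed time-to-train on a benchmark task

energy_kwh = avg_power_kw * training_time_hours
print(f"Energy per training run: {energy_kwh:,.0f} kWh")

# If power were reported routinely, submissions could be ranked on
# energy-to-train as well as time-to-train.
```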