Google has announced that Trillium is now generally available to Google Cloud customers. It is designed to meet the infrastructure challenges posed by large-scale AI models that process diverse modalities such as text and images. Trillium, a key component of Google Cloud's AI Hypercomputer, provides powerful, efficient, and sustainable infrastructure, and it doubles the High Bandwidth Memory (HBM) capacity of the previous-generation Cloud TPU v5e.
Compared to TPU v5e, Trillium delivers up to a 4x improvement in training performance, up to a 3x increase in inference throughput, a 67% increase in energy efficiency, and a 4.7x increase in peak compute performance per chip. It also offers up to 2.5x better training performance per dollar and up to 1.4x better inference performance per dollar.
The new hardware can scale a single distributed training job to hundreds of thousands of accelerators. Trillium TPUs also exhibit significantly better scaling efficiency than their predecessors, enabling the training of large dense language and multimodal models such as Gemini 2.0, Google's latest AI model.
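Scaling a single training job across many accelerators typically relies on data parallelism: each device computes gradients on its own shard of the global batch, the gradients are averaged with an all-reduce, and every replica applies the same update to its copy of the weights. The sketch below illustrates that pattern in plain Python with simulated devices; it is a conceptual illustration only, not Google's training stack.

```python
# Conceptual sketch of data-parallel training with an all-reduce step.
# Each simulated "device" holds the same weight and computes a gradient
# on its own shard of the batch; the averaged gradient updates all replicas.

def local_gradient(w, shard):
    # Toy gradient for a 1-D least-squares model y = w * x,
    # averaged over this shard's (x, y) examples.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    # Average gradients across all replicas (the all-reduce step).
    return sum(grads) / len(grads)

def train_step(w, shards, lr=0.01):
    grads = [local_gradient(w, s) for s in shards]  # runs in parallel on real hardware
    g = all_reduce_mean(grads)
    return w - lr * g  # every replica applies the identical update

# Split a global batch for the target function y = 3x across 4 simulated devices.
data = [(x, 3 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = train_step(w, shards, lr=0.01)
print(round(w, 2))  # prints 3.0 — the replicas converge to the true weight
```

On real TPU pods the all-reduce runs over the chip interconnect rather than in host Python, but the algorithmic structure is the same.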
Beyond absolute performance and scale, Trillium is designed to optimize performance per dollar, making it a cost-effective choice for businesses. It gives enterprises and startups alike access to the same powerful, efficient, and sustainable infrastructure.
With the addition of the third-generation SparseCore, Trillium delivers a 2x improvement in the performance of embedding-intensive models and a 5x improvement on the DLRM DCNv2 recommendation benchmark compared to the previous-generation Cloud TPU v5e.
This price-performance advantage also extends to serving: researchers and developers can deploy robust, efficient image models at significantly lower cost than before.
Trillium TPUs accelerate the training of massive language models, including both dense and Mixture of Experts (MoE) models. They also offer significant advancements for inference workloads, enabling faster and more efficient deployment of AI models than ever before.
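In a Mixture of Experts layer, a small router scores each token and dispatches it to only its top-k experts, so total parameter count grows while per-token compute stays roughly constant. The sketch below shows top-k routing in plain Python; the "experts" are stand-in functions and the router scores are assumed values, purely to illustrate the mechanism.

```python
import math

# Conceptual sketch of Mixture of Experts (MoE) top-k routing.
# A router scores each token against every expert; only the k
# best-scoring experts run, and their outputs are combined with
# softmax-normalized router weights.

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_layer(token, router_scores, experts, k=2):
    # Select the k experts with the highest router score for this token.
    top = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)[:k]
    weights = softmax([router_scores[i] for i in top])
    # Only the selected experts compute; the rest stay idle.
    return sum(w * experts[i](token) for w, i in zip(weights, top))

# Four stand-in "experts", each a simple function of the token value.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]

scores = [0.1, 2.0, 1.5, -1.0]  # hypothetical router output for one token
out = moe_layer(3.0, scores, experts, k=2)
# top-2 experts are index 1 (2*x = 6) and index 2 (x*x = 9),
# mixed with weights softmax([2.0, 1.5])
print(round(out, 3))  # prints 7.133
```

Because only k of the experts run per token, an MoE model can hold far more parameters than a dense model with the same per-token FLOP budget, which is why accelerators that train both dense and MoE architectures efficiently matter.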
The release of Trillium TPUs for Google Cloud customers marks a significant leap forward in Google Cloud’s AI infrastructure, delivering incredible performance, scalability, and efficiency for a wide range of AI workloads.
The AI Hypercomputer enables customers to extract maximum value from an unprecedented deployment of over 100,000 Trillium chips per Jupiter network fabric, which offers 13 Petabits/sec of bisection bandwidth, and world-class co-designed software allows a single distributed job to scale to hundreds of thousands of chips.
Trillium stands as a testament to Google Cloud's commitment to providing cutting-edge infrastructure that empowers businesses to unlock the full potential of AI.