Google's sixth-generation Tensor Processing Unit (TPU), Trillium, can deliver up to 1.8x better performance per dollar compared to the prior-generation Cloud TPU v5p.
MLPerf 4.1 training benchmarks showed that Trillium delivers 99% throughput scaling efficiency.
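Throughput scaling efficiency compares the throughput gain actually achieved when a cluster grows against the ideal linear gain. A minimal sketch of that calculation, using hypothetical throughput and chip-count figures (not MLPerf-reported numbers):

```python
def throughput_scaling_efficiency(base_throughput, base_chips,
                                  scaled_throughput, scaled_chips):
    """Ratio of achieved throughput gain to the ideal (linear) gain."""
    ideal_throughput = base_throughput * (scaled_chips / base_chips)
    return scaled_throughput / ideal_throughput

# Hypothetical: quadrupling the cluster yields 3.96x the throughput.
eff = throughput_scaling_efficiency(100.0, 256, 396.0, 1024)
print(f"{eff:.0%}")  # 99%
```

A value of 100% means throughput grows perfectly linearly with chip count; anything lost to communication or synchronization overhead pushes it below 100%.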
The metrics used for hardware accelerator comparison include peak throughput, effective throughput, throughput scaling efficiency, hardware utilization, and convergence scaling efficiency.
Convergence scaling efficiency, which focuses on the fundamental goal of training, is measured by the ratio of the speedup in convergence time to the increase in cluster size.
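The definition above can be sketched directly: divide the speedup in time-to-convergence by the factor of cluster growth. The numbers below are purely illustrative, not measured Trillium or v5p results:

```python
def convergence_scaling_efficiency(base_time_h, base_chips,
                                   scaled_time_h, scaled_chips):
    """Speedup in time-to-convergence divided by the cluster-size increase."""
    speedup = base_time_h / scaled_time_h
    cluster_growth = scaled_chips / base_chips
    return speedup / cluster_growth

# Hypothetical: doubling the cluster cuts convergence time from 100 h to 52 h.
cse = convergence_scaling_efficiency(100.0, 512, 52.0, 1024)
print(f"{cse:.0%}")  # 96%
```

Because it is anchored to convergence rather than raw step throughput, this metric penalizes configurations that run more steps quickly but need extra steps to reach the target quality.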
Trillium's convergence scaling efficiency is close to that of Cloud TPU v5p, and it delivers that convergence at a lower cost.
Trillium achieves 99% scaling efficiency even when operating across data-center networks using Cloud TPU multislice technology, outperforming the 94% scaling efficiency of a Cloud TPU v5p cluster within a single ICI domain.
Trillium's better performance per dollar, strong convergence scaling efficiency, and scaling properties make it the most cost-efficient TPU training system to date.
To make data-driven decisions based on workload requirements, ML accelerators should be evaluated along multiple dimensions of performance and efficiency, such as effective model FLOPS utilization (EMFU), memory bandwidth utilization (MBU), behavior within and across ICI domains, and overall scaling characteristics.
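EMFU and MBU are both utilization ratios: achieved compute (or memory traffic) divided by the hardware's peak. A minimal sketch, with hypothetical chip specs and measurements rather than Trillium's actual figures:

```python
def emfu(achieved_model_flops_per_s, peak_flops_per_s):
    """Effective model FLOPS utilization: fraction of peak compute used."""
    return achieved_model_flops_per_s / peak_flops_per_s

def mbu(achieved_bytes_per_s, peak_bytes_per_s):
    """Memory bandwidth utilization: fraction of peak HBM bandwidth used."""
    return achieved_bytes_per_s / peak_bytes_per_s

# Hypothetical: 460 TFLOP/s achieved on a 920 TFLOP/s-peak chip,
# 1.2 TB/s achieved against 1.6 TB/s peak HBM bandwidth.
print(f"EMFU: {emfu(4.6e14, 9.2e14):.0%}")  # 50%
print(f"MBU:  {mbu(1.2e12, 1.6e12):.0%}")   # 75%
```

A compute-bound workload will show high EMFU and modest MBU, while a memory-bound one shows the reverse, which is why the two metrics are most useful together.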
Trillium has been launched to address the demands of next-generation models by providing performance at scale, from the chip to the system to Google data center deployments.
Throughput scaling efficiency, together with metrics like EMFU and MBU, provides more meaningful insight into an accelerator's capabilities than simple metrics like peak performance alone.