Google Cloud unveiled its seventh-generation Tensor Processing Unit (TPU) called Ironwood, claiming it delivers more than 24 times the computing power of the world’s fastest supercomputer.
Ironwood is the first TPU purpose-built for inference, the stage where already-trained AI models are deployed to serve predictions rather than learn from data.
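To make that distinction concrete, here is a minimal JAX sketch of inference as opposed to training; it is illustrative only, with an invented toy model, not Google's serving stack:

```python
# A minimal sketch of what "inference" means in practice (illustrative
# only; the tiny model and parameter names are invented): a trained
# model's forward pass, compiled once and replayed on every request.
import jax
import jax.numpy as jnp

def forward(params, x):
    # Forward pass only: no loss, no gradients, no optimizer state.
    h = jax.nn.relu(x @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]

# Stand-in weights; in a real deployment these come from a training run.
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
params = {
    "w1": jax.random.normal(k1, (128, 256)), "b1": jnp.zeros(256),
    "w2": jax.random.normal(k2, (256, 10)),  "b2": jnp.zeros(10),
}

# jit compiles the graph for whatever accelerator is attached
# (TPU, GPU, or CPU); serving is then just calling it per batch.
serve = jax.jit(forward)
logits = serve(params, jnp.ones((32, 128)))  # one batch of 32 requests
print(logits.shape)                          # (32, 10)
```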
Each Ironwood chip offers 4,614 teraflops of peak compute, and a full-scale pod of 9,216 chips delivers 42.5 exaflops, dwarfing the 1.7 exaflops of El Capitan, currently the world's fastest supercomputer.
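The headline arithmetic checks out against the 9,216-chip full-pod configuration Google described:

$$9{,}216 \ \text{chips} \times 4{,}614 \ \text{TFLOPS/chip} \approx 4.25 \times 10^{7} \ \text{TFLOPS} = 42.5 \ \text{exaflops}, \qquad \frac{42.5}{1.7} \approx 25,$$

which is consistent with the "more than 24 times" claim.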
On the memory side, each chip pairs 192GB of High Bandwidth Memory (HBM) with 7.2 terabytes per second of memory bandwidth.
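One back-of-the-envelope implication of those figures (my arithmetic, not a number from the announcement): large-model inference is usually limited by how fast weights can stream from memory, and at this bandwidth a chip can sweep its entire HBM roughly 37 times per second:

$$\frac{192 \ \text{GB}}{7{,}200 \ \text{GB/s}} \approx 27 \ \text{ms per full pass over memory}.$$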
Ironwood also doubles performance per watt over Trillium, the previous TPU generation, a significant gain in power efficiency.
Google says demand for AI compute has grown roughly 10x every year for the past eight years, a cumulative factor of 100 million, underscoring the need for inference-specialized architectures like Ironwood.
Ironwood's focus on inference optimization signals a broader shift in the AI landscape toward deployment efficiency and reasoning-heavy workloads.
Google positions Ironwood as a foundation for advanced AI models like Gemini 2.5, emphasizing reasoning tasks over simple pattern recognition.
Alongside Ironwood, Google announced Cloud WAN, a managed wide-area networking service, and expanded AI software offerings such as Pathways, its machine learning runtime for scaling models across large numbers of chips.
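Pathways itself is exposed through Google's managed stack rather than documented in the article, but the open-source JAX sharding model it builds on gives a feel for scaling one program across many TPU chips. A minimal sketch (illustrative; the mesh layout and array shapes are invented, and locally it simply runs on whatever devices are attached):

```python
# Shard an array across all attached chips and let the compiler handle
# cross-chip communication (plain open-source JAX, not the Pathways API).
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange the attached devices (TPU chips, or one CPU locally)
# into a 1-D mesh with a named axis.
mesh = Mesh(np.array(jax.devices()), ("data",))

# Split the batch dimension of the input across every device in the
# mesh (the batch size must divide evenly by the device count).
batch = jax.device_put(
    jnp.ones((8, 1024)), NamedSharding(mesh, P("data", None))
)

@jax.jit  # XLA compiles one program and inserts collectives as needed
def similarity(x):
    return jax.nn.relu(x @ x.T)

print(similarity(batch).shape)  # (8, 8), computed across all devices
```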
Google argues that this vertical integration, designing its own TPUs rather than relying on third-party silicon, translates into a competitive advantage in AI offerings for enterprise customers.