The supercomputer is built from boards that each carry four TPUv4 chips and is liquid-cooled, which yields significant power savings. Each board connects back to its host over PCIe Gen3 x16, and a full system comprises 64 racks housing 4096 interconnected chips. Those chips are joined by an optically reconfigurable interconnect: optical circuit switches create direct chip-to-chip links for efficient data sharing, and Google can adjust the optical routing to change the network topology to suit a given model. The company also increased the on-chip memory to 128MB to keep data access local, and the resulting system outperforms the NVIDIA A100 on a performance-per-watt basis.
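To make the circuit-switching idea concrete, here is a toy Python model, an illustration rather than Google's implementation: an OCS holds a static mapping from transmit ports to receive ports, so two connected chips get a dedicated light path with no packet switching in between, and retargeting the optics amounts to installing a new port mapping. The port counts and the `ring` helper below are hypothetical.

```python
class OpticalCircuitSwitch:
    """Toy OCS: a reconfigurable mapping from transmit ports to receive ports."""

    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.routing: dict[int, int] = {}  # tx port -> rx port

    def reconfigure(self, routing: dict[int, int]) -> None:
        # Changing topology is just installing a new port mapping; traffic
        # then flows chip-to-chip with no per-packet switching in the middle.
        if len(set(routing.values())) != len(routing):
            raise ValueError("each receive port can terminate only one circuit")
        self.routing = dict(routing)

    def neighbor(self, tx_port: int) -> int:
        """Which chip's receive port does this transmit port reach?"""
        return self.routing[tx_port]


def ring(chips: list[int]) -> dict[int, int]:
    """Wire a set of chips into a unidirectional ring (a 1D torus)."""
    return {c: chips[(i + 1) % len(chips)] for i, c in enumerate(chips)}


ocs = OpticalCircuitSwitch(num_ports=8)
ocs.reconfigure(ring([0, 1, 2, 3]))    # a 4-chip ring for a small job
assert ocs.neighbor(3) == 0            # wraparound link is a direct light path
ocs.reconfigure(ring(list(range(8))))  # rewire the same switch into an 8-chip ring
assert ocs.neighbor(3) == 4
```

The property the toy captures is that reconfiguration replaces the whole mapping at once, which is how one pool of chips can present a different topology to each job.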
Key takeaways:
- Google has been using an optically reconfigurable network in its AI training clusters, which delivers better performance, lower power, and more flexibility (see the mesh sketch after this list).
- The Google TPUv4, a 7nm chip, has been detailed; its power budget is deliberately overprovisioned to meet a 5ms service-time SLA, and the chip is designed to scale out as part of large-scale infrastructure.
- Google's supercomputer uses optical circuit switching (OCS) to connect chips directly, enabling efficient data sharing and higher node utilization.
- Google is expected to start talking about the TPUv5 soon, and speculation is that it will push deeper into AI, competing with NVIDIA in both AI hardware and cloud services.
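As referenced in the first takeaway, once the OCS has wired a slice into a given torus shape, a training job sees it as a device mesh to shard work over. Below is a minimal JAX sketch under assumed parameters: the 4x4x8 slice shape (128 chips), the axis names `data`/`fsdp`/`model`, and the 8192x8192 parameter block are all illustrative rather than Google's configuration, and running it requires a slice with 128 attached devices.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Assume the scheduler provisioned a 4x4x8 slice (128 chips); the OCS has
# already wired that shape out of the pod's 4096 chips.
devices = mesh_utils.create_device_mesh((4, 4, 8))
mesh = Mesh(devices, axis_names=("data", "fsdp", "model"))

# Shard a (hypothetical) parameter block across the 'model' axis; JAX lays
# the logical axes onto the physical torus, so neighbor traffic stays on
# direct chip-to-chip links.
params = jnp.zeros((8192, 8192))
sharding = NamedSharding(mesh, PartitionSpec(None, "model"))
params = jax.device_put(params, sharding)
```

A different model could request a different slice shape (say, 8x8x2) and get a matching topology from the same pool of chips; because the links are circuit-switched, whatever shape the job asks for behaves like a dedicated torus rather than a share of a packet-switched fabric.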