The Trainium2 chip is designed to target complex GenAI LLM training and inference workloads. It is a 500W chip delivering 650 TFLOP/s of dense BF16 compute with 96GB of HBM3e memory capacity, and it features a scale-up network called NeuronLink. The Trainium2 server architecture comes in two SKUs, Trainium2 (Trn2) and Trainium2-Ultra (Trn2-Ultra), with the latter being the primary SKU for GenAI frontier model training and inference, both for internal Amazon workloads and for Anthropic's workloads.
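As a rough back-of-the-envelope check on these headline specs, the sketch below computes dense BF16 compute per watt and per gigabyte of HBM. The 500W, 650 TFLOP/s, and 96GB figures come from the paragraph above; the variable names are illustrative only.

```python
# Back-of-the-envelope ratios from the Trainium2 headline specs above.
# Figures (500 W, 650 TFLOP/s dense BF16, 96 GB HBM3e) are from the text;
# variable names are illustrative, not official spec names.

TDP_WATTS = 500            # chip power
DENSE_BF16_TFLOPS = 650    # dense BF16 throughput, TFLOP/s
HBM_CAPACITY_GB = 96       # HBM3e capacity

tflops_per_watt = DENSE_BF16_TFLOPS / TDP_WATTS          # 1.3 TFLOP/s per watt
tflops_per_gb_hbm = DENSE_BF16_TFLOPS / HBM_CAPACITY_GB  # ~6.8 TFLOP/s per GB

print(f"{tflops_per_watt:.2f} TFLOP/s per watt")
print(f"{tflops_per_gb_hbm:.2f} TFLOP/s per GB of HBM")
```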
Key takeaways:
- Amazon is investing heavily in AI clusters, deploying Hopper and Blackwell GPUs at scale and spending billions of dollars on Trainium2 AI clusters.
- AWS is currently deploying a cluster with 400k Trainium2 chips for Anthropic, known as “Project Rainier”. This is one of the largest AI cluster deployments globally (see the scale sketch after this list).
- Despite these investments, Amazon's Trainium1- and Inferentia2-based instances have not been competitive for GenAI frontier model training or inference, due to weak hardware specs and poor software integration.
- With the release of Trainium2, Amazon has made a significant course correction and is on a path to eventually offering competitive custom silicon for training and inference at the chip, system, and software (compiler/framework) levels.
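To put the “Project Rainier” takeaway in perspective, here is a minimal sketch of the aggregate dense BF16 compute implied by 400k Trainium2 chips at the per-chip figure quoted earlier. This is peak throughput only and deliberately ignores achievable model FLOPS utilization (MFU), interconnect limits, and failures.

```python
# Aggregate dense BF16 compute implied by the "Project Rainier" takeaway above.
# The 400k chip count and 650 TFLOP/s per chip are from the text; this is
# peak throughput and ignores achievable model FLOPS utilization (MFU).

NUM_CHIPS = 400_000
DENSE_BF16_TFLOPS_PER_CHIP = 650

peak_tflops = NUM_CHIPS * DENSE_BF16_TFLOPS_PER_CHIP  # 2.6e8 TFLOP/s
peak_exaflops = peak_tflops / 1e6                     # 1 EFLOP/s = 1e6 TFLOP/s

print(f"Peak dense BF16 across the cluster: {peak_exaflops:.0f} EFLOP/s")
```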