AWS Unveils Advanced AI Hardware with Trainium2 and Upcoming Trainium3 Chips

Machine learning was a major focus at Amazon Web Services' (AWS) annual re:Invent conference, including on the hardware side. The spotlight was on EC2 UltraServers powered by Trainium2 chips and the newly generally available EC2 Trn2 instances. According to AWS, the new Trn2 instances offer 20.8 petaflops of compute per instance and up to 40% better price-performance than GPU-based EC2 P5 instances.
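For readers who want to experiment, Trn2 instances are launched like any other EC2 instance type. The following minimal sketch uses boto3; the instance type name "trn2.48xlarge", the AMI ID, and the region are assumptions or placeholders and would need to be replaced with values valid for your account.

```python
# Illustrative only: launching a Trn2 instance with boto3.
# "trn2.48xlarge" is an assumed instance type name; the AMI ID is a placeholder.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

response = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",   # placeholder: a Neuron-compatible AMI
    InstanceType="trn2.48xlarge",      # assumed Trn2 instance type name
    MinCount=1,
    MaxCount=1,
)

print(response["Instances"][0]["InstanceId"])
```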

A Trn2 UltraServer consists of four Trn2 instances linked via the NeuronLink interconnect. This architecture is designed to scale compute to up to 83.2 petaflops, shortening training and inference times for the world’s largest AI models. Models with up to one trillion parameters could thus be run with improved latency.
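The aggregate figure follows directly from the per-instance number quoted above: four Trn2 instances at 20.8 petaflops each. The short check below simply multiplies the two reported values.

```python
# Sanity check of the figures quoted in the article:
# four Trn2 instances at 20.8 petaflops each per UltraServer.
petaflops_per_trn2_instance = 20.8
instances_per_ultraserver = 4

ultraserver_petaflops = petaflops_per_trn2_instance * instances_per_ultraserver
print(f"{ultraserver_petaflops:.1f} petaflops per UltraServer")  # 83.2
```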

AWS also announced “Project Rainier,” which combines hundreds of Trainium2 UltraServers into a single EC2 UltraCluster, scaling cluster size beyond AWS’s existing offerings. Anthropic, for example, uses these UltraClusters to train and optimize its Claude models for Amazon Bedrock on Trainium2. The infrastructure is meant to let customers efficiently train models with trillions of parameters and serve them in real time.

AWS emphasized that simply increasing cluster size is not enough to improve performance. Instead, the new Trainium2 UltraServer architecture improves data distribution and resource allocation, reducing overall training time without running into traditional network bottlenecks.

In addition to the Trainium2 offerings, AWS introduced EC2 P6 instances, which are based on Nvidia’s next-generation Blackwell GPUs. AWS promises up to 2.5 times the performance of the current generation, with optimizations targeted at compute-intensive generative AI applications. AWS positions the P6 instances for workloads that require fast response times and high scalability.

AWS also announced the upcoming Trainium3 chip as the successor to Trainium2. It will be manufactured on a 3-nanometer process and is expected to be more energy-efficient and four times more powerful than its predecessor, letting customers iterate on models faster and deploy them in real time. Trainium3 is expected to appear in later versions of the UltraServer.