In a nondescript office building in Austin, Texas, a small team of Amazon employees is designing two microchips built for training and accelerating generative AI. These custom AWS chips, named Inferentia and Trainium, give AWS customers an alternative for training their large language models without relying on increasingly scarce and costly Nvidia GPUs.
Amazon Web Services (AWS) CEO Adam Selipsky noted in a June interview that demand for chips to power generative AI is soaring across the globe. He said Amazon is well positioned to provide the capacity its customers collectively require.
Playing catch-up in a booming industry
However, other players have surged ahead by investing substantially to harness the generative AI boom. When OpenAI introduced ChatGPT in November, Microsoft attracted considerable attention by hosting the viral chatbot and committing an estimated $13 billion to OpenAI. Microsoft swiftly integrated generative AI models into its products, incorporating them into Bing by February. In the same month, Google launched Bard, its large language model, and invested a hefty $300 million in OpenAI competitor Anthropic.
Amazon took a bit longer to enter the scene. It wasn’t until April that Amazon unveiled its line of large language models, branded Titan, along with the Bedrock service aimed at aiding developers in enhancing software with generative AI capabilities. This belated entry has led some experts to suggest that Amazon, usually known for pioneering markets, now finds itself playing catch-up in the generative AI field.
Technical differentiation with custom silicon
Chirag Dekate, VP Analyst at Gartner, believes that Amazon’s custom silicon could eventually provide a competitive edge in the generative AI sector. The distinct technical capabilities offered by these custom chips, Inferentia and Trainium, might set Amazon apart. Notably, Microsoft lacks equivalents to Trainium and Inferentia.
Amazon embarked on its journey with custom silicon in 2013, starting with specialized hardware named Nitro. Over time, Nitro has become the highest-volume AWS chip, with over 20 million units in use and a presence in every AWS server.
In 2015, Amazon acquired Israeli chip startup Annapurna Labs. In 2018, the company rolled out its Arm-based server chip, Graviton, to rival dominant x86 CPUs from industry giants like AMD and Intel. In the same year, Amazon ventured into AI-focused chips, a couple of years after Google introduced its first Tensor Processing Unit (TPU). Meanwhile, Microsoft has remained tight-lipped about its Athena AI chip, developed in collaboration with AMD.
Custom chips and their roles
A behind-the-scenes tour of Amazon’s chip lab in Austin reveals the development and testing of Trainium and Inferentia. Matt Wood, VP of Product, explains their roles. Machine learning comprises two stages: training the models and running inferences against the trained models. Trainium, introduced in 2021, offers a substantial 50% improvement in price performance compared to other methods of training machine learning models on AWS. Inferentia, released in 2019 and now in its second generation, excels at processing high-throughput, low-latency machine learning inferences.
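The two stages Wood describes can be illustrated with a toy example. The sketch below is purely illustrative and has nothing to do with how Trainium or Inferentia work internally: it fits a one-variable linear model by gradient descent (the training stage) and then applies the learned parameters to a new input (the inference stage). The function names `train` and `infer`, the sample data, and the hyperparameters are all assumptions made for the illustration.

```python
# Toy illustration of the two machine learning stages.
# Assumption: a one-variable linear model stands in for a real network.

def train(samples, epochs=1000, lr=0.05):
    """Training stage: fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def infer(model, x):
    """Inference stage: apply the already-trained parameters to a new input."""
    w, b = model
    return w * x + b


# Hypothetical training data generated from y = 2x + 1.
data = [(0, 1), (1, 3), (2, 5), (3, 7)]

model = train(data)            # expensive, done once
prediction = infer(model, 10)  # cheap, repeated for every request
print(f"w={model[0]:.3f}, b={model[1]:.3f}, prediction for x=10: {prediction:.3f}")
```

Training loops over the data many times to find the parameters, while inference is a single cheap evaluation repeated for each request, which is one reason the two workloads can be served by differently tuned hardware.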
Despite Amazon’s efforts, Nvidia’s GPUs currently dominate the training of models. In July, AWS introduced AI acceleration hardware powered by Nvidia H100s, highlighting Nvidia’s strong position in the AI landscape.
Leveraging cloud dominance for advantage
Amazon’s dominance in the cloud arena sets it apart. With a massive cloud install base, Amazon’s strength lies in enabling existing customers to expand into value creation using generative AI. Millions of AWS customers, already familiar with the platform, may find it enticing to utilize Amazon’s services for generative AI tasks, building upon their existing applications and data storage.
AWS holds the lion’s share of the cloud computing market, commanding 40% as of 2022. Despite recent dips in operating income, AWS still contributes a significant 70% to Amazon’s overall operating profit. Its wide operating margins outshine Google Cloud’s.
AWS’s expanding generative AI toolset
AWS focuses on a diverse toolset for generative AI rather than directly challenging ChatGPT. AWS HealthScribe, unveiled in July, employs generative AI to help doctors draft patient visit summaries. SageMaker, a machine learning hub, offers algorithms and models. CodeWhisperer, a coding companion, has boosted developer productivity by 57% on average. Last year, Microsoft reported similar productivity enhancements from GitHub Copilot.
In June, AWS announced a $100 million generative AI innovation center. The company aims to guide customers in understanding generative AI’s potential within their business contexts by offering personalized assistance from experts.
Bridging security concerns
The proliferation of AI also raises security concerns, particularly surrounding proprietary information in training data used for public language models. AWS addresses these concerns by ensuring that generative AI models accessed through its Bedrock service operate within isolated virtual private cloud environments. This setup guarantees encryption and consistent AWS access controls, alleviating worries about data leakage.
Amazon’s ongoing push in generative AI
Amazon continues to push forward in the generative AI realm. While over 100,000 customers are currently using machine learning on AWS, this constitutes only a fraction of AWS’s vast customer base. Nonetheless, analysts believe that Amazon’s prominence and reputation as a reliable provider could sway enterprises to explore Amazon’s generative AI ecosystem more extensively.
Chirag Dekate emphasizes that enterprises are unlikely to switch their infrastructure strategies solely based on a competitor’s advancements in generative AI. For existing Amazon customers, the allure of Amazon’s extensive ecosystem might play a pivotal role in their decisions.