At AWS’ annual re:Invent conference this week, CEO Adam Selipsky and other top executives announced new services and updates to attract burgeoning enterprise interest in generative AI systems and take on rivals including Microsoft, Oracle, Google, and IBM.
AWS, the largest cloud service provider in terms of market share, is looking to capitalize on growing interest in generative AI. Enterprises are expected to spend $16 billion globally on generative AI and related technologies in 2023, according to a report from market research firm IDC.
This spending, which includes generative AI software as well as related infrastructure hardware and IT and business services, is expected to reach $143 billion in 2027, with a compound annual growth rate (CAGR) of 73.3%.
This growth rate, according to IDC, is almost 13 times the CAGR for worldwide IT spending over the same period.
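As a rough check on the figures above, the growth rate can be recomputed from the rounded 2023 and 2027 spending totals. The sketch below is illustrative only: the function name is hypothetical, and because it uses the rounded $16 billion and $143 billion figures it lands at roughly 72.9% rather than IDC's published 73.3%, a gap that is presumably down to rounding in the inputs. The implied worldwide IT spending CAGR is a back-calculation from "almost 13 times," not a number IDC states in this article.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Rounded figures quoted in the article, in billions of US dollars.
spend_2023 = 16.0
spend_2027 = 143.0
years = 2027 - 2023  # four compounding periods

genai_cagr = cagr(spend_2023, spend_2027, years)
print(f"Generative AI spending CAGR from rounded figures: {genai_cagr:.1%}")

# Assumption: "almost 13 times" is treated as a factor of exactly 13,
# applied to IDC's published 73.3%, to estimate the overall IT spending CAGR.
implied_it_cagr = 0.733 / 13
print(f"Implied worldwide IT spending CAGR: {implied_it_cagr:.1%}")
```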
Selipsky revealed that AWS’ generative AI strategy, like that of most of its rivals, particularly Oracle, is divided into three tiers: the first, or infrastructure, layer for training or developing large language models (LLMs); a middle layer, which consists of the foundation large language models required to build applications; and a third layer, which includes applications that use the other two layers.
AWS beefs up infrastructure for generative AI
The cloud services provider, which has been adding infrastructure capabilities and chips since last year to support high-performance computing with enhanced energy efficiency, announced the latest iterations of its Graviton and Trainium chips this week.
The Graviton4 processor, according to AWS, provides up to 30% better compute performance, 50% more cores, and 75% more memory bandwidth than the current generation Graviton3 processors.
Trainium2, on the other hand, is designed to deliver up to four times faster training than first-generation Trainium chips.
These chips can be deployed in EC2 UltraClusters of up to 100,000 chips, making it possible to train foundation models (FMs) and LLMs in a fraction of the time it has taken up to now, while improving energy efficiency by up to two times over the previous generation, the company said.
Rivals Microsoft, Oracle, Google, and IBM all have been making their own chips for high-performance computing, including generative AI workloads.
While Microsoft recently released its Maia AI Accelerator and Azure Cobalt CPUs for model training workloads, Oracle has partnered with Ampere to produce its own chips, such as the Oracle Ampere A1. Earlier, Oracle used Graviton chips for its AI infrastructure. Google’s cloud computing arm, Google Cloud, makes its own AI chips in the form of Tensor Processing Units (TPUs); its latest chip is the TPUv5e, which can be combined using Multislice technology. IBM, through its research division, has also been working on a chip, dubbed NorthPole, that can efficiently support generative AI workloads.
At re:Invent, AWS also extended its partnership with Nvidia, including support for the DGX Cloud, a new GPU project named Ceiba, and new instances for supporting generative AI workloads.
AWS said that it will host Nvidia’s DGX Cloud cluster of GPUs, which can accelerate training of generative AI and LLMs that can reach beyond 1 trillion parameters. OpenAI, too, has used the DGX Cloud to train the LLM that underpins ChatGPT.
In February, Nvidia said that it would make the DGX Cloud available through Oracle Cloud, Microsoft Azure, Google Cloud Platform, and other cloud providers. In March, Oracle announced support for the DGX Cloud, followed closely by Microsoft.
Officials at re:Invent also announced that new Amazon EC2 G6e instances featuring Nvidia L40S GPUs and G6 instances powered by L4 GPUs are in the works.
L4 GPUs are scaled back from the Hopper H100 but are much more power efficient. These new instances are aimed at startups, enterprises, and researchers looking to experiment with AI.
Nvidia also shared plans to integrate its NeMo Retriever microservice into AWS to help users develop generative AI tools such as chatbots. NeMo Retriever is a generative AI microservice that enables enterprises to connect custom LLMs to enterprise data, so that they can generate AI responses grounded in their own data.
Further, AWS said that it will be the first cloud provider to bring Nvidia’s GH200 Grace Hopper Superchips to the cloud.
The Nvidia GH200 NVL32 multinode platform connects 32 Grace Hopper superchips through Nvidia’s NVLink and NVSwitch interconnects. The platform will be available on Amazon Elastic Compute Cloud (EC2) instances connected via Amazon’s network virtualization (AWS Nitro System) and hyperscale clustering (Amazon EC2 UltraClusters).
New foundation models to provide more options for application building
To offer a wider choice of foundation models and ease application building, AWS unveiled updates to existing foundation models inside its generative AI application-building service, Amazon Bedrock.
The updated models added to Bedrock include Anthropic’s Claude 2.1 and Meta Llama 2 70B, both of which have been made generally available. Amazon also has added its proprietary Titan Text Lite and Titan Text Express foundation models to Bedrock.
In addition, the cloud services provider has added a model in preview, Amazon Titan Image Generator, to the AI app-building service.
Foundation models that are currently available in Bedrock include large language models (LLMs) from the stables of AI21 Labs, Cohere, Meta, Anthropic, and Stability AI.
Rivals Microsoft, Oracle, Google, and IBM also offer various foundation models including proprietary and open-source models. While Microsoft offers Meta’s Llama 2 along with OpenAI’s GPT models, Google offers proprietary models such as PaLM 2, Codey, Imagen, and Chirp. Oracle, on the other hand, offers models from Cohere.
AWS also released a new feature within Bedrock, dubbed Model Evaluation, that allows enterprises to evaluate, compare, and select the best foundation model for their use case and business needs.
Although not entirely similar, Model Evaluation can be compared to Google Vertex AI’s Model Garden, which is a repository of foundation models from Google and its partners. Microsoft’s Azure OpenAI service, too, offers a capability to select large language models. LLMs can also be found inside the Azure Marketplace.
Amazon Bedrock, SageMaker get new features to ease application building
Both Amazon Bedrock and SageMaker have been updated by AWS to not only help train models but also speed up application development.
These updates include features such as Retrieval Augmented Generation (RAG), capabilities to fine-tune LLMs, and the ability to pre-train Titan Text Lite and Titan Text Express models from within Bedrock. AWS also introduced SageMaker HyperPod and SageMaker Inference, which help in scaling LLMs and reducing the cost of AI deployment, respectively.
Google’s Vertex AI, IBM’s Watsonx.ai, Microsoft’s Azure OpenAI, and certain features of the Oracle generative AI service also provide features similar to those of Amazon Bedrock, especially model fine-tuning and RAG.
Further, Google’s Generative AI Studio, which is a low-code suite for tuning, deploying, and monitoring foundation models, can be compared with AWS’ SageMaker Canvas, another low-code platform for business analysts, which has been updated this week to help generate models.
Each of the cloud service providers, including AWS, also has software libraries and services, such as Guardrails for Amazon Bedrock, to allow enterprises to comply with best practices around data and model training.
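For readers unfamiliar with RAG, the idea is to retrieve the passages most relevant to a question from an enterprise's own data and place them in the prompt sent to the model. The sketch below is a minimal, generic illustration of that retrieval step using only the Python standard library. It is not how Bedrock or any vendor service implements RAG: the function names, the sample documents, and the simple word-count similarity are all assumptions made for illustration, and production systems typically use embedding models and a vector database instead.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Place the retrieved passages ahead of the question in the prompt."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical enterprise documents, invented for this example.
DOCS = [
    "Refunds are processed within 14 days of a return request.",
    "The support desk is open on weekdays from 9 am to 5 pm.",
    "Enterprise plans include single sign-on and audit logs.",
]

print(build_prompt("How long do refunds take?", DOCS))
```

The resulting prompt would then be passed to whichever foundation model the application uses; that call is left out here because it depends on the provider.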
Amazon Q, AWS’ answer to Microsoft’s GPT-driven Copilot
On Tuesday, Selipsky premiered the star of the cloud giant’s re:Invent 2023 conference: Amazon Q, the company’s answer to Microsoft’s GPT-driven Copilot generative AI assistant.
Selipsky’s announcement of Q was reminiscent of Microsoft CEO Satya Nadella’s keynotes at Ignite and Build, where he announced several integrations and flavors of Copilot across a wide range of proprietary products, including Office 365 and Dynamics 365.
Amazon Q can be used by enterprises across a variety of functions including developing applications, transforming code, generating business intelligence, acting as a generative AI assistant for business applications, and helping customer service agents via the Amazon Connect offering.
Rivals are not too far behind. In August, Google, too, added its generative AI-based assistant, Duet AI, to most of its cloud services including data analytics, databases, and infrastructure and application management.
Similarly, Oracle’s managed generative AI service also allows enterprises to integrate LLM-based generative AI interfaces in their applications via an API, the company said, adding that it would bring its own generative AI assistant to its cloud services and NetSuite.
Other generative AI-related updates at re:Invent include updated support for vector databases for Amazon Bedrock. These databases include Amazon Aurora and MongoDB. Other supported databases include Pinecone, Redis Enterprise Cloud, and Vector Engine for Amazon OpenSearch Serverless.