Microsoft and NVIDIA entered a decade-long partnership earlier this year amid the generative AI craze. NVIDIA, with its hardware prowess, is already leading the race, while Microsoft enjoys an upper hand of its own thanks to its deal with OpenAI. Throughout the year, the two companies have jointly announced several deals across the AI landscape.
“Our partnership with NVIDIA spans every layer of the Copilot stack — from silicon to software — as we innovate together for this new age of AI,” said Satya Nadella, chairman and CEO of Microsoft, at the ongoing Ignite conference.
Here are seven NVIDIA-related announcements from the event that caught our attention:
H100- and H200-based virtual machines come to Microsoft Azure
Microsoft has introduced the NC H100 v5 VM series for Azure, featuring the industry’s first cloud instances with NVIDIA H100 NVL GPUs. Each instance pairs two PCIe-based H100 GPUs connected via NVIDIA NVLink, delivering nearly 4 petaflops of AI compute and a combined 188GB of HBM3 memory (94GB per GPU).
This setup targets mid-range AI workloads, offering up to 12x higher performance on models like GPT-3 175B. Microsoft also plans to integrate the NVIDIA H200 Tensor Core GPU into Azure next year to support larger model inferencing with no increase in latency, drawing on the enhanced capacity and bandwidth of latest-generation HBM3e memory.
Confidential Computing with NCC H100 v5 VMs
Microsoft is expanding its NVIDIA-powered services with the introduction of NCC H100 v5 VMs. These confidential virtual machines leverage NVIDIA H100 Tensor Core GPUs to ensure the confidentiality and integrity of data and applications while in use, in memory.
These GPU-enhanced confidential VMs will enter private preview soon, providing Azure customers with unparalleled acceleration while maintaining data security.
AI Foundry Service
NVIDIA has introduced an AI foundry service to supercharge the development and tuning of custom generative AI applications for enterprises and startups deploying on Microsoft Azure.
The foundry service pulls together three elements: a collection of NVIDIA AI Foundation Models, the NVIDIA NeMo framework and tools, and NVIDIA DGX Cloud AI supercomputing services. Together, these give enterprises an end-to-end solution for creating custom generative AI models.
Businesses can then deploy their customised models with NVIDIA AI Enterprise software to power generative AI applications, including intelligent search, summarisation and content generation.
Partnership with Amdocs
NVIDIA has also partnered with Amdocs, a key player in communications and media services, which will use the AI foundry service to optimise enterprise-grade LLMs for the telco and media sectors. The collaboration builds on the existing Amdocs-Microsoft partnership.
AI Foundation Models More Accessible
Microsoft and NVIDIA are democratising access to AI Foundation Models, allowing developers to experience them through a user-friendly interface or API directly from a browser. These models, including popular ones like Llama 2, Stable Diffusion XL, and Mistral, can be customised with proprietary data.
Optimised with NVIDIA TensorRT-LLM, these models deliver high throughput and low latency on any NVIDIA GPU-accelerated stack. The foundation models are accessible through the NVIDIA NGC catalogue, Hugging Face, and the Microsoft Azure AI model catalogue.
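In practice, trying one of these hosted models amounts to a standard HTTPS call. Below is a minimal sketch in Python, assuming an OpenAI-compatible chat endpoint; the URL, model id, and environment variable are illustrative assumptions, and the real values come from the catalogue entry you pick:

```python
import os
import requests

# Assumed endpoint and model id for illustration only; the actual values
# come from the NVIDIA NGC catalogue or Azure AI model catalogue listing.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
API_KEY = os.environ["NVIDIA_API_KEY"]  # assumed credential variable

payload = {
    "model": "mistralai/mistral-7b-instruct",  # illustrative model id
    "messages": [{"role": "user", "content": "Summarise NVLink in one sentence."}],
    "max_tokens": 128,
    "temperature": 0.2,
}

# Send the chat request and print the model's reply
resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The same request shape works whether the model is served from NVIDIA’s catalogue or deployed into your own Azure subscription; only the base URL and credentials change.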
Omniverse Cloud’s Simulation Engines
NVIDIA also launched two new simulation engines on Omniverse Cloud hosted on Microsoft Azure: the virtual factory simulation engine and the autonomous vehicle (AV) simulation engine.
As automotive companies transition to AI-enhanced digital systems, these simulation engines aim to save costs and reduce lead times. Omniverse Cloud serves as a platform-as-a-service, unifying core product and business processes for automakers.
TensorRT-LLM Upgrade for Windows
An upcoming update to TensorRT-LLM, open-source software that improves AI inference performance, will add support for new large language models and make demanding AI workloads more accessible on desktops and laptops with RTX GPUs with as little as 8GB of VRAM.
TensorRT-LLM for Windows will soon be compatible with OpenAI’s Chat API, letting developers run projects locally on a PC with RTX. The upcoming TensorRT-LLM v0.6.0 release promises up to 5x faster inference and support for additional popular LLMs, including Mistral 7B and Nemotron-3 8B.
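Because the interface mirrors OpenAI’s Chat API, existing client code can simply be repointed at the local machine. Here is a minimal sketch in Python, assuming a TensorRT-LLM-backed server is listening locally; the port and model name are illustrative assumptions:

```python
from openai import OpenAI

# Point the standard OpenAI client at a local TensorRT-LLM server instead
# of the hosted API. Port and model name are assumptions for illustration;
# use whatever the local server you run actually exposes.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local endpoint
    api_key="not-needed-locally",         # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="mistral-7b",  # illustrative local model name
    messages=[{"role": "user", "content": "Draft a short product update summary."}],
)
print(response.choices[0].message.content)
```

The appeal of this design is that nothing about the application changes when moving between cloud and local inference: the prompts, message format, and response parsing stay identical, while the data never leaves the RTX-equipped PC.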