Feature Story
Nvidia launches a set of microservices for optimized inferencing | TechCrunch
Mar 18, 2024 · techcrunch.com
Nvidia is collaborating with Amazon, Google, and Microsoft to make these NIM microservices available on SageMaker, Google Kubernetes Engine, and Azure AI, respectively. The microservices will also be integrated into frameworks such as Deepset, LangChain, and LlamaIndex. For the inference engine, Nvidia uses Triton Inference Server, TensorRT, and TensorRT-LLM. The company plans to add more capabilities over time, including making the Nvidia RAG LLM operator available as a NIM. Current NIM users include Box, Cloudera, Cohesity, Datastax, Dropbox, and NetApp.
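Since the article mentions LangChain integration, here is a minimal sketch of what that wiring could look like. It assumes a NIM container running locally that exposes an OpenAI-compatible chat endpoint; the base URL, port, model identifier, and API-key placeholder are illustrative assumptions, not details from the article.

```python
from langchain_openai import ChatOpenAI

# Assumption: a NIM container is serving an OpenAI-compatible API locally.
# The base_url, model name, and api_key below are illustrative placeholders.
llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",   # hypothetical local NIM endpoint
    api_key="not-needed-locally",          # a local deployment may not check this
    model="meta/llama-2-70b-chat",         # placeholder model identifier
)

# Send a single chat message to the local microservice and print the reply.
print(llm.invoke("What does an optimized inference engine buy you?").content)
```

The appeal of this pattern is that existing LangChain applications can point at a self-hosted NIM container simply by swapping the base URL, with no other code changes.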
Key takeaways
- Nvidia has announced a new software platform, Nvidia NIM, designed to streamline the deployment of AI models into production environments.
- NIM combines a given model with an optimized inferencing engine and packages it in a container, making it accessible as a microservice (a bare-bones call sketch follows this list).
- NIM currently supports models from Nvidia, AI21, Adept, Cohere, Getty Images, Shutterstock, Google, Hugging Face, Meta, Microsoft, Mistral AI, and Stability AI.
- Among NIM’s current users are Box, Cloudera, Cohesity, Datastax, Dropbox, and NetApp.
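Because the takeaways describe NIM as a container exposed as a microservice, the sketch below shows what calling such a service directly over HTTP could look like, assuming it follows the OpenAI-compatible chat-completions convention; the endpoint, port, and model identifier are assumptions for illustration, not values from the article.

```python
import requests

# Hypothetical local NIM deployment; host, port, path, and model name
# below are illustrative assumptions, not values from the article.
NIM_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "meta/llama-2-70b-chat",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize what a NIM microservice is."}
    ],
    "max_tokens": 128,
}

# POST the chat request to the container and print the generated reply.
response = requests.post(NIM_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```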