Feature Story
Nvidia launches a set of microservices for optimized inferencing | TechCrunch
Mar 18, 2024 · techcrunch.com
Nvidia is collaborating with Amazon, Google, and Microsoft to make these NIM microservices available on SageMaker, Google Kubernetes Engine, and Azure AI, respectively. The microservices will also be integrated into frameworks such as Deepset, LangChain, and LlamaIndex. For the inference engine, Nvidia uses Triton Inference Server, TensorRT, and TensorRT-LLM. The company plans to add more capabilities over time, including making the Nvidia RAG LLM operator available as a NIM. Current NIM users include Box, Cloudera, Cohesity, Datastax, Dropbox, and NetApp.
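Since the article mentions LangChain integration, here is a minimal sketch of what that wiring could look like. It assumes a NIM container running locally that exposes an OpenAI-compatible chat endpoint; the base URL, port, model identifier, and API-key placeholder are illustrative assumptions, not details from the article.

```python
from langchain_openai import ChatOpenAI

# Assumption: a NIM container is serving an OpenAI-compatible API locally.
# The base_url, model name, and api_key below are illustrative placeholders.
llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",   # hypothetical local NIM endpoint
    api_key="not-needed-locally",          # a local deployment may not check this
    model="meta/llama-2-70b-chat",         # placeholder model identifier
)

# Send a single chat message to the local microservice and print the reply.
print(llm.invoke("What does an optimized inference engine buy you?").content)
```

The appeal of this pattern is that existing LangChain applications can point at a self-hosted NIM container simply by swapping the base URL, with no other code changes.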
Key takeaways
- Nvidia has announced a new software platform, Nvidia NIM, designed to streamline the deployment of AI models into production environments.
- NIM combines a given model with an optimized inferencing engine and packages it in a container, making it accessible as a microservice (a bare-bones call sketch follows this list).
- NIM currently supports models from Nvidia, AI21, Adept, Cohere, Getty Images, Shutterstock, Google, Hugging Face, Meta, Microsoft, Mistral AI, and Stability AI.
- Among NIM’s current users are Box, Cloudera, Cohesity, Datastax, Dropbox, and NetApp.
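Because the takeaways describe NIM as a container exposed as a microservice, the sketch below shows what calling such a service directly over HTTP could look like, assuming it follows the OpenAI-compatible chat-completions convention; the endpoint, port, and model identifier are assumptions for illustration, not values from the article.

```python
import requests

# Hypothetical local NIM deployment; host, port, path, and model name
# below are illustrative assumptions, not values from the article.
NIM_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "meta/llama-2-70b-chat",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize what a NIM microservice is."}
    ],
    "max_tokens": 128,
}

# POST the chat request to the container and print the generated reply.
response = requests.post(NIM_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```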