Nvidia is bringing its TensorRT-LLM SDK to Windows and to models such as Stable Diffusion, aiming to make large language models (LLMs) run faster on its hardware. The move comes as companies like Microsoft and AMD develop their own AI chips to reduce reliance on Nvidia's in-demand H100 GPUs, and as AMD moves to acquire the software company Nod.ai to help LLMs run on its chips. Nvidia remains the hardware leader in generative AI, but it is positioning itself for a future in which customers no longer need to buy large numbers of its GPUs.
Key takeaways:
- Nvidia is bringing its TensorRT-LLM SDK to Windows and to models like Stable Diffusion, aiming to make large language models (LLMs) and related tools run faster.
- TensorRT-LLM optimizes LLMs like Meta's Llama 2 and other AI models so they run faster on Nvidia's H100 GPUs, improving the experience of running more sophisticated LLM workloads.
- Meanwhile, companies like Microsoft and AMD are developing their own AI chips to reduce reliance on Nvidia, and AMD plans to buy the software company Nod.ai to help LLMs run on AMD hardware.
- Nvidia remains the hardware leader in generative AI, but it is positioning itself for a future in which customers no longer need to buy large numbers of its GPUs.