The article also provides updates on the project, including the addition of support for directly loading models from ModelScope, initial INT2 support, and the ability to use `ipex-llm` through the Text-Generation-WebUI GUI. It also notes support for Self-Speculative Decoding, which reduces FP16 and BF16 inference latency on Intel GPU and CPU. The article concludes with a list of verified models, installation and running guides, and code examples for `ipex-llm` (a minimal usage sketch follows the key takeaways below).
Key takeaways:
- The project previously known as `bigdl-llm` has been renamed to `ipex-llm`.
- `ipex-llm` is a PyTorch library for running LLMs on Intel CPU and GPU with very low latency.
- It integrates seamlessly with various other projects, and more than 50 models have been optimized and verified on `ipex-llm`.
- The latest updates include support for directly loading models from ModelScope, initial INT2 support, and the ability to use `ipex-llm` through the Text-Generation-WebUI GUI.
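To give a concrete sense of the code examples the article refers to, here is a minimal sketch of `ipex-llm`'s transformers-style API, assuming the `AutoModelForCausalLM` wrapper from `ipex_llm.transformers` documented by the project. The model id and prompt are placeholders, and the `xpu` move is only needed when targeting an Intel GPU; this is an illustrative sketch, not the article's own example.

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # ipex-llm's drop-in wrapper

# Placeholder model id; any Hugging Face causal LM verified by ipex-llm should work similarly
model_path = "meta-llama/Llama-2-7b-chat-hf"

# Load the model with ipex-llm's low-bit (INT4) optimizations applied at load time
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Move to an Intel GPU ("xpu"); skip this step for CPU-only inference
model = model.to("xpu")

prompt = "What is AI?"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The same pattern is what makes the other features mentioned above accessible: loading from a different model hub or switching to another low-bit format is a matter of changing the arguments passed to `from_pretrained`, rather than rewriting the inference code.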