The article also provides updates on the project, including the addition of support for directly loading models from ModelScope, initial INT2 support, and the ability to use `ipex-llm` through the Text-Generation-WebUI GUI. It also notes support for Self-Speculative Decoding, which reduces FP16 and BF16 inference latency on Intel GPU and CPU. The article concludes with a list of verified models, installation and running guides, and code examples for `ipex-llm` (a minimal usage sketch follows the key takeaways below).
Key takeaways:
- The project previously known as `bigdl-llm` has been renamed to `ipex-llm`.
- `ipex-llm` is a PyTorch library for running LLMs on Intel CPU and GPU with very low latency.
- It integrates seamlessly with various other projects, and more than 50 models have been optimized and verified on `ipex-llm`.
- The latest updates include support for directly loading models from ModelScope, initial INT2 support, and the ability to use `ipex-llm` through the Text-Generation-WebUI GUI.
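To give a concrete sense of the code examples the article refers to, here is a minimal sketch of `ipex-llm`'s transformers-style API, assuming the `AutoModelForCausalLM` wrapper from `ipex_llm.transformers` documented by the project. The model id and prompt are placeholders, and the `xpu` move is only needed when targeting an Intel GPU; this is an illustrative sketch, not the article's own example.

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # ipex-llm's drop-in wrapper

# Placeholder model id; any Hugging Face causal LM verified by ipex-llm should work similarly
model_path = "meta-llama/Llama-2-7b-chat-hf"

# Load the model with ipex-llm's low-bit (INT4) optimizations applied at load time
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Move to an Intel GPU ("xpu"); skip this step for CPU-only inference
model = model.to("xpu")

prompt = "What is AI?"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The same pattern is what makes the other features mentioned above accessible: loading from a different model hub or switching to another low-bit format is a matter of changing the arguments passed to `from_pretrained`, rather than rewriting the inference code.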