The library is still in development, with plans to support high LLM inference speeds via quantization, GPT-Fast, and vLLM; to scale LLM deployments up and down quickly and cheaply; and to provide deep observability into LLMs in production through Datadog/Grafana and WhyLabs. To get started with SageMode, you need Python 3.10.2 and a virtual environment; the library itself installs via pip. Documentation is still forthcoming, but the examples folder offers guidance in the meantime. The roadmap covers deployment, scaling, monitoring, and operational improvements.
Key takeaways:
- SageMode is a Python library that aids in deploying, scaling, and monitoring machine learning models, particularly LLMs, at scale on AWS.
- The library provides standardized yet flexible deployment of Huggingface and PyTorch models on SageMaker or EC2, supports custom pipelines for pre- and post-inference processing, and can deploy an LLM in as few as 5 lines of code.
- Despite its current capabilities, SageMode remains under active development, with plans to support high LLM inference speeds via quantization, GPT-Fast, and vLLM; to scale LLM deployments up and down quickly and cost-effectively; and to provide deep observability into LLMs in production through Datadog/Grafana and WhyLabs.
- The roadmap for SageMode includes improvements across deployment, scaling, monitoring, and operations, with specific plans detailed for each category.
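To illustrate the pre- and post-inference pipeline idea mentioned above, here is a minimal sketch. All names here (`Pipeline`, `preprocess`, `postprocess`) are hypothetical stand-ins, not SageMode's actual API, and the model is a local stub rather than a SageMaker/EC2 endpoint:

```python
# Hypothetical sketch of a pre/post-inference pipeline; NOT SageMode's API.
from typing import Callable

class Pipeline:
    """Wraps a model call with pre- and post-inference processing steps."""

    def __init__(self, model: Callable[[str], str],
                 preprocess: Callable[[str], str],
                 postprocess: Callable[[str], str]):
        self.model = model
        self.preprocess = preprocess
        self.postprocess = postprocess

    def __call__(self, raw_input: str) -> str:
        # Pre-inference: normalize the prompt before it reaches the model.
        prompt = self.preprocess(raw_input)
        # Inference: in a real deployment this would call a hosted endpoint.
        raw_output = self.model(prompt)
        # Post-inference: clean up the model's response before returning it.
        return self.postprocess(raw_output)

# Stand-in model: echoes the prompt, so no endpoint or credentials are needed.
pipe = Pipeline(
    model=lambda p: f"ECHO: {p}",
    preprocess=lambda s: s.strip().lower(),
    postprocess=lambda s: s.removesuffix("!"),
)
print(pipe("  Hello LLM  "))  # -> ECHO: hello llm
```

Separating the three stages keeps prompt normalization and response cleanup swappable without touching the deployed model itself, which is the appeal of configurable pipelines in a deployment library.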