GitHub - Tsadoq/ErisForge: Dead Simple LLM Abliteration

Jan 27, 2025 - github.com
ErisForge is a Python library designed to modify Large Language Models (LLMs) by transforming their internal layers, allowing users to create both ablated and augmented versions of these models. It offers features such as the ability to alter model behaviors using `AblationDecoderLayer` and `AdditionDecoderLayer` classes, and measure refusal expressions in responses with the `ExpressionRefusalScorer`. The library supports custom behavior directions for specific transformations, providing a controlled way to adjust how models respond to various inputs.
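The idea behind ablating or adding a behavior direction can be illustrated with plain vector arithmetic. The following is a minimal sketch, not ErisForge's implementation: the helper names `ablate` and `add_direction`, the `alpha` scaling parameter, and the toy three-dimensional vectors are all assumptions made for illustration. Ablation is shown here as projecting a hidden state onto the subspace orthogonal to the direction, and addition as shifting the hidden state along it.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def ablate(hidden, direction):
    """Remove the component of `hidden` that lies along `direction` (hypothetical helper)."""
    d = normalize(direction)
    coeff = dot(hidden, d)
    return [h - coeff * di for h, di in zip(hidden, d)]


def add_direction(hidden, direction, alpha=1.0):
    """Shift `hidden` along `direction` by `alpha` (hypothetical helper)."""
    d = normalize(direction)
    return [h + alpha * di for h, di in zip(hidden, d)]


# Toy example: a 3-dimensional "hidden state" and a behavior direction.
hidden = [3.0, 4.0, 0.0]
direction = [1.0, 0.0, 0.0]

ablated = ablate(hidden, direction)                       # component along direction removed
augmented = add_direction(hidden, direction, alpha=2.0)   # component along direction increased
```

In a real model the same arithmetic would be applied to the activations of each decoder layer rather than to a single toy vector.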

To use ErisForge, users can either clone the repository and install its dependencies or install the library directly with pip. The library transforms model layers to induce different response behaviors, and the repository provides examples for applying ablation and measuring refusal expressions. Users can save their modified models locally or push them to the HuggingFace Hub. The project is open to contributions and is licensed under the MIT License, with a disclaimer that it is intended for research and development purposes only.
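Measuring refusal expressions can be approximated by checking responses for typical refusal phrases. The sketch below is an assumption about how such a check might look, not the `ExpressionRefusalScorer` API: the phrase list and the functions `refusal_score` and `refusal_rate` are hypothetical.

```python
# Hypothetical list of refusal phrases; a real scorer may use a different set.
REFUSAL_PHRASES = ("i cannot", "i can't", "i'm sorry", "i am unable")


def refusal_score(response, phrases=REFUSAL_PHRASES):
    """Count how many distinct refusal phrases appear in a response."""
    text = response.lower()
    return sum(1 for phrase in phrases if phrase in text)


def refusal_rate(responses, phrases=REFUSAL_PHRASES):
    """Fraction of responses containing at least one refusal phrase."""
    if not responses:
        return 0.0
    refused = sum(1 for r in responses if refusal_score(r, phrases) > 0)
    return refused / len(responses)


score = refusal_score("I'm sorry, but I cannot help with that.")
rate = refusal_rate(["Sure, here you go.", "I can't do that."])
```

Comparing the refusal rate of a model's responses before and after a transformation gives a rough indication of whether the change had the intended effect.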

Key takeaways:

  • ErisForge is a Python library designed to modify Large Language Models (LLMs) by applying transformations to their internal layers.
  • It allows for the creation of both ablated and augmented versions of LLMs that respond differently to specific types of input.
  • The library includes features such as the `AblationDecoderLayer`, `AdditionDecoderLayer`, and `ExpressionRefusalScorer` for altering model behavior and measuring refusal expressions.
  • ErisForge can be installed by cloning the repository or directly from pip, and it supports saving modified models locally or pushing them to the HuggingFace Hub.