GitHub - itsliamdowd/Redact: Leverage the benefits of large language models without leaking sensitive information.

Jan 04, 2024 - github.com
Redact is a document processing tool that lets users query PDF files with large language models without exposing sensitive data. Before any text leaves the user's machine, Redact removes sensitive values and replaces them with generic placeholders; only the redacted text is sent to the model through API calls. The tool uses the Mixtral-8x7B model, which the project describes as giving quick and accurate results.

The process has five steps: the user uploads a file; Redact extracts the text and replaces sensitive information with generic values; it stores each original value and its replacement as a key-value pair; it splits the redacted text into chunks; and it queries the large language model through an API call. When the response comes back, the generic values in it are swapped for the original sensitive information before the result is shown to the user. Future plans for Redact include recognizing and redacting more types of sensitive information, adding a choice of models, allowing users to specify custom data to redact per document, and improving file processing speed and ease of use.
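
The redact, store, chunk, and restore steps described above can be illustrated with a minimal Python sketch. This is not Redact's actual code: the regular expressions, the placeholder format such as `[EMAIL_1]`, the chunk size, and the function names `redact`, `chunk`, and `restore` are all assumptions made for illustration, and the summary does not say which kinds of sensitive data the project detects.

```python
import re

# Hypothetical patterns: the source does not list what Redact recognizes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text):
    """Replace sensitive values with placeholders and return (text, mapping)."""
    mapping = {}  # placeholder -> original value
    seen = {}     # original value -> placeholder, so repeats reuse one key
    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            value = match.group(0)
            if value not in seen:
                # Number placeholders per label: [EMAIL_1], [EMAIL_2], ...
                index = sum(1 for p in mapping if p.startswith(f"[{label}_")) + 1
                placeholder = f"[{label}_{index}]"
                seen[value] = placeholder
                mapping[placeholder] = value
            return seen[value]
        text = pattern.sub(substitute, text)
    return text, mapping


def chunk(text, size=1000):
    """Split text into fixed-size pieces (size is an arbitrary assumption)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def restore(text, mapping):
    """Swap placeholders in a model response back to the original values."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


original = "Contact alice@example.com or 555-123-4567 about the invoice"
redacted, mapping = redact(original)
pieces = chunk(redacted, size=20)      # each piece would go to the model API
response = "".join(pieces)             # stand-in for the model's reply
print(restore(response, mapping))
```

Only `redacted` (or its chunks) would be sent to the model; `mapping` stays local. Note that this naive fixed-size chunking can cut a placeholder in half at a chunk boundary, so a real implementation would need to split on safer boundaries.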

Key takeaways:

  • Redact is a tool that allows for secure document processing with large language models, ensuring the removal of sensitive data and its replacement with generic information.
  • The tool uses the Mixtral-8x7B model and works by extracting text from a file, replacing sensitive information with generic values, and storing each original value and its replacement as a key-value pair.
  • Redact can be installed by cloning the project, creating a new virtual environment, installing necessary dependencies, creating a .env file with your Hugging Face access token, running app.py, and navigating to http://127.0.0.1:5000.
  • Future plans for Redact include recognizing and redacting more types of sensitive information, adding selection for different models, allowing custom data to be redacted by the user per document, and improving file processing speed and ease of use.