
GitHub - aryn-ai/sycamore: 🍁 Sycamore is an LLM-powered semantic data preparation system for building search applications.

Sep 30, 2023 - github.com
Sycamore is a semantic data preparation system designed to simplify the transformation and enrichment of unstructured data for search applications. It supports various unstructured document formats, including PDF and HTML, and uses LLM-enabled entity extraction to pull out semantically meaningful information from documents. Sycamore is built around a data structure called the `DocSet` that represents a collection of unstructured documents, and supports transforms for chunking, manipulating, and augmenting these documents. It also allows for easy embedding of data using popular embedding models and can scale processing workloads from a laptop to the cloud without changing application code.

The system is built on Ray, a distributed compute framework that can scale to hundreds of nodes. The project also points to resources such as its PyPI package, documentation, a Slack channel, and the Aryn docs for setting up an end-to-end conversational search application with Sycamore and OpenSearch. Sycamore currently runs on Python 3.9+ on Linux and macOS. The project encourages contributions and provides a guide for setting up a development environment.

Key takeaways:

  • Sycamore is a semantic data preparation system that simplifies the transformation and enrichment of unstructured data for search applications.
  • It supports a variety of unstructured document formats, offers LLM-enabled entity extraction, and includes built-in data structures and transforms for processing large document collections.
  • Sycamore is built on Ray, a distributed compute framework that can scale to hundreds of nodes, allowing processing workloads to scale from a laptop to the cloud without changing application code.
  • It provides a simple script to read a collection of PDFs, partition them, compute vector embeddings, and load them into a local OpenSearch cluster.
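
The read, partition, embed, and load steps in the last takeaway can be illustrated with a small, self-contained Python sketch. This is not Sycamore's API: `ToyDocSet`, `Doc`, `fake_embedder`, and the in-memory `index` list are hypothetical stand-ins invented here to show the shape of a chained document pipeline. A real setup would use Sycamore's own `DocSet` transforms, a real embedding model, and an OpenSearch cluster.

```python
# Toy illustration only: ToyDocSet, Doc, and fake_embedder are hypothetical
# names and do NOT reflect Sycamore's real API.
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Doc:
    text: str
    embedding: Optional[List[float]] = None


class ToyDocSet:
    """A minimal in-memory collection of documents with chainable transforms."""

    def __init__(self, docs: List[Doc]):
        self.docs = docs

    def partition(self, max_words: int) -> "ToyDocSet":
        # Split every document into chunks of at most max_words words.
        chunks = []
        for doc in self.docs:
            words = doc.text.split()
            for i in range(0, len(words), max_words):
                chunks.append(Doc(" ".join(words[i:i + max_words])))
        return ToyDocSet(chunks)

    def embed(self, embedder: Callable[[str], List[float]]) -> "ToyDocSet":
        # Attach a vector to each chunk using the supplied embedder.
        return ToyDocSet([Doc(d.text, embedder(d.text)) for d in self.docs])

    def write(self, index: list) -> None:
        # Stand-in for loading records into a search index.
        for d in self.docs:
            index.append({"text": d.text, "embedding": d.embedding})


def fake_embedder(text: str, dim: int = 4) -> List[float]:
    # Deterministic placeholder: maps a hash of the text to dim floats in [0, 1].
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:dim]]


# Plain strings stand in for text already extracted from PDFs.
index: list = []
(
    ToyDocSet([Doc("one two three four five"), Doc("six seven")])
    .partition(max_words=2)
    .embed(fake_embedder)
    .write(index)
)
```

Each transform returns a new collection rather than mutating the old one, which is what lets the steps be chained and, in a system built on a distributed framework, scheduled across workers without changing the calling code.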
