
GitHub - mishushakov/llm-scraper: Turn any webpage into structured data using LLMs

Apr 20, 2024 - github.com
The article introduces LLM Scraper, a TypeScript library that converts any webpage into structured data using LLMs. The library uses OpenAI chat models, takes schemas defined with Zod, and is built on the Playwright framework. It provides full type-safety with TypeScript and supports three operating modes: html, text, and image. It also supports streaming when crawling multiple pages.
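
To make the idea of schema-checked, type-safe extraction concrete, here is a minimal dependency-free sketch. It is not the library's API: the library relies on Zod for this step, whereas the `Story` shape and the `isStory` and `parseStories` helpers below are hypothetical names chosen for illustration. The sketch only shows the general pattern of treating model output as untyped JSON and accepting it as typed data once it matches a declared shape.

```typescript
// Hypothetical shape for one extracted item (an assumption for illustration;
// with the real library this would be expressed as a Zod schema instead).
interface Story {
  title: string;
  points: number;
  by: string;
}

// Type guard: narrows unknown JSON to Story only if every field has the expected type.
function isStory(value: unknown): value is Story {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === "string" &&
    typeof v.points === "number" &&
    typeof v.by === "string"
  );
}

// Parse raw model output (a JSON string) and reject anything that does not match the shape.
function parseStories(raw: string): Story[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed) || !parsed.every(isStory)) {
    throw new Error("Model output does not match the expected schema");
  }
  return parsed;
}
```

The design point is that validation happens at the boundary: code after `parseStories` can rely on the `Story` type without further checks, which is the guarantee a schema library such as Zod provides in a more general form.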

The article also gives a step-by-step guide to getting started with LLM Scraper: installing the required dependencies from npm, setting an OpenAI API key, and creating a new browser instance to attach LLMScraper to. An example shows how to extract top stories from Hacker News. The article concludes by inviting community contributions to the open-source project.
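
As a small illustration of the API-key step, the sketch below fails fast when the key is missing. It assumes the key is stored in the `OPENAI_API_KEY` environment variable, which is the name the official OpenAI SDK reads by default; the `requireApiKey` helper is a hypothetical name and is not part of LLM Scraper.

```typescript
// Hypothetical helper (not part of LLM Scraper): returns the API key from an
// environment-like object, or throws with an actionable message if it is absent.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["OPENAI_API_KEY"];
  if (key === undefined || key.trim() === "") {
    throw new Error("OPENAI_API_KEY is not set; export it before running the scraper");
  }
  // Trim stray whitespace, a common artifact of copying keys into shell profiles.
  return key.trim();
}
```

In a Node.js script this would typically be called once at startup as `requireApiKey(process.env)`, before any browser or model client is created.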

Key takeaways:

  • LLM Scraper is a TypeScript library that converts webpages into structured data using LLMs. It is based on the Playwright framework and supports three operating modes: html, text, and image.
  • The library offers full type-safety with TypeScript and uses OpenAI chat models. Schemas are defined with Zod.
  • Getting started with LLM Scraper involves installing required dependencies from npm, setting an OpenAI API key in your environment variables, and optionally creating a new browser instance and attaching LLMScraper to it.
  • The project is open-source and welcomes contributions from the community in the form of bug reports or improvements via issues or pull requests.