GitHub - AdrianKrebs/datalens

Datalens is a personal experiment that uses language models to rank unstructured job data against user-defined criteria. Unlike traditional job search platforms, which rely on rigid filters, Datalens lets users describe their preferences in natural language and scores each job posting for relevance. Criteria can be weighted: "must" criteria count twice as much as normal ones. Users can add any job data source they prefer, and the system comes pre-configured with the most recent "Who's Hiring" thread from Hacker News.
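The 2:1 weighting can be illustrated with a small sketch. The summary does not say how Datalens combines per-criterion ratings, so the weighted average below is an assumption, and the names `Criterion` and `weighted_relevance` are hypothetical. The per-criterion scores are assumed to be numbers in [0, 1] that a language model has already produced.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Criterion:
    """A user-defined preference (hypothetical structure, not from the repo)."""
    text: str
    must: bool = False


def weighted_relevance(criteria: List[Criterion], scores: List[float]) -> float:
    """Combine per-criterion scores into one relevance value.

    Assumption: "must" criteria get weight 2 and normal ones weight 1,
    and the result is a weighted average of the per-criterion scores.
    """
    if len(criteria) != len(scores):
        raise ValueError("need exactly one score per criterion")
    weights = [2.0 if c.must else 1.0 for c in criteria]
    total = sum(weights)
    if total == 0:
        return 0.0  # no criteria defined
    return sum(w * s for w, s in zip(weights, scores)) / total


criteria = [
    Criterion("fully remote", must=True),
    Criterion("uses Python"),
]
# A posting that satisfies only the "must" criterion scores 2 of 3 weight units.
relevance = weighted_relevance(criteria, [1.0, 0.0])
```

Under this assumption, missing a "must" criterion costs twice as much as missing a normal one, which matches the weighting described above.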

The system uses tools like Kadoa to fetch job data from company pages, but it also works with traditional scraping methods. Relevance scoring works best with certain models, and users can switch between models depending on their needs. Running the script continuously can result in high API usage. To run the app, users need an OpenAI or Anthropic Claude API key; Python 3.7 or higher and pipenv for the Flask server; and Node.js and npm for the Next.js client. The system currently handles only job data, but there are plans to extend it to other types of data, such as events and products.

Key takeaways:

Datalens is a personal experiment that uses LLMs to rank unstructured job data based on user-defined criteria, providing a more flexible and personalized job search experience.
The system allows users to add any job data source they like, and provides examples of how to scrape career pages for job data.
The relevance scoring works best with certain models, but the default model is chosen for cost reasons. Users are warned that running the script continuously can result in high API usage.
Improvements to the system could include streaming for faster results, extensibility to other types of data, switching to SQLite for storage, and fine-tuning models for better and cheaper results.
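
The SQLite idea mentioned in the takeaways could look roughly like the sketch below, using Python's standard `sqlite3` module. The table name, columns, and helper functions are hypothetical and do not come from the repository. Keeping scores keyed by a job ID also makes it possible to skip postings that were already rated, which is one way to limit repeated API calls.

```python
import sqlite3
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a score store. The schema is an illustrative assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_scores ("
        "job_id TEXT PRIMARY KEY, title TEXT NOT NULL, score REAL NOT NULL)"
    )
    return conn


def save_score(conn: sqlite3.Connection, job_id: str, title: str, score: float) -> None:
    """Insert a score, replacing any earlier row for the same job."""
    conn.execute(
        "INSERT OR REPLACE INTO job_scores (job_id, title, score) VALUES (?, ?, ?)",
        (job_id, title, score),
    )
    conn.commit()


def is_scored(conn: sqlite3.Connection, job_id: str) -> bool:
    """Return True if the job already has a stored score."""
    row = conn.execute(
        "SELECT 1 FROM job_scores WHERE job_id = ?", (job_id,)
    ).fetchone()
    return row is not None


def top_jobs(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple[str, str, float]]:
    """Return the highest-scoring jobs first."""
    return conn.execute(
        "SELECT job_id, title, score FROM job_scores ORDER BY score DESC LIMIT ?",
        (limit,),
    ).fetchall()


conn = open_store()
save_score(conn, "a", "Backend Engineer", 0.4)
save_score(conn, "b", "ML Engineer", 0.9)
save_score(conn, "a", "Backend Engineer", 0.7)  # re-scoring overwrites the old row
ranked = top_jobs(conn, 2)
```

A scoring loop could call `is_scored` before sending a posting to the model and only pay for postings it has not seen yet.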
