GitHub - data-prompt-query/dpq: dpq is an open-source python library that makes prompt-based data processing and feature engineering easy

The article introduces dpq, a Python library designed to simplify data processing and feature engineering using generative AI. The library allows users to apply prompts to items in list-like iterables, such as pandas series, and add new functions by defining them in a JSON file. It also includes a library of standard functions and is parallelized by default. dpq uses the 'requests' library to send OpenAI-style Chat Completions API requests and is compatible with GPT-3.5 Turbo.

The article also discusses the cost and speed of dpq, stating that it comes without cost or speed guarantees. However, it provides a rough estimate, stating that on a test data set of 1000 product reviews, the 'classify_sentiment.json' finishes in approximately 30 seconds on a standard Macbook and costs $0.05 using 'gpt-3.5-turbo'. The article concludes by discussing the potential of Language Model Libraries (LLMs) in text annotation and classification, citing recent studies that report better-than-human performance.

Key takeaways:

dpq is a Python library that simplifies data processing and feature engineering using generative AI.
It allows adding new functions by defining them in a JSON file and initializing the dpq agent with the respective custom_messages_path pointing to the folder.
dpq uses the requests library to send OpenAI-style Chat Completions API requests and is compatible with GPT-3.5 Turbo.
Recent studies have shown promising results using general-purpose LLMs for text annotation and classification, suggesting that LLMs can deliver consistent, high-quality output resulting in scalability, reduced time and costs.

GitHub - data-prompt-query/dpq: dpq is an open-source python library that makes prompt-based data processing and feature engineering easy

Key takeaways:

Comments (0)

Newsletter