LLM-Powered OLAP: the Tencent Experience with Apache Doris - Apache Doris

Sep 11, 2023 - doris.apache.org
The article discusses using Large Language Models (LLMs) together with Apache Doris to provide OLAP services in a data management system. The author explains how the team overcame four challenges: the LLM's lack of understanding of data jargon, its slow inference, its lack of niche knowledge, and its need for information from more fields, such as legal, political, financial, and regulatory information. The solutions were to introduce a semantic layer that translates business terms into data fields, to create parsing rules that increase cost-effectiveness, to add a Schema Mapper that supplies the LLM with niche knowledge, and to use plugins that connect the LLM to more fields of information.
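
As a rough illustration of the semantic-layer idea described above, the following Python sketch replaces business terms in a question with the data fields they stand for. It is not the article's implementation: the `SEMANTIC_LAYER` mapping, the field names, and the `translate_terms` function are all hypothetical, invented here for illustration.

```python
import re

# Hypothetical semantic layer: business terms mapped to qualified data fields.
# These terms and field names are illustrative assumptions, not from the article.
SEMANTIC_LAYER = {
    "revenue": "orders.total_amount",
    "active users": "user_activity.active_user_count",
    "region": "stores.region_name",
}

# One alternation pattern, longest terms first so multi-word terms are
# preferred over any shorter term that happens to be a prefix of them.
_TERMS = sorted(SEMANTIC_LAYER, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in _TERMS) + r")\b",
    re.IGNORECASE,
)


def translate_terms(question: str) -> str:
    """Replace known business terms with their data fields in a single pass."""
    return _PATTERN.sub(
        lambda match: SEMANTIC_LAYER[match.group(0).lower()],
        question,
    )
```

Calling `translate_terms("Show revenue by region")` would rewrite both terms before the question is handed to whatever component generates SQL. A single-pass substitution is used so that text already replaced is never matched again.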

The article also introduces the SuperSonic framework, which combines a Schema Mapper, an LLM, and a Semantic Layer to process and answer complex queries. It traces the evolution of the team's OLAP architecture, including the decisions to replace ClickHouse with Apache Doris and to split flat tables into metric and dimension tables. Other Apache Doris features mentioned as useful include Materialized View, Flink-Doris-Connector, and Compaction. Future plans include testing the newly released Storage-Compute Separation and Cross-Cluster Replication features of Doris.
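
To make the split between metric and dimension tables concrete, here is a minimal Python sketch. It uses the standard-library `sqlite3` module as a stand-in for an OLAP engine, so it shows only the table layout and the join, not anything specific to Apache Doris. The table names (`dim_song`, `metric_plays`), columns, and rows are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one dimension and one metric table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension table: descriptive attributes, one row per entity.
        CREATE TABLE dim_song (song_id INTEGER PRIMARY KEY, artist TEXT);
        -- Metric table: numeric measurements keyed by the entity id.
        CREATE TABLE metric_plays (song_id INTEGER, play_date TEXT, plays INTEGER);

        INSERT INTO dim_song VALUES (1, 'Artist A'), (2, 'Artist B');
        INSERT INTO metric_plays VALUES
            (1, '2023-09-01', 100),
            (1, '2023-09-02', 150),
            (2, '2023-09-01', 40);
        """
    )
    return conn


def plays_by_artist(conn: sqlite3.Connection) -> list:
    """Join the metric table to the dimension table and aggregate per artist."""
    return conn.execute(
        """
        SELECT d.artist, SUM(m.plays)
        FROM metric_plays AS m
        JOIN dim_song AS d ON d.song_id = m.song_id
        GROUP BY d.artist
        ORDER BY d.artist
        """
    ).fetchall()
```

With this layout, a change to a descriptive attribute touches only the dimension table, while the metric rows stay as they are; a flat table would repeat the attribute on every metric row.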

Key takeaways:

  • The team replaced ClickHouse with Apache Doris as the OLAP engine for their data management system and used Large Language Models (LLMs) to transform natural language questions into SQL statements, making SQL easier to write.
  • They addressed several issues with the LLM (its lack of understanding of data jargon, slow inference, lack of niche knowledge, and need for more diverse information) by introducing a semantic layer, creating LLM parsing rules, adding a Schema Mapper, and using plugins.
  • The team developed the SuperSonic framework, which combines a Schema Mapper, an LLM, and a Semantic Layer to process and answer complex queries. They also optimized their OLAP architecture by streamlining links and splitting flat tables into metric and dimension tables.
  • Future plans include testing the newly released Storage-Compute Separation and Cross-Cluster Replication features of Doris to reduce costs and increase service availability. The team is open to ideas and input about the SuperSonic framework and the Apache Doris project.