This article discusses how to measure and improve the accuracy of an SQL agent built on an LLM and an SQL database. Starting from a prototype, it explores how to measure accuracy and then how to improve it using self-reflection and retrieval-augmented generation (RAG).
The LLM used in this project is Meta's Llama 3.1 8B, and the SQL database is ClickHouse. After building the prototype, the author creates a “golden” evaluation set of questions paired with correct answers, against which the model's output is compared.
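Because the same question can be answered by differently written SQL, one reasonable way to compare against a golden set is to compare the rows each query returns rather than the SQL text. The sketch below assumes query results arrive as lists of row tuples; the helper names (`normalize_rows`, `results_match`, `accuracy`) and the two-decimal float tolerance are illustrative assumptions, not the author's scoring code.

```python
from typing import Any, Sequence, Tuple

Row = Tuple[Any, ...]


def normalize_rows(rows: Sequence[Row], ndigits: int = 2) -> list:
    """Round floats and sort rows so row order and float noise don't affect the comparison."""
    def norm(value: Any) -> Any:
        return round(value, ndigits) if isinstance(value, float) else value

    # Sorting by repr() keeps this working even when rows mix value types.
    return sorted((tuple(norm(v) for v in row) for row in rows), key=repr)


def results_match(generated: Sequence[Row], golden: Sequence[Row]) -> bool:
    """Hypothetical check: an answer is correct if its normalized rows equal the golden ones."""
    return normalize_rows(generated) == normalize_rows(golden)


def accuracy(pairs: Sequence[Tuple[Sequence[Row], Sequence[Row]]]) -> float:
    """Share of (generated, golden) result pairs that match."""
    if not pairs:
        return 0.0
    return sum(results_match(gen, gold) for gen, gold in pairs) / len(pairs)


if __name__ == "__main__":
    examples = [
        ([(1, "a"), (2, "b")], [(2, "b"), (1, "a")]),  # same rows, different order: match
        ([(0.1 + 0.2,)], [(0.3,)]),                     # float noise: match after rounding
        ([(5,)], [(6,)]),                               # wrong value: mismatch
    ]
    print(f"accuracy = {accuracy(examples):.2f}")  # prints "accuracy = 0.67"
```

Column order still matters in this sketch because rows are compared as tuples; whether a reordered-column answer should count as correct is a judgment call for the scorer.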
The author discusses the nuances of evaluating accuracy and of scoring the results returned by the generated queries, and then explores self-reflection and RAG as ways to improve it.
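The article's exact self-reflection prompt is not reproduced here. As an illustration only, one simple execution-feedback variant runs the draft query and, if it fails, feeds the error message back to the model for another attempt. `generate` and `execute` are hypothetical callables standing in for the Llama 3.1 call and the ClickHouse client, and the limit of three attempts is an arbitrary choice.

```python
from typing import Callable, Optional


def generate_with_reflection(
    question: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], object],
    max_attempts: int = 3,
) -> str:
    """Ask for SQL, try running it, and on failure ask again with the error as feedback."""
    feedback: Optional[str] = None
    query = ""
    for _ in range(max_attempts):
        query = generate(question, feedback)
        try:
            execute(query)  # in a real agent: a ClickHouse client call (hypothetical here)
            return query
        except Exception as err:  # broad on purpose: any failure becomes feedback
            feedback = (
                f"This query failed:\n{query}\nError: {err}\n"
                "Please review it and return a corrected query."
            )
    return query  # last attempt, even if it still fails


if __name__ == "__main__":
    # Stub model: the first draft has a typo; the retry (with feedback) is fixed.
    def fake_generate(question: str, feedback: Optional[str]) -> str:
        return "SELECT count() FROM sessions" if feedback else "SELEC count() FROM sessions"

    # Stub database: rejects anything that does not start with "SELECT ".
    def fake_execute(sql: str) -> int:
        if not sql.startswith("SELECT "):
            raise ValueError("Syntax error")
        return 42

    print(generate_with_reflection("How many sessions are there?", fake_generate, fake_execute))
```

Running the query is only one source of feedback; a reflection step can also ask the model to critique a query that runs but returns an implausible result.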
The article also covers using Chroma as a local vector store, with OpenAI embeddings, to find the chunks most similar to the query for RAG.
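The retrieval step amounts to a nearest-neighbour search over embedded chunks. The sketch below shows the idea without Chroma or the OpenAI API: `embed` is a hypothetical word-count stand-in for a real embedding model, similarity is cosine, and the example chunks describing tables are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)[:k]


def build_prompt(question: str, chunks: List[str], k: int = 2) -> str:
    """Prepend the retrieved chunks as context for the SQL-generating model."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nSQL:"


if __name__ == "__main__":
    docs = [
        "Table ecommerce.sessions stores sessions with columns user_id, session_id, action_date",
        "Table ecommerce.users stores user attributes such as country and age",
        "Revenue for each session is stored in the revenue column of ecommerce.sessions",
    ]
    # Only the users chunk shares words ("users", "country") with the question.
    print(build_prompt("Count users by country", docs, k=1))
```

A vector store such as Chroma replaces the linear `sorted` scan with an index and persists the embeddings, but the ranking idea is the same.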
Finally, by combining self-reflection and RAG, the author reaches 70% accuracy, which could be improved further through fine-tuning.