SQLBarber is a system leveraging Large Language Models (LLMs) to generate customized and realistic SQL workloads for database research and development.
It provides a declarative interface that eliminates the need to craft SQL templates manually, and it accepts natural language specifications to constrain the generated templates.
SQLBarber scales efficiently to generate large volumes of queries that match user-defined cost distributions, and it uses execution statistics from Amazon Redshift and Snowflake to reflect real-world query characteristics.
The system introduces a self-correction module, a Bayesian Optimizer, and open-sourced benchmarks. With these, it generates customized SQL templates, significantly reduces query generation time, and improves alignment with target cost distributions.
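
To make the idea of matching a user-defined cost distribution concrete, the following is a minimal sketch and not SQLBarber's implementation. It fills a target cost histogram by sampling predicate values for a single template with plain random search, which stands in for the Bayesian Optimizer mentioned above. The template, the `estimated_cost` function (a placeholder for a real optimizer estimate), and all other names are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

# Hypothetical SQL template with one predicate placeholder (illustrative only).
TEMPLATE = "SELECT * FROM orders WHERE amount < {v}"


def estimated_cost(v: float) -> float:
    """Assumed stand-in for a real cost estimate (e.g. a plan cost from the DBMS)."""
    return 10.0 + 0.5 * v


def bucket_of(cost, edges):
    """Return i such that edges[i] <= cost < edges[i + 1], or None if out of range."""
    for i in range(len(edges) - 1):
        if edges[i] <= cost < edges[i + 1]:
            return i
    return None


def fill_target(edges, target_counts, v_range, max_trials=10_000, seed=0):
    """Sample predicate values until every cost bucket holds its target count.

    Uses uniform random search; a Bayesian optimizer would instead pick
    values likely to land in the buckets that are still under-filled.
    """
    rng = random.Random(seed)
    remaining = list(target_counts)
    queries = []
    for _ in range(max_trials):
        if not any(remaining):
            break  # every bucket is full
        v = round(rng.uniform(*v_range), 2)
        b = bucket_of(estimated_cost(v), edges)
        if b is not None and remaining[b] > 0:
            remaining[b] -= 1
            queries.append((b, TEMPLATE.format(v=v)))
    return queries, remaining


if __name__ == "__main__":
    # Target: 2 queries with cost in [10, 20), 5 in [20, 40), 3 in [40, 60).
    queries, remaining = fill_target([10, 20, 40, 60], [2, 5, 3], (0, 100))
    print(Counter(b for b, _ in queries), remaining)
```

Random search wastes trials on buckets that are already full, which is one reason a guided search over predicate values can reduce generation time when each cost estimate requires a call to the database.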