<ul><li>BiomedSQL is a benchmark for evaluating scientific reasoning in text-to-SQL generation over biomedical knowledge bases.</li><li>It consists of 68,000 question/SQL query/answer triples grounded in a harmonized BigQuery knowledge base that integrates gene-disease associations, causal inference, and drug approval records.</li><li>Models must infer domain-specific criteria, such as genome-wide significance thresholds and trial-phase filters, rather than rely solely on syntactic translation.</li><li>Evaluation shows a substantial gap between language models and domain experts: the best custom agent achieves 62.6% execution accuracy, compared with the expert baseline of 90.0%.</li></ul>
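To make the "domain-specific criteria" point concrete, here is a minimal sketch of the kind of implicit filter a model has to supply. A question such as "which genes are significantly associated with disease X?" never states a cutoff, but a correct query needs one. The sketch assumes the conventional genome-wide significance threshold of p < 5e-8, and it uses an in-memory SQLite database as a stand-in for the BigQuery knowledge base. The table name `gene_disease_assoc`, its columns, the gene and disease labels, and the p-values are all hypothetical toy data, not taken from the benchmark.

```python
import sqlite3

# Conventional genome-wide significance cutoff (assumed here; the
# question text itself would not mention it).
GENOME_WIDE_P = 5e-8


def make_toy_db() -> sqlite3.Connection:
    """Build a tiny in-memory table with hypothetical schema and toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene_disease_assoc (gene TEXT, disease TEXT, p_value REAL)"
    )
    conn.executemany(
        "INSERT INTO gene_disease_assoc VALUES (?, ?, ?)",
        [
            ("GENE_A", "disease_x", 1e-12),  # passes the threshold
            ("GENE_B", "disease_x", 3e-5),   # nominal only, filtered out
            ("GENE_C", "disease_x", 4e-8),   # just under the threshold
            ("GENE_D", "disease_y", 1e-20),  # different disease
        ],
    )
    return conn


def significant_genes(conn: sqlite3.Connection, disease: str) -> list[str]:
    """Return genes associated with `disease` at genome-wide significance."""
    sql = (
        "SELECT gene FROM gene_disease_assoc "
        "WHERE disease = ? AND p_value < ? "
        "ORDER BY p_value"
    )
    return [row[0] for row in conn.execute(sql, (disease, GENOME_WIDE_P))]


if __name__ == "__main__":
    print(significant_genes(make_toy_db(), "disease_x"))
```

A purely syntactic translation of the question would likely omit the `p_value < ?` clause and return `GENE_B` as well, which is the failure mode the benchmark is designed to expose.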

BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases
