Evaluation is crucial for NLQ-to-SQL systems: it checks that the generated SQL queries reflect user intent and retrieve accurate data. Measurable metrics are essential for assessing NLQ-to-SQL system performance beyond human validation. An NLQ-to-SQL pipeline powered by LLMs aims to bridge the gap between natural language and structured data.

Metrics such as the Entity Recognition Score help evaluate the correctness and efficiency of generated SQL queries. The Semantic Equivalence Score checks whether the queries generated by NLQ-to-SQL models are functionally correct. The Halstead Complexity Score measures the complexity of generated SQL queries, which can guide improvements to model performance. SQL Injection Detection is vital for identifying and preventing malicious patterns in queries. Data Retrieval Accuracy assesses how well generated SQL queries retrieve data compared with ground-truth queries. Monitoring resource utilization during SQL execution helps optimize query performance and database efficiency.

Together, evaluation and governance play a crucial role in developing a meaningful NLQ-to-SQL system that stays aligned with user intent.
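
To make two of these metrics concrete, here is a minimal Python sketch of Data Retrieval Accuracy and SQL Injection Detection. The text does not give formulas, so everything here is an assumption for illustration: accuracy is taken to be the multiset overlap (intersection over union) between the rows returned by the generated query and by the ground-truth query, and injection detection is reduced to a short list of regular expressions. The names `retrieval_accuracy`, `flag_injection`, `SUSPICIOUS_PATTERNS`, and `build_demo_db`, as well as the `orders` table, are hypothetical.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical, deliberately incomplete pattern list (an assumption for
# illustration); a production detector would need much broader coverage.
SUSPICIOUS_PATTERNS = {
    "stacked_statement": r";\s*(drop|delete|update|insert|alter)\b",
    "tautology": r"\bor\s+1\s*=\s*1\b",
    "inline_comment": r"--",
    "union_select": r"\bunion\s+select\b",
}


def flag_injection(sql):
    """Return the names of the suspicious patterns found in a query."""
    return [
        name
        for name, pattern in SUSPICIOUS_PATTERNS.items()
        if re.search(pattern, sql, re.IGNORECASE)
    ]


def retrieval_accuracy(conn, generated_sql, reference_sql):
    """Multiset overlap of result rows: |gen & ref| / |gen | ref|.

    This definition is an assumption; row order is ignored and
    duplicate rows are counted.
    """
    generated = Counter(conn.execute(generated_sql).fetchall())
    reference = Counter(conn.execute(reference_sql).fetchall())
    union = sum((generated | reference).values())
    if union == 0:
        return 1.0  # both queries returned nothing, so they agree
    return sum((generated & reference).values()) / union


def build_demo_db():
    """Create a small in-memory database used only for this example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "EU", 50.0), (2, "EU", 150.0), (3, "US", 200.0)],
    )
    return conn
```

In a pipeline built along these lines, a query would typically pass through `flag_injection` first and be executed for scoring only if nothing is flagged. Overlap is used rather than simple recall so that a generated query returning extra rows is penalized as well as one that misses rows.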