Enterprise data, spanning diverse domains, often maintained across disparate environments, poses challenges for natural language to SQL (NL2SQL) technology due to complex schemas with nested tables and multi-dimensional data.
Recent advances in generative AI have enabled NL2SQL technology using large language models (LLMs), but accuracy and scalability remain challenges for enterprise data.
Challenges include complex schemas optimized for storage, diverse and complex natural language queries, LLM knowledge gap, attention burden, and fine-tuning challenges.
A solution methodology has been developed by AWS and Cisco teams that focuses on narrowing the generative focus to the appropriate data domain, using data abstractions, and optimizing SQL generation steps.
The methodology involves mapping user queries to data domains, scoping data domains for prompt construction, augmenting SQL DDL definitions, determining query dialect, and managing identifiers for SQL generation.
Handling complex data structures involves abstracting domain data structures into simplified forms for better understanding by the LLMs.
The solution provided high accuracy, consistency, low cost, low latency, and scalability in SQL generation for enterprise data, achieved through the systematic approach outlined in the methodology.
The solution's architecture on AWS involves processing steps using Amazon API Gateway, AWS Lambda, and Amazon Bedrock to process natural language queries into SQL results.
In conclusion, the methodology offers a methodical approach to enterprise-grade SQL generation, reducing complexity, ensuring accuracy, and improving overall performance.
Authors include professionals from Cisco and AWS with extensive experience in AI/ML, cloud migration, computer science, engineering, and security domains.
The solution methodology can be adapted to various business applications, with a demo code available in the GitHub repository, inviting feedback and questions.