DataBridge uses large language models (LLMs), guided by user-defined rules, to process documents consistently without custom pipelines.
The rules system supports two operations, metadata extraction and content transformation, and applies rules to document content sequentially, in the order they are given.
MetadataExtractionRule extracts structured data from documents into searchable metadata based on defined schemas.
NaturalLanguageRule transforms document content according to natural-language instructions, for example redaction or summarization.
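A minimal sketch of how the two rule types might be declared on the client and serialized in the order they should run. The dataclasses below are hypothetical stand-ins that only borrow the rule names from the text; the field names (`schema`, `prompt`, `type`) and the JSON payload shape are assumptions, not DataBridge's confirmed API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MetadataExtractionRule:
    # Hypothetical stand-in: maps each field to extract to its expected type.
    schema: dict
    type: str = "metadata_extraction"


@dataclass
class NaturalLanguageRule:
    # Hypothetical stand-in: a plain-language instruction for the LLM.
    prompt: str
    type: str = "natural_language"


# List order is the order in which the rules would be applied.
rules = [
    MetadataExtractionRule(schema={"name": "string", "years_experience": "integer"}),
    NaturalLanguageRule(prompt="Redact all phone numbers and email addresses."),
]

# Serialize the rules so they could be sent alongside a document.
payload = json.dumps([asdict(rule) for rule in rules])
```

Putting extraction before redaction in the list means the metadata is read from the original text, which is one reason rule order matters.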
DataBridge supports multiple LLM providers, such as OpenAI and Ollama, which are configured in the databridge.toml file.
Rule processing proceeds in stages: parsing the document, validating the input, applying each rule's prompt, calling the LLM, and storing the result according to the rule type.
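The stages after document parsing can be sketched as a single loop. This is a simplified illustration under stated assumptions, not DataBridge's server code: `process_rules` and `fake_llm` are hypothetical names, the rule dictionaries use an assumed shape, and the LLM is replaced by a canned function so the example runs offline.

```python
import json
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real provider call (an assumption of this sketch)."""
    if prompt.startswith("Extract"):
        return json.dumps({"name": "Ada Lovelace", "unrequested": "dropped"})
    # Pretend to follow a redaction instruction on the text after "Text:".
    text = prompt.split("Text:\n", 1)[1]
    return text.replace("555-0100", "[REDACTED]")


def process_rules(content: str, rules: list, llm: Callable[[str], str]):
    """Apply rules in order; return the transformed content and the metadata."""
    metadata = {}
    for rule in rules:
        rule_type = rule.get("type")  # validation: reject unknown rule types
        if rule_type == "metadata_extraction":
            schema = rule["schema"]
            prompt = f"Extract fields {sorted(schema)} as JSON from:\n{content}"
            extracted = json.loads(llm(prompt))
            # Store only the fields declared in the schema.
            metadata.update({k: v for k, v in extracted.items() if k in schema})
        elif rule_type == "natural_language":
            prompt = f"{rule['prompt']}\n\nText:\n{content}"
            content = llm(prompt)  # the transformed text replaces the content
        else:
            raise ValueError(f"unknown rule type: {rule_type!r}")
    return content, metadata


new_content, metadata = process_rules(
    "Ada Lovelace, phone 555-0100",
    [
        {"type": "metadata_extraction", "schema": {"name": "string"}},
        {"type": "natural_language", "prompt": "Redact phone numbers."},
    ],
    fake_llm,
)
```

In this sketch the rule type decides where the result goes: extraction rules add to the metadata, while natural-language rules overwrite the content that later rules see.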
DataBridge splits large documents into chunks for efficient processing, and the batch_size setting can be adjusted to tune performance.
Effective rules depend on specific prompts, deliberate rule ordering, an LLM matched to the complexity of the task, and well-designed schemas.
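A minimal sketch of fixed-size chunking, assuming batch_size counts characters; the source does not say whether the real setting counts characters or tokens. `chunk_text` is a hypothetical helper written for this illustration.

```python
def chunk_text(text: str, batch_size: int) -> list[str]:
    """Split text into consecutive pieces of at most batch_size characters."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [text[i:i + batch_size] for i in range(0, len(text), batch_size)]


document = "a" * 10
chunks = chunk_text(document, batch_size=4)
```

A larger batch_size means fewer LLM calls per document but longer prompts per call, which is the trade-off the setting exposes.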
DataBridge's rules-based ingestion system suits use cases such as resume processing, medical record management, and legal document analysis.
DataBridge pairs a simple client-side interface with powerful server-side processing, giving it the flexibility and performance to adapt to diverse document processing needs.