The global intelligent document processing (IDP) market size was valued at $1,285 million in 2022 and is projected to reach $7,874 million by 2028.
Anthropic’s Claude models, deployed on Amazon Bedrock, can help overcome language limitations of existing document extraction software.
Amazon Augmented AI (Amazon A2I) simplifies the creation of workflows for human review, managing the heavy lifting associated with developing these systems or overseeing a large reviewer workforce.
The article outlines a custom multilingual document extraction and content assessment framework using a combination of Anthropic’s Claude 3 on Amazon Bedrock and Amazon A2I to incorporate human-in-the-loop capabilities.
The framework can efficiently process multiple types of documents in various languages and extract relevant insights.
The solution relies on a multi-modal LLM to extract data from various multi-lingual documents and uses Rhubarb Python framework to extract JSON schema-based data from the documents.
The key steps of the framework include storing documents of different languages, invoking a processing flow to extract data from the document according to the given schema, passing extracted content to human reviewers for validation, and converting validated content into an Excel format for storage.
This comprehensive solution enables organizations to efficiently process documents in multiple languages and extract relevant insights, while benefiting from the combined power of AWS AI/ML services and human validation.
The article provides instructions on how to test the document processing pipeline and how to deploy it into the AWS Cloud and emphasizes to clean up the entire AWS CDK environment by using the cdk destroy command after use.
The authors are Partners and Senior Partners at Amazon Web Services, specializing in supporting partner solutions and strategic industry solutions on the AWS platform.