Automatic Text Detection with Amazon Textract

Amazon S3 → AWS Lambda → Amazon Textract → Amazon DynamoDB

An event-driven workflow that automatically detects and stores the text in PDF files using Amazon Textract, AWS Lambda, and Amazon DynamoDB.

This sample project demonstrates how to build an event-driven architecture that detects text in PDF files and stores the results in Amazon DynamoDB.
When an object is created in the S3 bucket, S3 invokes a Lambda function, which calls Amazon Textract's DetectDocumentText API. Textract returns the detected text to the Lambda function, which stores it in the DynamoDB table.
This pattern deploys one S3 bucket, one Lambda function, and one DynamoDB table.
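
The flow above can be sketched as a Python Lambda handler. This is a minimal illustration, not the pattern's actual function: the `TABLE_NAME` environment variable and the item attributes `id`, `text`, and `lineCount` are hypothetical, so check the repository's `template.yaml` and handler code for the real names and key schema. The Textract and DynamoDB clients are optional parameters so the parsing and storage logic can be exercised without AWS credentials; inside Lambda, `boto3` creates them.

```python
import json
import os
import urllib.parse


def extract_lines(textract_response):
    """Return the text of each LINE block in the order Textract returned it."""
    return [
        block["Text"]
        for block in textract_response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def handle_record(record, textract, dynamodb, table_name):
    """Detect text in one uploaded object and write it to DynamoDB."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded (a space arrives as '+').
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    # Synchronous call: Textract reads the document directly from S3.
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    lines = extract_lines(response)

    # Hypothetical item layout; the real table's key schema may differ.
    dynamodb.put_item(
        TableName=table_name,
        Item={
            "id": {"S": f"{bucket}/{key}"},
            "text": {"S": "\n".join(lines)},
            "lineCount": {"N": str(len(lines))},
        },
    )
    return key


def lambda_handler(event, context, textract=None, dynamodb=None):
    """Entry point for S3 ObjectCreated notifications."""
    if textract is None or dynamodb is None:
        import boto3  # bundled with the AWS Lambda Python runtimes

        textract = textract or boto3.client("textract")
        dynamodb = dynamodb or boto3.client("dynamodb")

    table_name = os.environ["TABLE_NAME"]  # hypothetical variable name
    processed = [
        handle_record(record, textract, dynamodb, table_name)
        for record in event.get("Records", [])
    ]
    return {"statusCode": 200, "body": json.dumps({"processed": processed})}
```

Only LINE blocks are kept in this sketch; DetectDocumentText also returns PAGE and WORD blocks, which a real handler might store instead or as well.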

Clone repo

git clone https://github.com/aws-samples/serverless-patterns/
cd serverless-patterns/textract-lambda-sam-python

Deploy

sam deploy
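
Once the deployment finishes, the names of the created bucket and table can be looked up from the CloudFormation stack. A minimal sketch using boto3's `describe_stacks`; it returns whatever outputs the stack declares, so it assumes nothing about the template beyond the stack name chosen at deploy time (passed on the command line; the script file name is hypothetical).

```python
import sys


def stack_outputs(cloudformation, stack_name):
    """Map each stack output's OutputKey to its OutputValue."""
    stacks = cloudformation.describe_stacks(StackName=stack_name)["Stacks"]
    # A stack without an Outputs section has no "Outputs" key at all.
    return {o["OutputKey"]: o["OutputValue"] for o in stacks[0].get("Outputs", [])}


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python stack_outputs.py STACK_NAME")
    else:
        import boto3

        outputs = stack_outputs(boto3.client("cloudformation"), sys.argv[1])
        for name, value in outputs.items():
            print(f"{name}: {value}")
```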


Testing

See the GitHub repo for detailed testing instructions.
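
A rough end-to-end check can be scripted with boto3: upload a PDF, then poll the table until its item count grows. This is a minimal sketch under stated assumptions: the bucket and table names are passed in by hand, the object key is the file's base name, and each upload is assumed to add a new item (if the table is keyed by object name, re-uploading the same file overwrites an item instead, and the count will not change).

```python
import os
import sys
import time


def count_items(dynamodb, table):
    """Count the table's items, following Scan pagination."""
    total, kwargs = 0, {"TableName": table, "Select": "COUNT"}
    while True:
        page = dynamodb.scan(**kwargs)
        total += page["Count"]
        if "LastEvaluatedKey" not in page:
            return total
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def upload_and_wait(s3, dynamodb, pdf_path, bucket, table,
                    timeout=60.0, interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Upload a PDF and poll until the table gains an item or the timeout expires."""
    before = count_items(dynamodb, table)
    # Upload under the file's base name, e.g. ./invoice.pdf -> invoice.pdf
    s3.upload_file(pdf_path, bucket, os.path.basename(pdf_path))
    deadline = clock() + timeout
    while clock() < deadline:
        if count_items(dynamodb, table) > before:
            return True
        sleep(interval)
    return False


if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("usage: python try_pipeline.py PDF_PATH BUCKET TABLE")
    else:
        import boto3

        pdf_path, bucket, table = sys.argv[1:]
        found = upload_and_wait(boto3.client("s3"), boto3.client("dynamodb"),
                                pdf_path, bucket, table)
        print("new item stored" if found else "timed out waiting for a new item")
```

A full Scan is fine for a small test table but reads every item, so it is not a pattern to reuse against a large one.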

Cleanup

Delete the stack: sam delete
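
CloudFormation cannot delete an S3 bucket that still contains objects, so if the stack deletion reports that the bucket could not be removed, empty it and run the delete again. A minimal boto3 sketch, assuming versioning is not enabled on the bucket (a versioned bucket would also need its object versions and delete markers removed); the script file name is hypothetical.

```python
import sys


def empty_bucket(s3, bucket):
    """Delete every object in an unversioned bucket; return how many were deleted."""
    deleted = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # A page holds at most 1,000 keys, which is also DeleteObjects' per-request limit.
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not objects:
            continue
        response = s3.delete_objects(Bucket=bucket, Delete={"Objects": objects, "Quiet": True})
        if response.get("Errors"):
            raise RuntimeError(f"could not delete some objects: {response['Errors']}")
        deleted += len(objects)
    return deleted


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python empty_bucket.py BUCKET")
    else:
        import boto3

        print(f"deleted {empty_bucket(boto3.client('s3'), sys.argv[1])} objects")
```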

Created by:

Jack Le Bon

AWS Solutions Architect
