Automatic data transformation with Amazon S3, AWS Glue and AWS Step Functions

Amazon S3 → AWS Step Functions → AWS Glue

This pattern transforms data by using an event-driven architecture to trigger a data transformation job.

When you upload a .csv file to the input S3 bucket, the upload matches an Amazon EventBridge rule that triggers an AWS Step Functions state machine. The state machine has two steps:
1. Start an AWS Glue crawler to crawl the input bucket. This makes the raw data (before transformation) available in the Data Catalog.
2. Start an AWS Glue ETL job that runs a simple transformation and drops empty columns from the dataset.
The transformed data is then added to the Data Catalog and to an S3 bucket to allow for further processing.
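
The wiring described above can be sketched in Python as two plain dictionaries: an EventBridge event pattern that matches .csv uploads, and an Amazon States Language definition with the two steps. This is a minimal sketch under stated assumptions, not the pattern's actual CDK code. The bucket, crawler, and job names are hypothetical placeholders, and the use of the `.sync` variant for the Glue job is an assumption. The sketch also assumes the input bucket has EventBridge notifications enabled.

```python
import json


def csv_upload_event_pattern(bucket_name: str) -> dict:
    """EventBridge pattern matching 'Object Created' events for .csv keys."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket_name]},
            # Suffix matching restricts the rule to keys ending in .csv
            "object": {"key": [{"suffix": ".csv"}]},
        },
    }


def state_machine_definition(crawler_name: str, job_name: str) -> dict:
    """Two-step Amazon States Language definition: crawler, then ETL job."""
    return {
        "Comment": "Crawl the input bucket, then run the Glue ETL job",
        "StartAt": "StartCrawler",
        "States": {
            "StartCrawler": {
                "Type": "Task",
                # AWS SDK integration; the call returns once the crawler
                # has been started, not when the crawl has finished
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "Next": "StartEtlJob",
            },
            "StartEtlJob": {
                "Type": "Task",
                # Assumption: .sync makes the state wait for the job run
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical resource names, for illustration only
    print(json.dumps(csv_upload_event_pattern("my-input-bucket"), indent=2))
    print(json.dumps(
        state_machine_definition("raw-data-crawler", "drop-empty-columns-job"),
        indent=2,
    ))
```

Because the crawler step only starts the crawl, a definition like this one does not guarantee that the raw data is already in the Data Catalog when the ETL job begins. See the GitHub repo for how the deployed state machine handles this.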

Clone repo

git clone https://github.com/aws-samples/serverless-patterns/
cd serverless-patterns/s3-eventbridge-glue

Deploy

cdk deploy


Testing

See the GitHub repo for detailed testing instructions.

Cleanup

Delete the stack: cdk destroy.

Created by:

Elie Elmalem

Solutions Architect @ AWS, with a particular interest in serverless services