Automatic data transformation with Amazon S3, AWS Glue and AWS Step Functions
Amazon S3 → AWS Step Functions → AWS Glue
This pattern transforms data automatically by using an event-driven architecture: a file upload triggers a data transformation job.
When you upload a .csv file to the input S3 bucket, the resulting event matches an Amazon EventBridge rule that triggers an AWS Step Functions state machine. The state machine has two steps:
1. Starts an AWS Glue crawler to crawl the input bucket. This makes the raw data (before transformation) available in the Data Catalog.
2. Starts an AWS Glue ETL job that runs a simple transformation and drops empty columns from the dataset.
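The trigger and the two steps above can be sketched as plain Python dictionaries that serialize to an EventBridge event pattern and an Amazon States Language definition. This is a minimal sketch, not the pattern's actual template: the bucket, crawler, and job names and the state names StartCrawler and StartEtlJob are hypothetical, and the sketch does not wait for the crawler to finish before starting the job.

```python
import json


def build_event_pattern(bucket_name: str) -> dict:
    """EventBridge pattern matching .csv uploads to one bucket.

    bucket_name is a placeholder; the real bucket name comes from the
    deployed stack.
    """
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket_name]},
            # Suffix matching restricts the rule to .csv object keys.
            "object": {"key": [{"suffix": ".csv"}]},
        },
    }


def build_state_machine_definition(crawler_name: str, job_name: str) -> dict:
    """Two-step state machine: start the crawler, then run the ETL job.

    State names and resource names are assumptions for illustration.
    """
    return {
        "StartAt": "StartCrawler",
        "States": {
            "StartCrawler": {
                "Type": "Task",
                # AWS SDK integration; this call returns once the crawler
                # has been started, not once it has finished crawling.
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "Next": "StartEtlJob",
            },
            "StartEtlJob": {
                "Type": "Task",
                # The .sync suffix makes the state wait for the job run to end.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_event_pattern("my-input-bucket"), indent=2))
    print(json.dumps(
        build_state_machine_definition("my-input-crawler", "my-etl-job"),
        indent=2,
    ))
```

For the rule to receive these events, EventBridge notifications must be enabled on the input bucket.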
The transformed data is then written to an S3 bucket and added to the Data Catalog, where it is available for further processing.
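To illustrate what "drops empty columns" means for the data, here is a small stand-in written with the Python standard library. It is not the Glue ETL script used by this pattern, and the function name drop_empty_columns is hypothetical; it only shows the effect of the transformation on a .csv file, assuming that an empty column is one whose value is blank in every data row.

```python
import csv
import io


def drop_empty_columns(csv_text: str) -> str:
    """Return csv_text without columns that are blank in every data row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) < 2:
        # Header only (or nothing at all): no data to judge columns by.
        return csv_text
    header, data = rows[0], rows[1:]
    # Keep a column if at least one data row has a non-blank value in it.
    keep = [
        i for i in range(len(header))
        if any(i < len(row) and row[i].strip() for row in data)
    ]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header[i] for i in keep])
    for row in data:
        # Short rows are padded with empty strings for the kept columns.
        writer.writerow([row[i] if i < len(row) else "" for i in keep])
    return out.getvalue()


if __name__ == "__main__":
    raw = "id,note,name\n1,,Ana\n2,,Bo\n"
    print(drop_empty_columns(raw))
```

In this example the note column is blank in every row, so the output keeps only id and name.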