Query large datasets

Ingest a large dataset into Amazon S3, partition it with an AWS Glue crawler, and query it with Amazon Athena. Includes Step Functions, S3, Glue, Athena, and SNS.

In this project, the Step Functions state machine invokes an AWS Glue crawler that partitions a large dataset in Amazon S3. Once the crawler reports success, the workflow runs Athena queries against that partition. After the queries complete successfully, the workflow publishes a notification to an Amazon SNS topic.
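The flow described above can be sketched as an Amazon States Language definition. The Python below builds one possible definition as a dictionary and prints it as JSON. It is an illustration under stated assumptions, not the definition shipped in the repository: the state names, the 30-second wait, and every argument passed in the usage section (crawler name, database, query, S3 output location, topic ARN) are hypothetical placeholders.

```python
import json


def build_definition(crawler_name, database, query, output_location, topic_arn):
    """Return a sketch of an Amazon States Language definition as a dict.

    All arguments are hypothetical placeholders supplied by the caller.
    """
    return {
        "Comment": "Crawl an S3 dataset, query it with Athena, notify via SNS",
        "StartAt": "StartCrawler",
        "States": {
            # Kick off the Glue crawler through the AWS SDK integration.
            "StartCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "Next": "WaitForCrawler",
            },
            # StartCrawler returns immediately, so poll until the crawler stops.
            "WaitForCrawler": {"Type": "Wait", "Seconds": 30, "Next": "GetCrawler"},
            "GetCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
                "Parameters": {"Name": crawler_name},
                "Next": "IsCrawlerDone",
            },
            # READY means the crawler is no longer running. A stricter check
            # could also inspect $.Crawler.LastCrawl.Status for SUCCEEDED.
            "IsCrawlerDone": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.Crawler.State",
                        "StringEquals": "READY",
                        "Next": "RunAthenaQuery",
                    }
                ],
                "Default": "WaitForCrawler",
            },
            # The .sync suffix makes Step Functions wait for the query to finish.
            "RunAthenaQuery": {
                "Type": "Task",
                "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
                "Parameters": {
                    "QueryString": query,
                    "QueryExecutionContext": {"Database": database},
                    "ResultConfiguration": {"OutputLocation": output_location},
                },
                "Next": "NotifySuccess",
            },
            "NotifySuccess": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {
                    "TopicArn": topic_arn,
                    "Message": "Athena query completed successfully",
                },
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    definition = build_definition(
        crawler_name="example-crawler",
        database="example_db",
        query="SELECT * FROM example_table LIMIT 10",
        output_location="s3://example-bucket/athena-results/",
        topic_arn="arn:aws:sns:us-east-1:123456789012:example-topic",
    )
    print(json.dumps(definition, indent=2))
```

The polling loop (Wait, GetCrawler, Choice) is one common way to wait for a crawler, since starting a crawler does not block until it finishes. The Athena task uses the `.sync` integration pattern so the SNS step only runs after the query has completed.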

View this workflow on GitHub: https://github.com/aws-samples/step-functions-workflows-collection/tree/main/sfn-query-large-datasets-tf/


Clone repo

git clone https://github.com/aws-samples/step-functions-workflows-collection
cd step-functions-workflows-collection/sfn-query-large-datasets-tf

Deploy

terraform init
terraform apply


Testing

See the GitHub repo for detailed testing instructions.
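The repository's instructions remain the authoritative test procedure. As a generic sketch of how a deployed state machine could be exercised from Python, the helper below starts an execution and polls until it reaches a terminal status. The function name `run_and_wait`, the empty input payload, and the polling defaults are assumptions for illustration, and the state machine ARN must come from your own deployment.

```python
import json
import time

# Statuses after which an execution will not change any further.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def run_and_wait(sfn_client, state_machine_arn, payload=None,
                 poll_seconds=10, max_polls=180):
    """Start an execution and return its final status (hypothetical helper).

    sfn_client is expected to behave like boto3.client("stepfunctions").
    """
    started = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload or {}),
    )
    execution_arn = started["executionArn"]

    for _ in range(max_polls):
        status = sfn_client.describe_execution(executionArn=execution_arn)["status"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)

    raise TimeoutError(f"Execution {execution_arn} did not finish in time")


if __name__ == "__main__":
    import sys

    import boto3  # imported here so the helper itself has no AWS dependency

    # Usage: python run_workflow.py <state-machine-arn>
    print(run_and_wait(boto3.client("stepfunctions"), sys.argv[1]))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.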

Cleanup

1. Delete the deployed resources: terraform destroy

Created by:

Itziar Olivera Goicolea

Itziar is a Technical Account Manager at AWS from Iberia (EMEA).