Query large datasets

Ingest large datasets into Amazon S3, partition them with AWS Glue crawlers, and query them with Amazon Athena. Services used: Step Functions, Amazon S3, AWS Glue, Amazon Athena, and Amazon SNS.

In this project, a Step Functions state machine invokes an AWS Glue crawler that partitions a large dataset stored in Amazon S3. When the crawler reports success, the workflow runs Athena queries against the partitioned data. When the queries complete successfully, the workflow publishes a notification to an Amazon SNS topic.
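The crawl, query, and notify flow can be sketched in Amazon States Language. The Python script below is a minimal sketch and not the definition used in the sample repository. Every resource name in it is hypothetical: the crawler `large-dataset-crawler`, the database `large_dataset_db`, the table and its `year`/`month` partition columns, the SNS topic ARN, and the `primary` Athena workgroup. Glue's StartCrawler call returns immediately instead of waiting for the crawl to finish, so the sketch polls the crawler through the AWS SDK integrations for `glue:startCrawler` and `glue:getCrawler` until the crawler is idle again.

```python
import json

# Hypothetical resource names; the sample repository defines its own.
CRAWLER_NAME = "large-dataset-crawler"
DATABASE = "large_dataset_db"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:query-complete"
# Hypothetical query that filters on partition columns so Athena scans one partition.
QUERY = (
    "SELECT count(*) FROM large_dataset_table "
    "WHERE year = '2023' AND month = '01'"
)


def build_definition(crawler_name: str, database: str, query: str, topic_arn: str) -> dict:
    """Return an ASL definition: run the crawler, wait for it, query Athena, notify SNS."""
    return {
        "Comment": "Crawl an S3 dataset, query it with Athena, then notify via SNS",
        "StartAt": "StartCrawler",
        "States": {
            "StartCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "ResultPath": None,  # keep the original input; StartCrawler returns nothing useful
                "Next": "WaitForCrawler",
            },
            "WaitForCrawler": {"Type": "Wait", "Seconds": 30, "Next": "GetCrawler"},
            "GetCrawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
                "Parameters": {"Name": crawler_name},
                "ResultPath": "$.crawler",
                "Next": "IsCrawlerDone",
            },
            "IsCrawlerDone": {
                "Type": "Choice",
                "Choices": [
                    {   # crawler is idle again and its last run succeeded
                        "And": [
                            {"Variable": "$.crawler.Crawler.State", "StringEquals": "READY"},
                            {"Variable": "$.crawler.Crawler.LastCrawl.Status",
                             "StringEquals": "SUCCEEDED"},
                        ],
                        "Next": "RunAthenaQuery",
                    },
                    # idle again, but the last run did not succeed
                    {"Variable": "$.crawler.Crawler.State", "StringEquals": "READY",
                     "Next": "CrawlFailed"},
                ],
                "Default": "WaitForCrawler",  # still RUNNING or STOPPING: poll again
            },
            "RunAthenaQuery": {
                "Type": "Task",
                # .sync makes Step Functions wait until the query finishes
                "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
                "Parameters": {
                    "QueryString": query,
                    "QueryExecutionContext": {"Database": database},
                    "WorkGroup": "primary",
                },
                "ResultPath": "$.query",
                "Next": "NotifySuccess",
            },
            "NotifySuccess": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {"TopicArn": topic_arn,
                               "Message": "Athena query completed successfully"},
                "End": True,
            },
            "CrawlFailed": {
                "Type": "Fail",
                "Error": "CrawlerFailed",
                "Cause": "The Glue crawler finished without status SUCCEEDED",
            },
        },
    }


if __name__ == "__main__":
    definition = build_definition(CRAWLER_NAME, DATABASE, QUERY, TOPIC_ARN)
    print(json.dumps(definition, indent=2))  # None -> null, True -> true
```

You can paste the printed JSON into the Step Functions console to inspect the graph. The `.sync` suffix on the Athena task makes the workflow wait for the query result before it publishes to SNS. Without that suffix, the notification would be sent as soon as the query was submitted.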

Clone repo

git clone https://github.com/aws-samples/step-functions-workflows-collection
cd step-functions-workflows-collection/sfn-query-large-datasets-cdk

Deploy

cdk deploy


Testing

See the GitHub repo for detailed testing instructions.

Cleanup

Delete the stack: cdk destroy.

Created by:

Pajtim Matoshi

Pajtim is a Solutions Architect @ Amazon Web Services based in Zürich, Switzerland.