Orchestrate an EMR Serverless job

This workflow submits a job to Amazon EMR Serverless, polls the job status, and waits for the job to complete before terminating or proceeding.

This workflow runs big data jobs by using Step Functions to coordinate an EMR Serverless application. When configured with autoStart and autoStop, the EMR Serverless application remains inactive until a job is submitted, providing a true serverless experience for big data processing. The state machine submits a job to the EMR Serverless application and periodically checks its status before proceeding to the next step. In a practical scenario, subsequent steps in the workflow might depend on the completion of the EMR job, for example to process the job's output files. The workflow relies on the native integration between Step Functions and AWS services, so no custom code or AWS Lambda functions are needed. It uses the 'CallAwsService' task to minimize the application code that has to be maintained.
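The submit, wait, and check loop described above can be sketched as an Amazon States Language definition built in Python. This is a minimal illustration and not the definition shipped in the repository: the state names, the `build_definition` helper, the 30-second polling interval, the Spark job driver, and all placeholder identifiers are assumptions. The resource ARNs follow the Step Functions AWS SDK integration pattern, which is what 'CallAwsService' generates, and the `ClientToken` parameter is assumed to be required when calling StartJobRun this way.

```python
import json


def build_definition(application_id, execution_role_arn, entry_point, poll_seconds=30):
    """Return a state machine definition (as a dict) that submits an
    EMR Serverless job and polls until it reaches a terminal state.

    All state names and the polling interval are illustrative assumptions.
    """
    job_state_path = "$.Status.JobRun.State"
    return {
        "Comment": "Submit an EMR Serverless job and wait for completion (sketch)",
        "StartAt": "StartJobRun",
        "States": {
            # Submit the job through the AWS SDK service integration.
            "StartJobRun": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:emrserverless:startJobRun",
                "Parameters": {
                    "ApplicationId": application_id,
                    "ExecutionRoleArn": execution_role_arn,
                    # Idempotency token generated by an intrinsic function.
                    "ClientToken.$": "States.UUID()",
                    "JobDriver": {"SparkSubmit": {"EntryPoint": entry_point}},
                },
                "ResultPath": "$.Job",
                "Next": "WaitBeforePolling",
            },
            # Pause between status checks.
            "WaitBeforePolling": {
                "Type": "Wait",
                "Seconds": poll_seconds,
                "Next": "GetJobRun",
            },
            # Read the current job run status.
            "GetJobRun": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:emrserverless:getJobRun",
                "Parameters": {
                    "ApplicationId.$": "$.Job.ApplicationId",
                    "JobRunId.$": "$.Job.JobRunId",
                },
                "ResultPath": "$.Status",
                "Next": "CheckJobState",
            },
            # Branch on terminal states; anything else loops back to the wait.
            "CheckJobState": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": job_state_path, "StringEquals": "SUCCESS", "Next": "JobSucceeded"},
                    {"Variable": job_state_path, "StringEquals": "FAILED", "Next": "JobFailed"},
                    {"Variable": job_state_path, "StringEquals": "CANCELLED", "Next": "JobFailed"},
                ],
                "Default": "WaitBeforePolling",
            },
            "JobSucceeded": {"Type": "Succeed"},
            "JobFailed": {"Type": "Fail", "Error": "EmrServerlessJobFailed"},
        },
    }


# Placeholder values for illustration only.
definition = build_definition(
    application_id="placeholder-application-id",
    execution_role_arn="arn:aws:iam::123456789012:role/placeholder-job-role",
    entry_point="s3://placeholder-bucket/scripts/job.py",
)
print(json.dumps(definition, indent=2))
```

In a real workflow, the `JobSucceeded` state would typically be replaced by the steps that consume the job's output, since reaching it means the EMR job has finished.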

View this workflow on GitHub


Clone repo

git clone https://github.com/aws-samples/step-functions-workflows-collection
cd step-functions-workflows-collection/step-functions-emr-serverless-cdk/

Deploy

1. Bootstrap CDK, if needed: cdk bootstrap aws://{your-aws-account-number}/{your-aws-region}
2. Deploy the stack: cdk deploy


Testing

See the GitHub repo for detailed testing instructions.

Cleanup

1. Delete the stack: cdk destroy.

Created by:

Andrea Filippo La Scola

Andrea Filippo is a Partner Solutions Architect at AWS based in Italy. He works on modern data architectures and on solving problems with serverless technologies.
