Manage an Amazon EMR Job

Automate an Amazon EMR job with AWS Step Functions: create a cluster, add steps, run them synchronously, and terminate the cluster, all with minimal code.

This workflow demonstrates how to automate an Amazon EMR job using AWS Step Functions. The workflow creates an Amazon EMR cluster, adds two steps, runs them, and then terminates the cluster. Each Amazon EMR task runs synchronously: the state machine waits for the task to succeed or fail before moving on. After both steps complete, the cluster is terminated, so users can process and analyze data with minimal code. The AWS Cloud Development Kit (CDK) code creates all the resources needed to run this workflow, including AWS Identity and Access Management (IAM) roles and policies, an Amazon Simple Storage Service (S3) bucket for storing EMR logs, and the state machine that manages the EMR job.
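The flow described above can be sketched as an Amazon States Language definition built in plain Python. This is a minimal illustration, not the definition generated by the repository's CDK code. The state names, cluster name, release label, instance type, step commands, role names, and bucket are all placeholder assumptions. The `.sync` suffix on a task resource is what makes Step Functions wait for the EMR operation to finish before continuing.

```python
import json


def emr_step(name, args, next_state):
    """Build an addStep.sync task that runs a command on the cluster."""
    return {
        "Type": "Task",
        # .sync: the state machine waits until the EMR step succeeds or fails.
        "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
        "Parameters": {
            # Cluster ID saved by the CreateCluster state under $.cluster.
            "ClusterId.$": "$.cluster.ClusterId",
            "Step": {
                "Name": name,
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
            },
        },
        "ResultPath": "$.lastStep",
        "Next": next_state,
    }


def build_emr_state_machine(log_uri, service_role, instance_profile):
    """Return a definition: create cluster, run two steps, terminate."""
    return {
        "Comment": "Sketch: create an EMR cluster, run two steps, terminate",
        "StartAt": "CreateCluster",
        "States": {
            "CreateCluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
                "Parameters": {
                    "Name": "WorkflowCluster",      # placeholder name
                    "ReleaseLabel": "emr-6.15.0",   # assumed release label
                    "Applications": [{"Name": "Spark"}],
                    "ServiceRole": service_role,
                    "JobFlowRole": instance_profile,
                    "LogUri": log_uri,
                    "Instances": {
                        # Keep the cluster alive so steps can be added later.
                        "KeepJobFlowAliveWhenNoSteps": True,
                        "InstanceFleets": [
                            {
                                "Name": "PrimaryFleet",
                                "InstanceFleetType": "MASTER",
                                "TargetOnDemandCapacity": 1,
                                "InstanceTypeConfigs": [
                                    {"InstanceType": "m5.xlarge"}
                                ],
                            }
                        ],
                    },
                },
                "ResultPath": "$.cluster",
                "Next": "FirstStep",
            },
            # Placeholder commands; a real job would submit Spark or Hive work.
            "FirstStep": emr_step(
                "FirstStep", ["bash", "-c", "echo first"], "SecondStep"
            ),
            "SecondStep": emr_step(
                "SecondStep", ["bash", "-c", "echo second"], "TerminateCluster"
            ),
            "TerminateCluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster",
                "Parameters": {"ClusterId.$": "$.cluster.ClusterId"},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    definition = build_emr_state_machine(
        log_uri="s3://example-emr-logs-bucket/logs/",  # hypothetical bucket
        service_role="ExampleEmrServiceRole",          # hypothetical role
        instance_profile="ExampleEmrInstanceProfile",  # hypothetical profile
    )
    print(json.dumps(definition, indent=2))
```

In this sketch a failed step fails the execution before `TerminateCluster` runs, so a production definition would likely add a `Catch` that routes failures to the termination state.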


View this workflow on GitHub: https://github.com/aws-samples/step-functions-workflows-collection/tree/main/manage-emr-job-cdk/


Clone repo

git clone https://github.com/aws-samples/step-functions-workflows-collection
cd step-functions-workflows-collection/manage-emr-job-cdk

Deploy

cdk deploy


Testing

See the GitHub repository for detailed testing instructions.
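As a generic complement to the repository's instructions, a deployed state machine can be exercised by starting an execution and polling until it reaches a terminal status. The sketch below is not the repository's test procedure. It assumes a boto3 Step Functions client is passed in, and the helper name `run_and_wait`, the empty input payload, and the polling interval are illustrative choices.

```python
import json
import time

# Statuses after which an execution no longer changes.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def run_and_wait(sfn_client, state_machine_arn, payload=None, poll_seconds=30):
    """Start an execution and block until it finishes; return its status."""
    started = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload or {}),
    )
    execution_arn = started["executionArn"]
    while True:
        described = sfn_client.describe_execution(executionArn=execution_arn)
        if described["status"] in TERMINAL_STATUSES:
            return described["status"]
        time.sleep(poll_seconds)


# Usage (requires boto3 and AWS credentials; the ARN is a placeholder):
#   import boto3
#   status = run_and_wait(
#       boto3.client("stepfunctions"),
#       "<ARN of the deployed state machine>",
#   )
#   print(status)
```

Creating and terminating an EMR cluster takes several minutes, so a long polling interval keeps the number of API calls low.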

Cleanup

1. Delete the stack: cdk destroy.

Additional resources

Created by:

Aditi Agarwal

Aditi Agarwal is a Cloud Consultant at Amazon Web Services (AWS).
