Distributed Map reduce weather analysis

Process the full NOAA Global Surface Summary of Day dataset (37+ GB).

This implementation uses a Lambda map function, invoked through a Step Functions Distributed Map state, and a Lambda reducer function. The reducer performs a final aggregation and writes the results to DynamoDB.
The reducer is necessary because two child workflows in the same Distributed Map run may each process data for the same period and each report a high temperature for it. For example, child workflow 1 may find that Seattle, Washington, USA had the highest temperature in 2022-07 (July 2022), while child workflow 2 finds that Jahra, Kuwait had the highest temperature in 2022-07. The reducer takes a final pass over the outputs of all child workflows to find the correct highs.
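The reduce step described above can be sketched as a merge that keeps the largest value per period. This is a minimal illustration and not the code from the repository: the record fields (month, station, temp_f), the function name reduce_highs, and the sample temperatures are all assumptions made for the example, and the DynamoDB write is left out.

```python
from typing import Dict, Iterable, List, TypedDict


class MonthlyHigh(TypedDict):
    # Hypothetical record shape; the real workflow's output format may differ.
    month: str      # period key, e.g. "2022-07"
    station: str    # label of the reporting location
    temp_f: float   # assumed unit: degrees Fahrenheit


def reduce_highs(child_outputs: Iterable[List[MonthlyHigh]]) -> Dict[str, MonthlyHigh]:
    """Merge per-child highs into one overall high per month."""
    best: Dict[str, MonthlyHigh] = {}
    for output in child_outputs:          # one list per child workflow
        for record in output:
            current = best.get(record["month"])
            # Keep the record only if it beats the best seen so far for that month.
            if current is None or record["temp_f"] > current["temp_f"]:
                best[record["month"]] = record
    return best


# Illustrative values only; these are not real GSOD readings.
child_1: List[MonthlyHigh] = [
    {"month": "2022-07", "station": "Seattle, Washington, USA", "temp_f": 95.0},
]
child_2: List[MonthlyHigh] = [
    {"month": "2022-07", "station": "Jahra, Kuwait", "temp_f": 120.0},
    {"month": "2022-08", "station": "Jahra, Kuwait", "temp_f": 118.0},
]

overall = reduce_highs([child_1, child_2])
```

Because taking a maximum is associative and commutative, the result does not depend on how the input files were split across child workflows or on the order in which their outputs arrive.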


Clone repo

git clone https://github.com/aws-samples/aws-stepfunctions-examples.git
cd aws-stepfunctions-examples/sam/demo-distributed-map-data-processing/

Deploy

1. sam build
2. sam deploy --guided


Testing

See the GitHub repo for detailed testing instructions.

Cleanup

1. Delete the stack: sam delete.

Created by:

Benjamin Smith

Ben is a senior developer advocate for Serverless Applications at Amazon Web Services, based in London, UK. Prior to joining AWS, Ben worked in a number of technical roles specializing in workflow automation and web development.
