[{"data":1,"prerenderedAt":69},["ShallowReactive",2],{"workflow-distributed-data-stream-aggregator":3},{"id":4,"title":5,"cleanup":6,"contributors":13,"deploy":15,"description":18,"diagram":19,"extension":20,"framework":21,"gitHub":22,"introBox":30,"level":37,"meta":38,"resources":39,"s3URL":51,"services":52,"simplicity":58,"stem":59,"testing":60,"type":66,"usecase":67,"videoId":26,"__hash__":68},"workflows\u002Fworkflows\u002Fdistributed-data-stream-aggregator.json","Distributed Data Stream Aggregator",{"headline":7,"text":8},"Cleanup",[9,10,11,12],"1. Delete the state machines using AWS CLI: \u003Ccode>aws stepfunctions delete-state-machine\u003C\u002Fcode>","2. Delete DynamoDB tables: \u003Ccode>aws dynamodb delete-table\u003C\u002Fcode>","3. Delete S3 buckets: \u003Ccode>aws s3 rb --force\u003C\u002Fcode>","4. Delete Glue job and EventBridge connection",[14],"content\u002Fcontributors\u002Faparna-saha.json",{"text":16},[17],"Follow the step-by-step deployment instructions in the README.md to create DynamoDB tables, S3 buckets, Glue job, EventBridge connections, and deploy the state machines.","Aggregate data from multiple third-party locations using distributed processing with 3-tier architecture","\u002Fassets\u002Fimages\u002Fworkflows\u002Fdistributed-data-stream-aggregator.png","json","AWS CLI",{"template":23,"payloads":28},{"repoURL":24,"templateDir":25,"templateFile":26,"ASL":27},"https:\u002F\u002Fgithub.com\u002Faws-samples\u002Fstep-functions-workflows-collection\u002Ftree\u002Fmain\u002Fdistributed-data-stream-aggregator\u002F","distributed-data-stream-aggregator","","statemachine\u002Fstatemachine.asl.json",[29],{"headline":26,"payloadURL":26},{"headline":31,"text":32},"How it works",[33,34,35,36],"This workflow demonstrates large-scale data aggregation from multiple third-party locations using AWS Step Functions' distributed processing capabilities with a 3-tier architecture.","The main workflow orchestrates the entire process by querying DynamoDB for client locations, then uses distributed map to process multiple locations in parallel. Each location is processed by a standard execution child workflow that handles data extraction and pagination.","A second express execution child workflow performs the actual API calls to third-party endpoints with query parameters and pagination support. Data is temporarily stored in S3 as JSON files organized by task ID.","Finally, an AWS Glue job consolidates all partial files into a single output file uploaded to the destination S3 bucket, with status updates tracked in DynamoDB.","200",{},{"headline":40,"bullets":41},"Additional resources",[42,45,48],{"text":43,"link":44},"The AWS Step Functions Workshop","https:\u002F\u002Fcatalog.workshops.aws\u002Fstepfunctions\u002Fen-US",{"text":46,"link":47},"Distributed Map state documentation","https:\u002F\u002Fdocs.aws.amazon.com\u002Fstep-functions\u002Flatest\u002Fdg\u002Famazon-states-language-map-state.html",{"text":49,"link":50},"JSONata expressions in Step Functions","https:\u002F\u002Fdocs.aws.amazon.com\u002Fstep-functions\u002Flatest\u002Fdg\u002Famazon-states-language-jsonata.html",null,[53,54,55,56,57],"sfn","dynamodb","s3","glue","eventbridge","3 - Application","workflows\u002Fdistributed-data-stream-aggregator",{"headline":61,"text":62},"Testing",[63,64,65],"1. Populate the locations DynamoDB table with test data containing task_id and location information.","2. Execute the state machine using the AWS CLI with task_id and task_sort_key as input.","3. 
      "3. Monitor execution progress in the Step Functions console and verify data consolidation in S3."
    ]
  },
  "type": "Standard",
  "usecase": "Data Processing",
  "videoId": ""
}