Select your cookie preferences

We use cookies and similar tools to enhance your experience, provide our services, deliver relevant advertising, and make improvements. Approved third parties also use these tools to help us deliver advertising and provide certain site features.

Using AWS Step Functions to query Amazon Athena S3 data

Created with SnapAWS Step FunctionsAmazon AthenaAWS GlueAmazon S3

Using Step Functions to query Athena S3 data.

This serverless pattern uses Amazon Athena allows to analyze data in Amazon S3 using standard SQL.
A Glue crawler creates a database table from the S3 data which is queried by Athena.
Step Functionsis leveraged to query as well as process each result in this data received from Athena with the help of **NextToken**.

Transform: AWS::Serverless-2016-10-31
AWSTemplateFormatVersion: "2010-09-09"
Description: Sample SAM Template for Athena Querying (uksb-1tthgi812) (tag:step-functions-athena-glue-sam)

Parameters:
  AthenaBucket:
    Type: String
    Description: Enter your unique bucket name to store Athena query results

  CrawlerBucket:
    Type: String
    Description: Enter your unique bucket name to fetch table from

  Database:
    Type: String
    Description: Enter a Database name

Resources:

  SFRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Principal:
              Service:
                - "states.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Policies:
        - PolicyName: AthenaAccess
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action: 
                  - "athena:startQueryExecution"
                  - "athena:stopQueryExecution"
                  - "athena:getQueryExecution"
                  - "athena:getDataCatalog"
                  - "athena:getQueryResults"
                Resource: 
                  - !Sub "arn:aws:athena:${AWS::Region}:${AWS::AccountId}:workgroup/*"
                  - !Sub "arn:aws:athena:${AWS::Region}:${AWS::AccountId}:datacatalog/*"
        
        - PolicyName: S3Access
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - "s3:GetBucketLocation"
                  - "s3:GetObject"
                  - "s3:ListBucket"
                  - "s3:ListBucketMultipartUploads"
                  - "s3:ListMultipartUploadParts"
                  - "s3:AbortMultipartUpload"
                  - "s3:CreateBucket"
                  - "s3:PutObject"
                Resource:
                  - !Sub "arn:aws:s3:::${AthenaBucket}/*"
                  - !Sub "arn:aws:s3:::${CrawlerS3Bucket}/*"
                  - !Sub "arn:aws:s3:::${AthenaBucket}"
                  - !Sub "arn:aws:s3:::${CrawlerS3Bucket}"

        - PolicyName: GlueAccess
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - "glue:CreateDatabase"
                  - "glue:GetDatabase"
                  - "glue:GetDatabases"
                  - "glue:UpdateDatabase"
                  - "glue:DeleteDatabase"
                  - "glue:CreateTable"
                  - "glue:UpdateTable"
                  - "glue:GetTable"
                  - "glue:GetTables"
                  - "glue:DeleteTable"
                  - "glue:BatchDeleteTable"
                  - "glue:BatchCreatePartition"
                  - "glue:CreatePartition"
                  - "glue:UpdatePartition"
                  - "glue:GetPartition"
                  - "glue:GetPartitions"
                  - "glue:BatchGetPartition"
                  - "glue:DeletePartition"
                  - "glue:BatchDeletePartition"
                Resource:
                - !Sub "arn:aws:glue:${AWS::Region}:${AWS::AccountId}:catalog"
                - !Sub "arn:aws:glue:${AWS::Region}:${AWS::AccountId}:database/*"
                - !Sub "arn:aws:glue:${AWS::Region}:${AWS::AccountId}:table/*"
                - !Sub "arn:aws:glue:${AWS::Region}:${AWS::AccountId}:userDefinedFunction/*"           

  StateMachine:
    Type: AWS::Serverless::StateMachine
    Properties:
      Name: StateMachine
      DefinitionUri: athenaquery.asl.json
      DefinitionSubstitutions:
        AthenaBucketname: !Ref AthenaBucket
        CrawlerBucketname: !Ref CrawlerBucket
        DBname: !Ref MyDatabase
      Role: !GetAtt SFRole.Arn


  AthenaQueryS3Bucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketName: !Ref AthenaBucket
      AccessControl: "BucketOwnerFullControl"

  MyRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          -
            Effect: "Allow"
            Principal:
              Service:
                - "glue.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      ManagedPolicyArns:
        ['arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole']
      Policies:
        -
          PolicyName: "S3BucketAccessPolicy"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              -
                Effect: "Allow"
                Action: 
                  - "s3:GetObject"
                  - "s3:PutObject"
                Resource: 
                  !Join 
                    - ''
                    - - !GetAtt CrawlerS3Bucket.Arn
                      - "*"
 
  MyDatabase:
    Type: 'AWS::Glue::Database'
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: !Ref Database
 
  CrawlerS3Bucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketName: !Ref CrawlerBucket
      AccessControl: "BucketOwnerFullControl"
 
  MyCrawler:
    Type: 'AWS::Glue::Crawler'
    Properties:
      Name: "testcrawler"
      Role: !GetAtt MyRole.Arn
      DatabaseName: !Ref MyDatabase
      Targets:
        S3Targets:
          - Path: 
              !Join
                - ''
                - - !Ref CrawlerS3Bucket
                  - "/"


Outputs:
  ActivityStateMachineArn:
    Description: "Activity State machine ARN"
    Value: !Ref StateMachine
  ActivityStateMachineRoleArn:
    Description: "IAM Role created for Activity State machine based on the specified SAM Policy Templates"
    Value: !GetAtt SFRole.Arn

< Back to all patterns


GitHub icon Visit the GitHub repo for this pattern.

Download

git clone https://github.com/aws-samples/serverless-patterns/ cd serverless-patterns/step-functions-athena-glue-sam

Deploy

sam deploy --guided


Testing

See the GitHub repo for detailed testing instructions.

Cleanup

Delete the stack: same delete.

Abhishek Agawane

Presented by Abhishek Agawane

I am a Cloud Support Engineer (Serverless) at AWS who loves baking pizzas.

Follow on LinkedIn