Processing streaming data can be complex in traditional, server-based architectures, especially if you must react in real time. Many organizations spend significant time and money managing and scaling their streaming platforms. To react quickly, they must provision for peak capacity, which adds cost and complexity.
AWS Lambda is a serverless compute service that removes the undifferentiated heavy lifting of processing Kafka streams. You don't have to manage infrastructure, and you can reduce operational overhead, lower costs, and scale on demand, which lets you focus on building streaming applications. You can write Lambda functions in a number of programming languages, which provides flexibility when processing streaming data.
For an introduction to Lambda, see AWS Lambda Fundamentals.
There are three ways to use Lambda to process streams: Kafka Connect, Amazon EventBridge Pipes, and Lambda event source mappings (ESM). This learning guide focuses primarily on the Lambda ESM.
Kafka Connect is a framework for integrating Kafka with other systems using connectors, which are named for the direction in which data moves. Source connectors read data from external systems and store it in Kafka topics. Sink connectors deliver data from Kafka topics to other systems, such as Lambda. Confluent provides a pre-built Kafka sink connector for Lambda. It pulls records from one or more Kafka topics, batches them, converts them to JSON, and invokes a Lambda function. You can invoke the Lambda function either synchronously or asynchronously.
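As an illustration, a Kafka Connect worker configuration for a Lambda sink connector might look like the following sketch. The property names, function name, and topic are assumptions for illustration; consult the Confluent Lambda sink connector documentation for the exact property names and values.

```properties
# Illustrative sketch only - property names are assumptions; consult the
# Confluent Lambda sink connector documentation for the exact names.
connector.class=io.confluent.connect.aws.lambda.AwsLambdaSinkConnector
topics=mytopic
aws.lambda.function.name=my-processing-function
aws.lambda.invocation.type=sync
tasks.max=1
```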
Lambda can also integrate natively with your Kafka environments as a consumer to process stream data as soon as it's generated. Lambda and serverless architectures are well-suited for stream processing workloads that are event-driven and have burst or variable compute requirements.
To consume streaming data from Kafka, you configure an event source mapping (ESM) on your Lambda function. This is a resource managed by the Lambda service, separate from your function. The ESM continually polls records from the topics in your Kafka cluster, optionally filters them, and batches them into a payload. It then calls the Lambda Invoke API to deliver the payload to your Lambda function synchronously for processing. The ESM scales up automatically to handle additional load. You can write your processing function in any language.
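As a sketch, the parameters below mirror the shape of Lambda's CreateEventSourceMapping API for a self-managed Kafka cluster. The function name, broker endpoint, topic, and secret ARN are placeholder assumptions for illustration.

```python
# Sketch of the request parameters for creating a Kafka event source
# mapping. In practice you would pass these to the Lambda API, e.g.
# boto3.client("lambda").create_event_source_mapping(**params).
# The function name, brokers, topic, and secret ARN are placeholders.
params = {
    "FunctionName": "my-processing-function",
    "Topics": ["mytopic"],
    "BatchSize": 100,
    "StartingPosition": "LATEST",
    "SelfManagedEventSource": {
        "Endpoints": {
            "KAFKA_BOOTSTRAP_SERVERS": ["broker-1.example.com:9092"]
        }
    },
    "SourceAccessConfigurations": [
        {
            "Type": "SASL_SCRAM_512_AUTH",
            "URI": "arn:aws:secretsmanager:us-east-1:123456789012:secret:kafka-creds"
        }
    ],
}
```

For a cluster on Amazon MSK, you would instead identify the source by its cluster ARN rather than broker endpoints.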
Because Lambda manages the pollers, you don't need to manage a fleet of consumers across multiple teams. Each team creates its own ESM, and Lambda handles the polling.
The Lambda function's event payload contains an array of records. Each record includes the topic name, partition identifier, offset, and timestamp, together with a base64-encoded key and value:
{
    "eventSource": "aws:kafka",
    "eventSourceArn": "arn:aws:kafka:us-east-1:123456789012:cluster/vpc-2priv-2pub/751d2973-a626-431c-9d4e-d7975eb44dd7-2",
    "records": {
        "mytopic-0": [
            {
                "topic": "mytopic",
                "partition": "0",
                "offset": 15,
                "timestamp": 1545084650987,
                "timestampType": "CREATE_TIME",
                "key": "abcDEFghiJKLmnoPQRstuVWXyz1234==",
                "value": "SGVsbG8sIHRoaXMgaXMgYSB0ZXN0Lg==",
                "headers": [
                    {
                        "headerKey": [
                            104,
                            101,
                            101
                        ]
                    }
                ]
            }
        ]
    }
}
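A minimal handler for this payload might look like the following sketch (the function and variable names are illustrative). It iterates over the record batches and base64-decodes each value:

```python
import base64

def handler(event, context):
    """Decode each Kafka record in the event payload."""
    decoded = []
    for partition_key, records in event["records"].items():
        for record in records:
            # Record values arrive base64-encoded; decode to bytes,
            # then interpret as UTF-8 text for this example.
            value = base64.b64decode(record["value"]).decode("utf-8")
            decoded.append(
                {
                    "topic": record["topic"],
                    "partition": record["partition"],
                    "offset": record["offset"],
                    "value": value,
                }
            )
    return decoded
```

Invoked with the payload above, this returns one decoded record whose value is the plain-text message that was published to the topic.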
Amazon EventBridge Pipes helps you create point-to-point integrations between event producers and consumers. You can include optional steps for filtering, transformation, and enrichment.
You can move data from your on-premises or self-hosted Kafka topics to AWS services such as Amazon Kinesis Data Firehose or Amazon Simple Queue Service (SQS).
AWS Lambda's event source mapping (ESM) and Amazon EventBridge Pipes use the same underlying polling infrastructure to select and send events. EventBridge Pipes is ideal for lightweight ETL workflows: it delivers records to one of more than 14 supported targets without any Lambda code for you to manage. Kafka can be a Pipes source but not a target. An ESM is ideal when you want the rich processing capabilities of a Lambda function, which can then send records to any target.
You can also use Lambda as a producer to write records to a Kafka topic. Creating a serverless Apache Kafka publisher using AWS Lambda is an example that provides an API to write to Kafka.
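As a sketch of the producer pattern (the names here are illustrative, not taken from the linked example), a Lambda function behind an API can serialize the incoming request body into a Kafka record. The actual send would use a Kafka client library, such as kafka-python's KafkaProducer, shown here in comments:

```python
import json

TOPIC = "mytopic"  # illustrative topic name

def build_record(api_event):
    """Turn an API Gateway proxy event into a (key, value) pair
    suitable for publishing to a Kafka topic."""
    body = json.loads(api_event["body"])
    key = body.get("id", "").encode("utf-8")
    value = json.dumps(body).encode("utf-8")
    return key, value

def handler(event, context):
    key, value = build_record(event)
    # A real implementation would publish here, for example with
    # kafka-python (assumed dependency):
    #   producer = KafkaProducer(bootstrap_servers=["broker:9092"])
    #   producer.send(TOPIC, key=key, value=value)
    #   producer.flush()
    return {"statusCode": 200, "body": json.dumps({"published": True})}
```

Keeping the serialization logic in a separate function, as above, makes it straightforward to unit test without a running broker.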