Lambda Alarms in a Serverless Architecture

Geoff Affleck, Director of Engineering
4 min read

AWS serverless architectures are amazingly powerful. Using an infrastructure-as-code tool like Terraform or CloudFormation, you can easily spin up highly scalable resources with little effort.

Obtaining good observability can be a challenge, though.

While there are paid tools that can ingest your metrics and manage observability for you, CloudWatch is a very powerful tool in its own right. However, configuring custom metric filters and alarms can become a chore as the size of your stack and the number of environments grow.

My teammates and I at Produce8 are big fans of the AWS serverless tools available. As our system grows, we need a scalable, developer-friendly solution that provides real-time observability and alerting without adding another third-party app to our stack.

This sample shows how to attach both standard and custom CloudWatch metrics, with alarms, to every lambda in your stack. The result is real-time observability on log events and lambda errors with a few lines of code.

Tech Stack: TypeScript, CDK CLI, AWS

Tools Used: CDK, CloudWatch Log Groups, CloudWatch Alarms, Lambda

Setup

First, let's create our stack. For illustrative purposes, I'm just going to create two lambdas.

Now, I could define a custom alarm for each lambda as I build, but this approach will add a lot of bloat to the code, and it's easy to forget to add them as the codebase grows.

So, to ensure every Lambda in the stack (including lambdas defined in the future) will have the same set of default alarms, I'm going to use a CDK aspect to traverse the stack, get all the lambdas and add two alarms to each.

// Imports assume CDK v1 package names, as implied by cdk.Construct below.
import * as path from "path";
import * as cdk from "@aws-cdk/core";
import { Aspects } from "@aws-cdk/core";
import { NodejsFunction } from "@aws-cdk/aws-lambda-nodejs";
import { Topic } from "@aws-cdk/aws-sns";
// DefaultAlarm is the aspect class defined in the "Adding alarms" section.

export class DefaultAlarmsStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new NodejsFunction(this, "helloWorld", {
      // entry accepts .js, .jsx, .ts and .tsx files
      entry: path.join(__dirname, "../lambda/helloWorld.ts"),
    });

    new NodejsFunction(this, "helloWorld2", {
      entry: path.join(__dirname, "../lambda/helloWorld2.ts"),
    });

    // Our alarm topic
    const topic = new Topic(this, "alarmTopic");

    // Add default alarms to all lambdas and alert on the SNS topic
    Aspects.of(this).add(new DefaultAlarm(topic.topicArn));
  }
}

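Aspects use the visitor pattern: CDK calls the aspect's visit method once for every construct in the stack, and DefaultAlarm uses that call to attach alarms to each lambda it finds. Because the aspect is applied to the whole stack, any lambdas added in the future pick up the same alarms automatically.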
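To make the traversal concrete, here is a minimal sketch of the visitor pattern in plain TypeScript. It does not use the CDK: TreeNode, applyVisitor and AlarmVisitor are hypothetical stand-ins for the construct tree, the aspect machinery and the DefaultAlarm aspect, and the real CDK traversal may differ in its details.

```typescript
// Simplified stand-in for a construct tree; NOT the real CDK classes.
type Kind = "stack" | "lambda" | "topic" | "alarm";

class TreeNode {
  readonly children: TreeNode[] = [];
  constructor(readonly id: string, readonly kind: Kind) {}

  add(child: TreeNode): TreeNode {
    this.children.push(child);
    return child;
  }
}

// Mirrors the shape of an aspect: a single visit method.
interface Visitor {
  visit(node: TreeNode): void;
}

// Depth-first walk that hands every node in the tree to the visitor.
function applyVisitor(root: TreeNode, visitor: Visitor): void {
  visitor.visit(root);
  // Iterate over a snapshot so the loop is unaffected if a later visit mutates this list.
  for (const child of [...root.children]) {
    applyVisitor(child, visitor);
  }
}

// Hypothetical visitor: attach one alarm node to every lambda node.
class AlarmVisitor implements Visitor {
  readonly alarmed: string[] = [];

  visit(node: TreeNode): void {
    if (node.kind !== "lambda") {
      return; // skip the stack itself, topics, alarms, etc.
    }
    node.add(new TreeNode(`alarm-${node.id}`, "alarm"));
    this.alarmed.push(node.id);
  }
}

const stack = new TreeNode("DefaultAlarmsStack", "stack");
stack.add(new TreeNode("helloWorld", "lambda"));
stack.add(new TreeNode("helloWorld2", "lambda"));
stack.add(new TreeNode("alarmTopic", "topic"));

const alarmVisitor = new AlarmVisitor();
applyVisitor(stack, alarmVisitor);
```

After the walk, alarmVisitor.alarmed holds the two lambda ids and the topic node is untouched. The stack code never has to list its lambdas, which is the property that keeps the alarms from being forgotten as the codebase grows.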

Adding alarms

Now for the magic. Let's create some alarms.

// Imports assume CDK v1 package names, matching the cdk.Construct usage in the stack.
import { Duration, IAspect, IConstruct } from "@aws-cdk/core";
import { Alarm, ComparisonOperator, TreatMissingData } from "@aws-cdk/aws-cloudwatch";
import { SnsAction } from "@aws-cdk/aws-cloudwatch-actions";
import { NodejsFunction } from "@aws-cdk/aws-lambda-nodejs";
import { ITopic, Topic } from "@aws-cdk/aws-sns";
// LAMBDA_DEFAULT_METRICS is a constants object from the sample project; ERRORS is "Errors".

export class DefaultAlarm implements IAspect {
    alarmSnsArn: string;

    constructor(alarmSnsArn: string) {
        this.alarmSnsArn = alarmSnsArn;
    }

    // Called once for every construct in the stack.
    public visit(node: IConstruct): void {
        if (node instanceof NodejsFunction) {
            const topic = Topic.fromTopicArn(node, "alarm-sns-topic", this.alarmSnsArn);
            this.addErrorLogAlarm(node, topic);
            this.addLambdaErrorAlarm(node, topic);
        }
    }

    // Custom metric: count log events whose JSON _logLevel field is "error".
    private addErrorLogAlarm(node: NodejsFunction, topic: ITopic): void {
        const metricFilterId = `metric-filter-${node.node.id}`;
        const metricName = `metric-${node.functionName}`;
        const alarmName = `log-error-alarm-${node.functionName}`;
        const alarmId = `default-log-error-alarm-${node.node.id}`;

        const filter = node.logGroup.addMetricFilter(metricFilterId, {
            filterPattern: {
                // eslint-disable-next-line quotes
                logPatternString: "{$._logLevel = error}",
            },
            metricName,
            metricNamespace: `custom`,
            metricValue: "1",
        });

        const alarm = new Alarm(node, alarmId, {
            evaluationPeriods: 1,
            alarmName,
            threshold: 0,
            comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
            metric: filter.metric().with({
                period: Duration.minutes(1),
            }),
            treatMissingData: TreatMissingData.MISSING,
        });

        alarm.addAlarmAction(new SnsAction(topic));
    }

    // Standard metric: alarm on the built-in Lambda "Errors" metric.
    private addLambdaErrorAlarm(node: NodejsFunction, topic: ITopic): void {
        const errorsMetric = node.metric(LAMBDA_DEFAULT_METRICS.ERRORS);
        const alarmId = `lambda-errors-alarm-${node.node.id}`;
        const alarmName = `lambda-errors-alarm-${node.functionName}`;

        const alarm = new Alarm(node, alarmId, {
            evaluationPeriods: 1,
            alarmName,
            threshold: 0,
            comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
            metric: errorsMetric.with({
                period: Duration.minutes(1),
            }),
            treatMissingData: TreatMissingData.MISSING,
        });

        alarm.addAlarmAction(new SnsAction(topic));
    }
}

I want to explain what's happening here. The function:

public visit(node: IConstruct): void

will be called once with each construct in the stack, so we need to ensure we're dealing with a lambda and not an S3 bucket, an IAM policy, or anything else.

if (node instanceof NodejsFunction) {
    const topic = Topic.fromTopicArn(node, "alarm-sns-topic", this.alarmSnsArn);
    this.addErrorLogAlarm(node, topic);
    this.addLambdaErrorAlarm(node, topic);
}

Once we know we have a NodeJS lambda, we can proceed to create our alarms. The first function:

this.addErrorLogAlarm(node, topic);

will add a custom log filter on the log group and look for a pattern of our choosing. Since we log in JSON format at Produce8, I can use this pattern to look for any errors logged.

"{$._logLevel = error}"
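For the filter to match, each log line has to be a JSON object carrying a _logLevel field. The sketch below shows one way such a line could be produced. Only the _logLevel key comes from the pattern above; formatLogLine, the message field and the context argument are hypothetical, and wouldMatchFilter is a local approximation of the filter's test, not CloudWatch's actual matcher.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Builds one JSON log line. The _logLevel key is what the filter pattern's
// $._logLevel selector reads; the other field names are illustrative only.
function formatLogLine(
  level: LogLevel,
  message: string,
  context: Record<string, unknown> = {},
): string {
  // Spread context first so it cannot overwrite _logLevel or message.
  return JSON.stringify({ ...context, _logLevel: level, message });
}

// Local approximation of the filter: parse the line and compare the field.
function wouldMatchFilter(line: string): boolean {
  try {
    return JSON.parse(line)._logLevel === "error";
  } catch {
    return false; // lines that are not valid JSON are treated as non-matching
  }
}

const errorLine = formatLogLine("error", "payment failed", { orderId: "A-123" });
const infoLine = formatLogLine("info", "payment ok");
```

With these helpers, wouldMatchFilter(errorLine) is true and wouldMatchFilter(infoLine) is false, so only error-level lines would contribute to the custom metric.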

Next, I create a custom alarm and publish any instances of errors in the logs to an SNS topic I've set up.

const alarm = new Alarm(node, alarmId, {
    evaluationPeriods: 1,
    alarmName,
    threshold: 0,
    comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
    metric: filter.metric().with({
        period: Duration.minutes(1),
    }),
    treatMissingData: TreatMissingData.MISSING,
});

alarm.addAlarmAction(new SnsAction(topic));
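To read those settings in plain terms: the alarm looks at one 1-minute period, fires when the metric value is greater than 0, and does not fill in a missing datapoint with a substitute value. The sketch below is a deliberately simplified model of that rule for a single period. evaluatePeriod is a hypothetical helper, and real CloudWatch evaluation considers a wider range of recent datapoints than this model does.

```typescript
type AlarmState = "OK" | "ALARM" | "INSUFFICIENT_DATA";

// Simplified single-period model of the alarm configured above.
// value is the metric datapoint for the period, or undefined when the
// metric filter published nothing (no matching log events).
function evaluatePeriod(value: number | undefined, threshold = 0): AlarmState {
  if (value === undefined) {
    // TreatMissingData.MISSING: no datapoint, so there is nothing to compare.
    return "INSUFFICIENT_DATA";
  }
  // ComparisonOperator.GREATER_THAN_THRESHOLD with threshold 0.
  return value > threshold ? "ALARM" : "OK";
}

const minuteWithError = evaluatePeriod(1); // one matching log event, metricValue "1"
const quietMinute = evaluatePeriod(undefined); // no matching log events
```

With a threshold of 0 and a metric value of 1 per matching event, a single logged error in a minute is enough to trip the alarm, which is the behavior we want for a "notify on any error" default.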

This is helpful to catch runtime errors in my lambda code. But what if the lambda fails for some other reason, like a timeout?

In that case, my custom logging will never get called, so I need to ensure I get notifications on other kinds of lambda errors as well. Here, AWS helps us with their standard metrics. Looking at the lambda function, you can see all the metrics available.

I'm going to use the "Errors" metric for this example, but you can use any you want.

private addLambdaErrorAlarm(node: NodejsFunction, topic: ITopic): void {
    const errorsMetric = node.metric(LAMBDA_DEFAULT_METRICS.ERRORS); // "Errors"
    const alarmId = `lambda-errors-alarm-${node.node.id}`;
    const alarmName = `lambda-errors-alarm-${node.functionName}`;

    const alarm = new Alarm(node, alarmId, {
        evaluationPeriods: 1,
        alarmName,
        threshold: 0,
        comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
        metric: errorsMetric.with({
            period: Duration.minutes(1),
        }),
        treatMissingData: TreatMissingData.MISSING,
    });

    alarm.addAlarmAction(new SnsAction(topic));
}

In this case, we already have a metric, so we just need to create a new alarm to use it and publish to the same SNS topic.

I've added a custom lambda to forward messages from the SNS topic to a Slack channel so the whole team can get near-real-time (~1min) notifications on lambda errors.
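The forwarding lambda itself is not shown in this article, so the following is only a sketch of what its formatting step might look like. It assumes the standard SNS event shape (Records[].Sns.Message) and the AlarmName, NewStateValue and NewStateReason fields that CloudWatch includes in alarm notifications. toSlackPayloads is a hypothetical helper, and the HTTP POST to a Slack incoming webhook is left out.

```typescript
// Minimal shapes for the parts of the SNS event this sketch reads.
interface SnsRecord {
  Sns: { Subject?: string; Message: string };
}
interface SnsEvent {
  Records: SnsRecord[];
}

// Slack incoming webhooks accept a JSON body with a "text" field.
interface SlackPayload {
  text: string;
}

// Turns each SNS record into a Slack message body.
function toSlackPayloads(event: SnsEvent): SlackPayload[] {
  return event.Records.map((record) => {
    const raw = record.Sns.Message;
    try {
      const alarm = JSON.parse(raw);
      if (alarm !== null && typeof alarm === "object" && typeof alarm.AlarmName === "string") {
        return {
          text: `${alarm.NewStateValue}: ${alarm.AlarmName}\n${alarm.NewStateReason}`,
        };
      }
    } catch {
      // Not JSON; fall through and forward the raw message.
    }
    return { text: raw };
  });
}

const sampleEvent: SnsEvent = {
  Records: [
    {
      Sns: {
        Message: JSON.stringify({
          AlarmName: "lambda-errors-alarm-helloWorld",
          NewStateValue: "ALARM",
          NewStateReason: "Threshold Crossed",
        }),
      },
    },
  ],
};
const slackPayloads = toSlackPayloads(sampleEvent);
```

A handler built on this would POST each payload to the channel's webhook URL. Keeping the formatting in a pure function makes it easy to unit test without touching SNS or Slack.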

While you do pay a small monthly fee for CloudWatch alarms if you go beyond the free tier, it's a drop in the bucket compared with the bigger monitoring solutions. See the CloudWatch pricing page for details.

To see this code sample and others, head over to Produce8's public CDK samples repo.

In the spirit of openness and transparency, and in line with our values as a company, our team at Produce8 is dedicated to publishing and sharing articles like this on our internal development practices.
