AWS Alarms for Application Errors
Monitoring is key for any real-world application. You have to know what’s happening and be alerted in real time when something goes wrong. AWS has CloudWatch for that, and it gives you a lot of metrics automatically. But there are some that you have to define yourself, and then you need to define proper alarms on top of them.
Here I’ll focus on four:
- High number of application errors
- High number of application warnings
- High number of 5xx errors on the load balancer
- High number of 4xx errors on the load balancer
First, the prerequisites:
- Ideally you are using CloudFormation to automate everything. You can create all of those things manually, but automation is a big plus.
- If using CloudFormation, you’d preferably have a sub-stack for configuring alarms.
- You need to be collecting your logs with CloudWatch Logs.
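As a rough sketch of the sub-stack prerequisite, a parent template could pass the two values that the alarms further down reference (`NotificationTopicId` and `WebAppALBId`) as parameters of a nested stack. The resource names `AlarmsStack`, `NotificationTopic` and `WebAppALB`, as well as the `TemplateURL`, are placeholders for illustration only, and it is an assumption that the two referenced names are stack parameters:

```json
"AlarmsStack": {
  "Type": "AWS::CloudFormation::Stack",
  "Properties": {
    "TemplateURL": "https://s3.amazonaws.com/your-bucket/alarms.json",
    "Parameters": {
      "NotificationTopicId": { "Ref": "NotificationTopic" },
      "WebAppALBId": { "Fn::GetAtt": ["WebAppALB", "LoadBalancerFullName"] }
    }
  }
}
```

The `LoadBalancer` dimension in the `AWS/ApplicationELB` namespace expects the load balancer's full name (the `app/name/id` part of the ARN) rather than the ARN itself, which is why the sketch passes `LoadBalancerFullName`.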
If you are not using CloudWatch logs, here’s a simple config file and script to enable them:
```json
{
  "agent": {
    "metrics_collection_interval": 10,
    "region": "eu-west-1",
    "logfile": "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "{{logPath}}",
            "log_group_name": "{{logGroupName}}",
            "log_stream_name": "{instance_id}",
            "timestamp_format": "%Y-%m-%d %H:%M:%S"
          }
        ]
      }
    }
  }
}
```
```bash
# Install the AWS CloudWatch agent
mkdir cloud-watch-agent
cd cloud-watch-agent
wget https://s3.amazonaws.com/amazoncloudwatch-agent/linux/amd64/latest/AmazonCloudWatchAgent.zip
unzip AmazonCloudWatchAgent.zip
./install.sh
# Fetch the config template from S3 and fill in the {{...}} placeholders
aws s3 cp "s3://$BUCKET_NAME/cloudwatch-agent-config.json" /var/config/cloudwatch-agent-config.json
sed -i -- 's|{{logPath}}|/var/log/application.log|g' /var/config/cloudwatch-agent-config.json
sed -i -- 's|{{logGroupName}}|app_node|g' /var/config/cloudwatch-agent-config.json
# Load the config and start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/var/config/cloudwatch-agent-config.json -s
```
Now you have to define two things: log metrics and alarms. The CloudFormation code below creates both:
```json
"HighAppErrorsNotification": {
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    "AlarmActions": [
      {
        "Ref": "NotificationTopicId"
      }
    ],
    "InsufficientDataActions": [
      {
        "Ref": "NotificationTopicId"
      }
    ],
    "AlarmDescription": "Notify if there are too many application errors",
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    "EvaluationPeriods": "1",
    "MetricName": "ApplicationErrors",
    "Namespace": "LogMetrics",
    "Period": "900",
    "Statistic": "Sum",
    "Threshold": "5",
    "TreatMissingData": "ignore"
  }
},
"ErrorMetricFilter": {
  "Type": "AWS::Logs::MetricFilter",
  "Properties": {
    "LogGroupName": "app_node",
    "FilterPattern": "ERROR",
    "MetricTransformations": [
      {
        "DefaultValue": 0,
        "MetricValue": "1",
        "MetricNamespace": "LogMetrics",
        "MetricName": "ApplicationErrors"
      }
    ]
  }
},
```
If you need to do that manually, go to the CloudWatch Logs homepage, select the log group (app_node) and use the “Create metric filter” button on top. It lets you specify the pattern to look for (“ERROR” in this case). When you have that ready, you can create an alarm based on it through Alarms -> Create alarm. Look up the metric by name and select it to trigger the alarm (in the example above, it gets triggered if there are 5 or more errors within 900 seconds).
You can then create an identical alarm for warnings (pattern to look for: “WARN”). The threshold there might be higher, e.g. 10 or 20. But that depends on your application logging patterns.
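For illustration, the warnings variant could look roughly like the fragment below. It mirrors the error example; the resource names `WarnMetricFilter` and `HighAppWarningsNotification`, the metric name `ApplicationWarnings` and the threshold of 10 are assumptions chosen for this sketch, so adjust them to your own logging patterns:

```json
"WarnMetricFilter": {
  "Type": "AWS::Logs::MetricFilter",
  "Properties": {
    "LogGroupName": "app_node",
    "FilterPattern": "WARN",
    "MetricTransformations": [
      {
        "DefaultValue": 0,
        "MetricValue": "1",
        "MetricNamespace": "LogMetrics",
        "MetricName": "ApplicationWarnings"
      }
    ]
  }
},
"HighAppWarningsNotification": {
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    "AlarmActions": [
      {
        "Ref": "NotificationTopicId"
      }
    ],
    "InsufficientDataActions": [
      {
        "Ref": "NotificationTopicId"
      }
    ],
    "AlarmDescription": "Notify if there are too many application warnings",
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    "EvaluationPeriods": "1",
    "MetricName": "ApplicationWarnings",
    "Namespace": "LogMetrics",
    "Period": "900",
    "Statistic": "Sum",
    "Threshold": "10",
    "TreatMissingData": "ignore"
  }
}
```

The `MetricName` and `Namespace` on the alarm have to match the `MetricName` and `MetricNamespace` of the metric filter exactly, otherwise the alarm watches a metric that never receives data.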
Then there are the load balancer 5xx error alarms. In CloudFormation it would look like this:
```json
"TooMany5xxErrorsWebAppAlarmNotification": {
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    "AlarmActions": [
      {
        "Ref": "NotificationTopicId"
      }
    ],
    "InsufficientDataActions": [
      {
        "Ref": "NotificationTopicId"
      }
    ],
    "AlarmDescription": "Notify if there are too many 5xx errors",
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    "Dimensions": [
      {
        "Name": "LoadBalancer",
        "Value": {
          "Ref": "WebAppALBId"
        }
      }
    ],
    "TreatMissingData": "notBreaching",
    "EvaluationPeriods": "1",
    "MetricName": "HTTPCode_Target_5XX_Count",
    "Namespace": "AWS/ApplicationELB",
    "Period": "60",
    "Statistic": "Sum",
    "Threshold": "2"
  }
}
```
You can again create that manually: look for the HTTPCode_Target_5XX_Count metric in the metric selection screen for the alarm. You have several options there; the most straightforward is to select the per-AppELB metric. And again, the same approach can be used for 4xx errors (HTTPCode_Target_4XX_Count).
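As a sketch, the 4xx counterpart only swaps the metric name to `HTTPCode_Target_4XX_Count` and adjusts the description and threshold. The resource name `TooMany4xxErrorsWebAppAlarmNotification` and the threshold of 20 are assumptions for illustration; client errors are often more frequent than server errors, so pick a value that fits your traffic:

```json
"TooMany4xxErrorsWebAppAlarmNotification": {
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    "AlarmActions": [
      {
        "Ref": "NotificationTopicId"
      }
    ],
    "InsufficientDataActions": [
      {
        "Ref": "NotificationTopicId"
      }
    ],
    "AlarmDescription": "Notify if there are too many 4xx errors",
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    "Dimensions": [
      {
        "Name": "LoadBalancer",
        "Value": {
          "Ref": "WebAppALBId"
        }
      }
    ],
    "TreatMissingData": "notBreaching",
    "EvaluationPeriods": "1",
    "MetricName": "HTTPCode_Target_4XX_Count",
    "Namespace": "AWS/ApplicationELB",
    "Period": "60",
    "Statistic": "Sum",
    "Threshold": "20"
  }
}
```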
Getting this running with CloudFormation (and even manually) is not as straightforward as it seems. The right combination of metric names, namespaces and values is not obvious, and the relevant documentation is not the first thing that pops up. So I decided to share something that works, as it may take some time experimenting before getting it to that state.
But even outside of CloudFormation or AWS context, monitoring and alerting in case of a high number of application errors, warnings and HTTP errors is a must. And automating the creation of those alarms is the recommended approach.