Logs, Metrics, and Alerts

Log Aggregation

TetraScience uses AWS CloudWatch as the central store for logs. Log data from all platform components, including the following sources, is sent to CloudWatch:

ECS containers
AWS Lambda functions
Data Hub events
Platform audit logs
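
As an illustration of how logs collected in CloudWatch might be searched, the following minimal Python sketch builds the parameters for the CloudWatch Logs FilterLogEvents API. The helper name build_log_query and the log group name are hypothetical; the actual log group names depend on the deployment.

```python
from datetime import datetime, timedelta, timezone


def build_log_query(log_group, filter_pattern, minutes_back=15, now=None):
    """Build keyword arguments for the CloudWatch Logs FilterLogEvents API."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(minutes=minutes_back)
    return {
        "logGroupName": log_group,
        "filterPattern": filter_pattern,
        # The API expects timestamps in milliseconds since the Unix epoch.
        "startTime": int(start.timestamp() * 1000),
        "endTime": int(now.timestamp() * 1000),
    }


# Hypothetical log group name; real names depend on the deployment.
params = build_log_query("/aws/lambda/example-function", "ERROR")

# With boto3 installed and AWS credentials configured, the query could be run as:
#   import boto3
#   events = boto3.client("logs").filter_log_events(**params)["events"]
```

The parameters are built separately from the API call so that the time window can be checked without AWS access.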

Infrastructure Metrics and Alarms

AWS provides out-of-the-box metrics for infrastructure components. These metrics are available in CloudWatch, where they can be used to build custom dashboards and widgets that show historical performance. CloudWatch alarms can also be defined; an alarm triggers whenever its metric crosses a configured threshold.
The Tetra Data Platform's CloudFormation stacks contain multiple custom alarms. Some alarms are configured to trigger automatic actions; for instance, an ECS service with high CPU usage for more than 5 minutes causes another container instance to be created to take up some of the load. Other alarms, such as a service restarting unusually frequently, cause an alert email to be sent to the notification address configured during deployment. A copy of the email also goes to TetraScience support.
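
The idea of an alarm that fires only after a metric stays above a threshold for a sustained period can be sketched in Python as follows. This is a simplified illustration, not the platform's alarm definition: the 80% threshold, the one-minute datapoint interval, and the function name breaches_for_duration are all assumptions.

```python
def breaches_for_duration(datapoints, threshold, periods):
    """Return True if the last `periods` datapoints all exceed `threshold`."""
    if len(datapoints) < periods:
        return False  # not enough data to decide
    return all(value > threshold for value in datapoints[-periods:])


# Hypothetical values: one CPU utilization datapoint per minute, 80% threshold.
cpu_percent = [42.0, 85.5, 91.2, 88.0, 93.4, 90.1]
scale_out = breaches_for_duration(cpu_percent, threshold=80.0, periods=5)
print(scale_out)  # True: the last five datapoints are all above 80
```

Requiring every datapoint in the window to breach the threshold keeps a single short spike from triggering a scale-out.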
The Alarms section of CloudWatch shows the triggered alarms. The "Hide Auto Scaling alarms" box should always be ticked. Some alarms will be shown in the INSUFFICIENT_DATA state. This is normal and no action needs to be taken.
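
The same triage can be approximated in code. The sketch below filters a list of alarm records shaped like the MetricAlarms entries returned by the CloudWatch DescribeAlarms API (fields AlarmName, StateValue, AlarmActions). Treating an alarm as an Auto Scaling alarm when one of its actions is an Auto Scaling ARN is an assumption made for this example, and all alarm names and ARNs are hypothetical.

```python
AUTO_SCALING_ARN_PREFIX = "arn:aws:autoscaling:"


def alarms_needing_attention(metric_alarms):
    """Return names of alarms in the ALARM state that are not scaling alarms."""
    names = []
    for alarm in metric_alarms:
        if alarm.get("StateValue") != "ALARM":
            continue  # skips OK and INSUFFICIENT_DATA
        actions = alarm.get("AlarmActions", [])
        if any(arn.startswith(AUTO_SCALING_ARN_PREFIX) for arn in actions):
            continue  # assumed to be an Auto Scaling alarm
        names.append(alarm["AlarmName"])
    return names


# Hypothetical sample data in the shape of DescribeAlarms output.
sample_alarms = [
    {"AlarmName": "example-service-restarts", "StateValue": "ALARM",
     "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:example-topic"]},
    {"AlarmName": "example-cpu-scale-out", "StateValue": "ALARM",
     "AlarmActions": ["arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:example"]},
    {"AlarmName": "example-queue-depth", "StateValue": "INSUFFICIENT_DATA",
     "AlarmActions": []},
]
attention = alarms_needing_attention(sample_alarms)
print(attention)  # ['example-service-restarts']
```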

Alerts Sent to TetraScience

The platform definition in CloudFormation includes an SNS topic that is used to convey high-priority infrastructure-level alerts to the TetraScience team. The alert contains no sensitive information, just an indication of which component failed and how (such as an error code or a metric value). This information helps the TetraScience team to provide timely and effective support.
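
One way to keep an alert free of sensitive information is to copy only an explicit allowlist of fields into the message. The Python sketch below illustrates that idea; the field names and the function build_alert_message are hypothetical and do not describe the platform's actual alert format.

```python
import json

# Hypothetical allowlist: only these fields may appear in an alert.
ALLOWED_FIELDS = ("component", "error_code", "metric_name", "metric_value")


def build_alert_message(details):
    """Serialize only allowlisted fields of `details` as a JSON string."""
    safe = {key: details[key] for key in ALLOWED_FIELDS if key in details}
    return json.dumps(safe, sort_keys=True)


message = build_alert_message({
    "component": "example-service",
    "metric_name": "CPUUtilization",
    "metric_value": 97.5,
    "user_email": "someone@example.com",  # not allowlisted, so it is dropped
})

# With boto3 installed and AWS credentials configured, the message could be
# published to a topic (the ARN here is a placeholder):
#   import boto3
#   boto3.client("sns").publish(TopicArn="<topic-arn>", Message=message)
```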
