Amazon ECS Metrics

Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service that helps keep your containerized environment secure, reliable, and available. It supports serverless options like AWS Fargate and integrates with a number of AWS services, such as Amazon SageMaker, AWS Batch, Amazon Lex, and AWS App Mesh.

An Amazon ECS service lets you run and maintain a specified number of instances of a task at the same time in an Amazon ECS cluster. You can also run your service behind a load balancer if necessary, which distributes traffic across the tasks in the service.

Why use a Telegraf plugin for Amazon ECS?

In order to maintain a reliable, available and performant instance of Amazon ECS and your other AWS solutions, you need to collect metrics and events from all components of your AWS solution. This will allow you to easily pinpoint the area that may be causing a failure.

Essentially, monitoring Amazon ECS is about giving you the context to know not just what is going on, but why. If you know not only that a problem has occurred but also what conditions led to the failure, you know what you need to do to fix it and what you must do to stop it from happening again.

The Amazon ECS Telegraf Input Plugin helps you easily pull metrics that show how well your Amazon ECS deployment is performing. It collects metrics about the cluster, tasks, memory and CPU consumption, and more. Pair this with one of the many Telegraf plugins that monitor the application in the containers, and you will gain full visibility into your stack.

How to monitor Amazon ECS using the Telegraf plugin

The Amazon ECS Telegraf plugin is compatible with both Amazon ECS and AWS Fargate. It uses the Amazon ECS task metadata and stats API endpoints, v2 or v3, to gather metrics on the running containers in a task. Please note that the Telegraf container must run in the same task as the workload it is inspecting. The plugin is similar to the Docker input plugin, with some ECS-specific modifications for AWS metadata and stats formats.
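To make the v2 and v3 endpoints more concrete, here is a minimal Python sketch of how a process running inside the task might decide which URLs to query. It is not the plugin's own code. It assumes the conventions described in the AWS documentation: the v3 base URL is injected into each container through the `ECS_CONTAINER_METADATA_URI` environment variable, and the v2 endpoint lives at the fixed link-local address `169.254.170.2`. The function name `resolve_endpoints` is made up for this illustration.

```python
import os

# Assumption: fixed link-local base address of the task metadata endpoint v2.
V2_BASE = "http://169.254.170.2/v2"
# Assumption: environment variable that carries the v3 base URL inside a container.
V3_ENV_VAR = "ECS_CONTAINER_METADATA_URI"


def resolve_endpoints(env=None):
    """Return (version, task_metadata_url, task_stats_url).

    Hypothetical helper: prefers v3 when its environment variable is set,
    and falls back to the fixed v2 address otherwise.
    """
    env = os.environ if env is None else env
    v3_base = env.get(V3_ENV_VAR, "").rstrip("/")
    if v3_base:
        return ("v3", f"{v3_base}/task", f"{v3_base}/task/stats")
    return ("v2", f"{V2_BASE}/metadata", f"{V2_BASE}/stats")
```

Calling `resolve_endpoints()` with no arguments reads the real environment. Passing a dictionary instead makes the logic easy to check outside of a running task. Both URLs are only reachable from inside the task, which is why the Telegraf container has to run alongside the workload.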

Key Amazon ECS metrics to use for monitoring

The Amazon ECS metrics that you choose to monitor will vary depending on where you are in the development process, which performance-related issues you’re concerned about, and what type of application you’re developing. That said, some of the important Amazon ECS metrics that you should proactively monitor include:

ECS task metrics
- Tags: cluster, task_arn, family, revision, id, name
- Fields: revision (string), desired_status (string), known_status (string), limit_cpu (float), limit_mem (float)
ECS container memory metrics
- Tags: cluster, task_arn, family, revision, id, name
- Fields: container_id, active_anon, active_file, cache, hierarchical_memory_limit, inactive_anon, inactive_file, mapped_file, pgfault, pgmajfault, pgpgin, pgpgout, rss, rss_huge, total_active_anon, total_active_file, total_cache, total_inactive_anon, total_inactive_file, total_mapped_file, total_pgfault, total_pgmajfault, total_pgpgin, total_pgpgout, total_rss, total_rss_huge, total_unevictable, total_writeback, unevictable, writeback, fail_count, limit, max_usage, usage, usage_percent
ECS container CPU metrics
- Tags: cluster, task_arn, family, revision, id, name
- Fields: container_id, usage_total, usage_in_usermode, usage_in_kernelmode, usage_system, throttling_periods, throttling_throttled_periods, throttling_throttled_time, usage_percent
ECS container net metrics
- Tags: cluster, task_arn, family, revision, id, name
- Fields: container_id, rx_packets, rx_dropped, rx_bytes, rx_errors, tx_packets, tx_dropped, tx_bytes, tx_errors
ECS container blkio metrics
- Tags: cluster, task_arn, family, revision, id, name
- Fields: container_id, io_service_bytes_recursive_async, io_service_bytes_recursive_read, io_service_bytes_recursive_sync, io_service_bytes_recursive_total, io_service_bytes_recursive_write, io_serviced_recursive_async, io_serviced_recursive_read, io_serviced_recursive_sync, io_serviced_recursive_total, io_serviced_recursive_write
ECS container meta metrics
- Tags: cluster, task_arn, family, revision, id, name
- Fields: container_id, docker_name, image, image_id, desired_status, known_status, limit_cpu, limit_mem, created_at, started_at, type
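
As an illustration of how a few of these fields might be combined into alert-friendly numbers, here is a small Python sketch. The helper names and sample values are hypothetical and do not come from real plugin output. The sketch also assumes that `usage` and `limit` are both reported in bytes, as in Docker stats.

```python
def cpu_throttle_ratio(cpu_fields):
    """Fraction of enforcement periods in which the container was throttled."""
    periods = cpu_fields.get("throttling_periods", 0)
    throttled = cpu_fields.get("throttling_throttled_periods", 0)
    # Avoid dividing by zero when no periods have been recorded yet.
    return throttled / periods if periods else 0.0


def memory_headroom(mem_fields):
    """Memory left before the container reaches its limit (assumed bytes)."""
    return mem_fields["limit"] - mem_fields["usage"]


# Hypothetical sample values for illustration only.
sample_cpu = {"throttling_periods": 200, "throttling_throttled_periods": 50}
sample_mem = {"usage": 300 * 1024 * 1024, "limit": 512 * 1024 * 1024}

ratio = cpu_throttle_ratio(sample_cpu)
headroom = memory_headroom(sample_mem)
```

With counters of this kind, which typically accumulate over the life of the container, a real alert would usually compare the change between two samples rather than the raw totals.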

For more information, please check out the documentation.

Related resources

InfluxDB and AWS - Scale your cloud infrastructure and time series analytics

AWS Marketplace Seller
AWS Data & Analytics Competency

AWS & InfluxData partnership

Customer Case Study: MuleSoft

AWS CloudWatch InfluxDB Template
