# Run a container on a schedule with ECS
I've got a Docker container that I want to run periodically to fetch data and store it in a database. Since this needs to keep running on a schedule indefinitely, I'm using Terraform to manage the infrastructure. It took me a while to figure out all the required resources and permissions, so I thought I'd share my solution here.
To securely run a container on a schedule in ECS, I needed to:
- Initialise Terraform
- Configure any secrets that the container needs
- Create an ECS Fargate cluster, and a VPC for the container to run in
- Create an IAM role for the container to execute as
- Create an IAM role for EventBridge to trigger the ECS task
- Define an EventBridge schedule trigger
- Define a task
## Initialise Terraform
Create `providers.tf` containing the AWS profile and region you want to use:

```hcl
provider "aws" {
  region  = "us-east-2"
  profile = "default"
}
```

Then run `terraform init`.
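Once the resources below are defined, the usual loop is to preview and then apply the changes:

```bash
terraform plan    # preview the resources that will be created
terraform apply   # create (or update) them
```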
## Secrets
The container that I'm running needs access to some secret values as environment variables. I read the two most sensitive items from `tfvars`, and hard-code the role since the Terraform repo is private anyway. The `test/area/role` secret is really just configuration rather than a true secret:
```hcl
variable "AREA_USERNAME" {
  description = "Username for AREA access"
  type        = string
  sensitive   = true
}

variable "AREA_PASSWORD" {
  description = "Password for AREA access"
  type        = string
  sensitive   = true
}

locals {
  my_secrets = {
    "test/area/username" = var.AREA_USERNAME
    "test/area/password" = var.AREA_PASSWORD
    "test/area/role"     = "some_role"
  }
}

resource "aws_secretsmanager_secret" "my_secrets" {
  for_each = local.my_secrets
  name     = each.key
}

resource "aws_secretsmanager_secret_version" "my_secret_values" {
  for_each      = local.my_secrets
  secret_id     = aws_secretsmanager_secret.my_secrets[each.key].id
  secret_string = each.value
}
```
## ECS Cluster and VPC
ECS allows you to scale to zero, but you still need a VPC to spin up containers in. My containers need internet access, so here's how I create an ECS cluster and a VPC with a route out to the internet:
```hcl
# Create an ECS cluster
resource "aws_ecs_cluster" "my_cluster" {
  name = "demo-fargate-cluster"
}

# Create a VPC and network, allowing all egress traffic
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
}

resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-2a"
  map_public_ip_on_launch = true
}

# You could also use a NAT gateway instead. We use an internet gateway
# due to speed / cost reasons in this example
resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.main.id
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.gw.id
  }
}

resource "aws_route_table_association" "public_assoc" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

resource "aws_security_group" "ecs_tasks" {
  name        = "ecs-scheduled-tasks-sg"
  description = "Allow outbound traffic"
  vpc_id      = aws_vpc.main.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```
## Execution IAM role
The container that runs does not have access to AWS Secrets Manager by default. To allow access, I create a new IAM role that ECS assumes (via `sts:AssumeRole`) before launching the container. This role has a policy attached that allows access to the specific secrets in AWS Secrets Manager.
```hcl
# Allow ECS to assume this role
resource "aws_iam_role" "ecs_task_execution" {
  name = "ecs-task-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect = "Allow",
      Principal = {
        Service = "ecs-tasks.amazonaws.com"
      },
      Action = "sts:AssumeRole"
    }]
  })
}

# Create a policy that allows access to the secrets we defined earlier
resource "aws_iam_policy" "ecs_execution_secrets_access" {
  name = "ecs-exec-secrets-access"

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect = "Allow",
      Action = [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ],
      Resource = [
        aws_secretsmanager_secret.my_secrets["test/area/username"].arn,
        aws_secretsmanager_secret.my_secrets["test/area/password"].arn,
        aws_secretsmanager_secret.my_secrets["test/area/role"].arn,
      ]
    }]
  })
}

# Attach the above policy to the IAM role
resource "aws_iam_role_policy_attachment" "ecs_exec_secrets_access_attach" {
  role       = aws_iam_role.ecs_task_execution.name
  policy_arn = aws_iam_policy.ecs_execution_secrets_access.arn
}

# Attach the ECS Task execution policy
resource "aws_iam_role_policy_attachment" "ecs_task_execution_attach" {
  role       = aws_iam_role.ecs_task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
```
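Note that this is the *execution* role, which ECS itself uses to pull the image and inject the secrets. If your container also needed to call AWS APIs at runtime, you'd create a separate task role and reference it via `task_role_arn` on the task definition. A minimal sketch, not needed for this example:

```hcl
# Hypothetical task role - only needed if the container itself calls AWS APIs
resource "aws_iam_role" "ecs_task_role" {
  name = "ecs-task-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect    = "Allow",
      Principal = { Service = "ecs-tasks.amazonaws.com" },
      Action    = "sts:AssumeRole"
    }]
  })
}
```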
## Trigger IAM role
Amazon EventBridge also needs an IAM role in order to trigger the ECS task. Here's an IAM role with explicit permission to run the defined tasks.

It took me a long time to realise that granting `ecs:RunTask` on the task definition wasn't enough; I also needed to allow `ecs:RunTask` on the ECS cluster being used.
```hcl
# Allow EventBridge to assume this role
resource "aws_iam_role" "eventbridge_invoke_ecs" {
  name = "eventbridge-ecs-invoke-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect = "Allow",
      Principal = {
        Service = "events.amazonaws.com"
      },
      Action = "sts:AssumeRole"
    }]
  })
}

# Create a new IAM policy that allows us to run all of the defined tasks
# We need to explicitly allow ecs:RunTask for the ecs:cluster too
resource "aws_iam_role_policy" "eventbridge_ecs_policy" {
  name = "invoke-ecs"
  role = aws_iam_role.eventbridge_invoke_ecs.id

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect   = "Allow",
        Action   = "ecs:RunTask",
        Resource = aws_ecs_task_definition.scheduled_my_command.arn
      },
      {
        Effect   = "Allow",
        Action   = "iam:PassRole",
        Resource = aws_iam_role.ecs_task_execution.arn
      },
      {
        Effect   = "Allow",
        Action   = "ecs:RunTask",
        Resource = "*",
        Condition = {
          ArnEquals = {
            "ecs:cluster" = aws_ecs_cluster.my_cluster.arn
          }
        }
      }
    ]
  })
}
```
## EventBridge schedule trigger
We need to define an event rule that runs the container on a schedule. I also create a CloudWatch log group to capture task logs.
```hcl
# Run the task at 23:59 every night
resource "aws_cloudwatch_event_rule" "run_at_23_59" {
  name                = "run-ecs-task-schedule"
  schedule_expression = "cron(59 23 * * ? *)"
}

# And create a CloudWatch log group to send logs to
resource "aws_cloudwatch_log_group" "my_scheduled_task" {
  name              = "/ecs/my-scheduled-task-logs"
  retention_in_days = 1
}
```
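One gotcha: EventBridge cron expressions use six fields (minute, hour, day-of-month, month, day-of-week, year) rather than the five-field Unix format. If you don't care about the exact time of day, a rate expression is simpler; a hypothetical alternative to the rule above:

```hcl
# Alternative: run roughly once a day rather than at a fixed time
resource "aws_cloudwatch_event_rule" "run_daily" {
  name                = "run-ecs-task-daily"
  schedule_expression = "rate(1 day)"
}
```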
## Task Definition
Finally, we need to define the task to run. You'll need to upload your Docker image to ECR before defining the task, and specify all of the secrets that the container needs access to.
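As a rough sketch, pushing an image to ECR looks something like this (the repository name `my-scheduled-task` and the account ID `111111111111` are placeholders; the task definition below just uses the public `hello-world` image):

```bash
# Create the repository (one-off)
aws ecr create-repository --repository-name my-scheduled-task

# Authenticate Docker against your private registry
aws ecr get-login-password --region us-east-2 \
  | docker login --username AWS --password-stdin 111111111111.dkr.ecr.us-east-2.amazonaws.com

# Tag and push the image
docker tag my-scheduled-task:latest 111111111111.dkr.ecr.us-east-2.amazonaws.com/my-scheduled-task:latest
docker push 111111111111.dkr.ecr.us-east-2.amazonaws.com/my-scheduled-task:latest
```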
```hcl
# Define the container and command to run, plus CPU/Memory usage
resource "aws_ecs_task_definition" "scheduled_my_command" {
  family                   = "scheduled-my-command"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn

  container_definitions = jsonencode([{
    name      = "demo",
    image     = "hello-world",
    essential = true,
    secrets = [
      {
        name      = "AREA_USER",
        valueFrom = aws_secretsmanager_secret.my_secrets["test/area/username"].arn
      },
      {
        name      = "AREA_PASSWORD",
        valueFrom = aws_secretsmanager_secret.my_secrets["test/area/password"].arn
      },
      {
        name      = "AREA_ROLE",
        valueFrom = aws_secretsmanager_secret.my_secrets["test/area/role"].arn
      }
    ],
    logConfiguration = {
      logDriver = "awslogs",
      options = {
        awslogs-group         = "/ecs/my-scheduled-task-logs"
        awslogs-region        = "us-east-2"
        awslogs-stream-prefix = "demo"
      }
    }
  }])
}

# Trigger this task using the scheduled event
resource "aws_cloudwatch_event_target" "ecs_my_command_target" {
  rule      = aws_cloudwatch_event_rule.run_at_23_59.name
  role_arn  = aws_iam_role.eventbridge_invoke_ecs.arn
  target_id = "ecs-task-my-command"
  arn       = aws_ecs_cluster.my_cluster.arn

  ecs_target {
    task_definition_arn = aws_ecs_task_definition.scheduled_my_command.arn
    launch_type         = "FARGATE"

    network_configuration {
      subnets          = [aws_subnet.public.id]
      security_groups  = [aws_security_group.ecs_tasks.id]
      assign_public_ip = true
    }
  }

  # This is only needed for debugging. See the final section
  # dead_letter_config {
  #   arn = aws_sqs_queue.failed_invocations.arn
  # }
}
```
## Help, it's not working!
Last, but not least, debugging! If your task isn't triggering as expected, you can configure a dead-letter queue (DLQ) to receive the error messages from AWS. If you choose to do this, uncomment the `dead_letter_config` section of the event target above.
```hcl
resource "aws_sqs_queue" "failed_invocations" {
  name = "eventbridge-ecs-dlq"
}

resource "aws_sqs_queue_policy" "allow_eventbridge" {
  queue_url = aws_sqs_queue.failed_invocations.id

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Sid    = "AllowEventBridgeToSendMessages",
      Effect = "Allow",
      Principal = {
        Service = "events.amazonaws.com"
      },
      Action   = "sqs:SendMessage",
      Resource = aws_sqs_queue.failed_invocations.arn,
      Condition = {
        ArnEquals = {
          "aws:SourceArn" = aws_cloudwatch_event_rule.run_at_23_59.arn
        }
      }
    }]
  })
}
```
To read messages from the dead-letter queue, use the `aws` CLI tool:

```bash
# Fetch the queue URL
aws sqs get-queue-url --queue-name eventbridge-ecs-dlq

# Read messages - change the URL for your account ID
aws sqs receive-message \
  --queue-url https://sqs.us-east-2.amazonaws.com/111111111111/eventbridge-ecs-dlq \
  --max-number-of-messages 10 \
  --wait-time-seconds 5 \
  --message-attribute-names All \
  --attribute-names All
```
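If the task is triggering but failing at runtime, the container's output lands in the CloudWatch log group we created earlier. With AWS CLI v2 you can tail it directly:

```bash
aws logs tail /ecs/my-scheduled-task-logs --follow
```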
## Conclusion
The parts of this that gave me the most trouble were figuring out how to debug using a dead-letter queue, and realising that the IAM policy for `ecs:RunTask` needed access to the cluster too.
Hopefully this has helped you (or will help me again in the future) to deploy scheduled tasks on ECS.