How do I troubleshoot common errors with API calls in Amazon ECS?

I want to troubleshoot common errors with API calls in Amazon Elastic Container Service (Amazon ECS).

Short description

The Amazon ECS APIs might fail with one of the following errors:

  • AccessDeniedException
  • ClientException
  • ClusterNotFoundException
  • InvalidParameterException
  • ServerException
  • ServiceNotActiveException
  • PlatformTaskDefinitionIncompatibilityException
  • PlatformUnknownException
  • ServiceNotFoundException
  • UnsupportedFeatureException

You might also experience API issues with the application that's running inside your Amazon ECS tasks.

Resolution

Your API requests are recorded in AWS CloudTrail as events. When activity occurs in Amazon ECS, that activity is recorded in a CloudTrail event along with other AWS service events in Event history. You can view, search, and download recent events in your AWS account.

To view the CloudTrail event history and locate the API errors, do the following:

  1. Open the AWS CloudTrail console.
  2. In the navigation pane, choose Event history.
  3. Choose the gear icon.
  4. Under Select visible columns, select Error code.
  5. Choose Confirm.
  6. In the Event history page, for Lookup attributes, select Event name.
  7. For Enter an event name, enter the action that failed.
    Note: If you don't know the event name, then for Lookup attributes, select Event source. For Enter an event source, select ecs.amazonaws.com to show all events related to Amazon ECS.
  8. From the list of results, choose an event that has an error code to view the event details.
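
The same filtering can be scripted. The following is a minimal sketch, assuming that the events were already fetched (for example, with the CloudTrail LookupEvents API filtered on the event source ecs.amazonaws.com) and that each item carries a "CloudTrailEvent" JSON string. The helper name failed_events and the sample data are hypothetical.

```python
import json


def failed_events(events):
    """Return (eventName, errorCode, errorMessage) for events that record an error.

    `events` is assumed to be shaped like the "Events" list in a CloudTrail
    LookupEvents response: each item has a "CloudTrailEvent" JSON string.
    """
    failures = []
    for item in events:
        record = json.loads(item["CloudTrailEvent"])
        # Successful calls have no "errorCode" key in the event record.
        if "errorCode" in record:
            failures.append(
                (record.get("eventName"), record["errorCode"], record.get("errorMessage"))
            )
    return failures


# Hypothetical sample data, for illustration only.
sample_events = [
    {"CloudTrailEvent": json.dumps({
        "eventName": "CreateCluster",
        "errorCode": "AccessDeniedException",
        "errorMessage": "...",
    })},
    {"CloudTrailEvent": json.dumps({"eventName": "ListClusters"})},
]

for name, code, message in failed_events(sample_events):
    print(name, code, message)
```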

AccessDeniedException

This error is logged when the AWS Identity and Access Management (IAM) user or role making the API call doesn’t have the required permissions to perform the requested action.

The AccessDeniedException error looks similar to the following:

An error occurred (AccessDeniedException) when calling the CreateCluster operation: User: arn:aws:sts::123456789012:assumed-role/test-role/test-session is not authorized to perform: ecs:CreateCluster on resource: * because no identity-based policy allows the ecs:CreateCluster action

You can view the following details in the related CloudTrail event record:

  • User information:
"type": "AssumedRole",
"principalId": "AROAZEPPWYLQU45ZDJY6V:test-session",
"arn": "arn:aws:sts::123456789012:assumed-role/test-role/test-session"
  • Event name:
"eventName": "CreateCluster"
  • Error message:
"errorMessage": "User: arn:aws:sts::123456789012:assumed-role/test-role/test-session is not authorized to perform: ecs:CreateCluster on resource: * because no identity-based policy allows the ecs:CreateCluster action"

To test a policy that is not attached to a user, user group, or role, use the IAM policy simulator.

To resolve this error, do the following:

  1. Open the IAM console.
  2. In the navigation pane, choose Roles or Users depending on the user identity.
  3. Filter the role or user using the search filter.
  4. Choose the role or user.
  5. Choose the Permissions tab.
  6. Expand the permissions policy to view the permissions associated with the user.
  7. Be sure that the policy includes ecs:your-event-name in the Action list with Allow for Effect. If the policy doesn't include these elements, then update the policy to add them. Or, create a new policy that allows the action and attach the policy to the IAM role or user. For more information, see Editing customer managed policies (console).
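
As a rough illustration of the check in the last step, the following sketch tests whether any Allow statement in a policy document lists a given action. It's deliberately simplified: it ignores Deny statements, Resource, Condition, NotAction, permissions boundaries, and service control policies, so it doesn't replace the IAM policy simulator. The helper name policy_allows and the sample policy are hypothetical.

```python
from fnmatch import fnmatchcase


def policy_allows(policy, action):
    """Return True if an Allow statement lists `action` (wildcards supported).

    Simplified sketch: only Effect and Action are inspected.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement can be a bare object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # Action can be a string or a list
            actions = [actions]
        # IAM action names are case-insensitive, so compare in lowercase.
        if any(fnmatchcase(action.lower(), pattern.lower()) for pattern in actions):
            return True
    return False


# Hypothetical policy, for illustration only.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["ecs:CreateCluster", "ecs:Describe*"], "Resource": "*"}
    ],
}

print(policy_allows(example_policy, "ecs:CreateCluster"))
print(policy_allows(example_policy, "ecs:RunTask"))
```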

ClientException

This error is logged when the ECS client specifies an identifier or resource that isn't valid or doesn't exist. For example, if you try to start a task using the RunTask or StartTask API and refer to an incorrect task definition, you get this error:

$ aws ecs run-task --cluster example-cluster --task-definition centos --region ap-southeast-2
An error occurred (ClientException) when calling the RunTask operation: TaskDefinition not found.
$ aws ecs start-task --cluster example-cluster --task-definition centos --container-instances 765936fadbdd46b5991a4bd70c2a43d4 --region ap-southeast-2
An error occurred (ClientException) when calling the StartTask operation: TaskDefinition not found.

To prevent this error, be sure that the resources referred to in the command, your code, or API calls exist and are valid.
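
When you script around the AWS CLI, it can help to pull the exception name and operation out of error lines like the ones shown above. The following is a minimal sketch that assumes the message format seen in this article's examples. The helper name parse_cli_error is hypothetical.

```python
import re

# Matches lines such as:
# An error occurred (ClientException) when calling the RunTask operation: TaskDefinition not found.
_ERROR_RE = re.compile(
    r"An error occurred \((?P<code>[^)]+)\) "
    r"when calling the (?P<operation>\w+) operation: (?P<message>.*)"
)


def parse_cli_error(line):
    """Return a dict with code, operation, and message, or None if the line doesn't match."""
    match = _ERROR_RE.search(line)
    return match.groupdict() if match else None


example_line = (
    "An error occurred (ClientException) when calling the RunTask operation: "
    "TaskDefinition not found."
)
print(parse_cli_error(example_line))
```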

ClusterNotFoundException

This error is logged when the specified cluster isn't found.

Example:

$ aws ecs run-task --task-definition CentOS --cluster example-cluster --region ap-southeast-2
An error occurred (ClusterNotFoundException) when calling the RunTask operation: Cluster not found.

To avoid this error, be sure that the cluster name that you pass in the command, your code, or API calls is correct. You can run the following command to list the existing ECS clusters. With the list that's returned, you can verify that the cluster mentioned in the API call exists.

$ aws ecs list-clusters --region example-region
{
    "clusterArns": [
        "arn:aws:ecs:ap-southeast-2:123456789012:cluster/my-cluster",
        "arn:aws:ecs:ap-southeast-2:123456789012:cluster/my-private-cluster"
    ]
}
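
To automate that check, the following sketch looks for a cluster name in the JSON that list-clusters prints. It assumes output shaped like the preceding example. The helper name cluster_exists is hypothetical.

```python
import json


def cluster_exists(list_clusters_output, cluster_name):
    """Return True if `cluster_name` appears in the clusterArns of list-clusters JSON output."""
    arns = json.loads(list_clusters_output).get("clusterArns", [])
    # A cluster ARN ends with "cluster/<name>", so the name is the last path segment.
    return any(arn.rsplit("/", 1)[-1] == cluster_name for arn in arns)


example_output = """
{
    "clusterArns": [
        "arn:aws:ecs:ap-southeast-2:123456789012:cluster/my-cluster",
        "arn:aws:ecs:ap-southeast-2:123456789012:cluster/my-private-cluster"
    ]
}
"""

print(cluster_exists(example_output, "my-cluster"))
print(cluster_exists(example_output, "example-cluster"))
```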

InvalidParameterException

This error is logged when a parameter that's passed in the command isn't valid. For example, suppose that you specify a task definition revision that doesn't exist:

$ aws ecs run-task --task-definition CentOS:3 --cluster example-cluster --region ap-southeast-2

Then, the error looks similar to the following:

An error occurred (InvalidParameterException) when calling the RunTask operation: TaskDefinition not found.

To avoid this error, be sure that the parameters passed in the command are valid.

ServerException

This error is logged when a server-side error occurs while the API call is processed. ServerException usually corresponds to an HTTP 500 response and indicates an issue with the ECS service in the AWS Region. The error is usually temporary, and later attempts to run the API call should succeed. If the issue persists, contact AWS Support.
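
Because the error is usually temporary, a retry with exponential backoff is a common way to handle it. The AWS CLI and SDKs already retry some failures on their own, so the following is only a sketch of the pattern for custom code. The names call_with_retries and is_retryable and the default values are assumptions, not part of any AWS API.

```python
import random
import time


def call_with_retries(call, is_retryable, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run `call()` and retry failures that `is_retryable(exc)` accepts.

    Waits a random time up to base_delay * 2**(attempt - 1) seconds between
    attempts (exponential backoff with jitter). Re-raises the last error when
    attempts run out or the error isn't retryable.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable(exc):
                raise
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


# Hypothetical usage: `run_task` stands in for your own function that calls Amazon ECS.
# result = call_with_retries(run_task, lambda exc: "ServerException" in str(exc))
```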

ServiceNotActiveException

This error occurs when the ECS service that's being updated isn't active. Verify that the ECS service that's being updated is present in the ECS cluster and is in the ACTIVE state.

Run the following AWS Command Line Interface (AWS CLI) command to list all the services in the cluster:

$ aws ecs list-services --cluster example-cluster

In the output, verify whether the service that's being updated is displayed.

Note: If you receive errors when running AWS CLI commands, make sure that you’re using the most recent version of the AWS CLI.

Then, run the following command to verify that the service is in the active state.

$ aws ecs describe-services --services example-service-name --cluster example-cluster

The output might look similar to the following:

{
    "services": [{
        "serviceArn": "arn:aws:ecs:ap-southeast-2:111122223333:service/example-cluster/example-service",
        "serviceName": "example-service",
        "clusterArn": "arn:aws:ecs:ap-southeast-2:111122223333:cluster/example-cluster",
        "loadBalancers": [],
        "serviceRegistries": [],
        "status": "ACTIVE",
        ......
    }]
}
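
To check the status in a script, the following sketch reads the JSON that describe-services prints and reports each service's status. It also collects services listed under "failures" with the reason MISSING, which is how a service that isn't found is reported. The helper name service_status_report and the sample data are hypothetical.

```python
import json


def service_status_report(describe_services_output):
    """Return ({serviceName: status}, [ARNs reported as MISSING]) from describe-services JSON."""
    data = json.loads(describe_services_output)
    statuses = {s["serviceName"]: s["status"] for s in data.get("services", [])}
    missing = [f["arn"] for f in data.get("failures", []) if f.get("reason") == "MISSING"]
    return statuses, missing


# Hypothetical sample data, for illustration only.
example_json = json.dumps({
    "services": [{"serviceName": "example-service", "status": "ACTIVE"}],
    "failures": [{
        "arn": "arn:aws:ecs:ap-southeast-2:111122223333:service/example-cluster/old-service",
        "reason": "MISSING",
    }],
})

statuses, missing = service_status_report(example_json)
not_active = [name for name, status in statuses.items() if status != "ACTIVE"]
print(statuses, missing, not_active)
```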

PlatformTaskDefinitionIncompatibilityException

This error occurs when a task is launched on a platform that doesn’t meet the capabilities required in the task definition. Suppose that you try to create a service with an Amazon EFS volume attached on platform version 1.3.0:

$ aws ecs create-service \
    --cluster example-cluster \
    --task-definition Test-fargate-EFS \
    --launch-type FARGATE \
    --service-name example-service \
    --desired-count 1 \
    --network-configuration "awsvpcConfiguration={subnets=[subnet-ed7d31b5,subnet-833ef1cb],securityGroups=[sg-eeb28aa1]}" \
    --platform-version 1.3.0

Then, you get the following error:

An error occurred (PlatformTaskDefinitionIncompatibilityException) when calling the CreateService operation: One or more of the requested capabilities are not supported.

To resolve this issue, be sure to use the platform version that supports the capability requirements in the task definition. For information about supported capabilities in the various platform versions, see AWS Fargate platform versions.

PlatformUnknownException

This error occurs if you specify an unknown or wrong platform version when you launch a task. Suppose that you provide an incorrect platform version 1.3 instead of version 1.3.0:

$ aws ecs create-service \
    --cluster example-cluster \
    --task-definition example-task \
    --launch-type FARGATE \
    --enable-execute-command \
    --service-name example-service \
    --desired-count 1 \
    --network-configuration "awsvpcConfiguration={subnets=[subnet-ed7d31b5,subnet-833ef1cb],securityGroups=[sg-eeb28aa1]}" \
    --platform-version 1.3

Then, you get the following error:

An error occurred (PlatformUnknownException) when calling the CreateService operation: The specified platform does not exist.

For more information, see Linux platform versions and Windows platform versions.
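
A simple format check can catch a value such as 1.3 before the request is sent. The following sketch assumes that a platform version is either LATEST or three dot-separated numbers, as in the 1.3.0 example above. It checks only the format, not whether the version actually exists. The helper name looks_like_platform_version is hypothetical.

```python
import re

# Three dot-separated numbers, for example 1.3.0 or 1.4.0.
_VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def looks_like_platform_version(value):
    """Return True if `value` is LATEST or has the form major.minor.patch."""
    return value == "LATEST" or _VERSION_RE.fullmatch(value) is not None


for candidate in ("1.3.0", "1.3", "LATEST"):
    print(candidate, looks_like_platform_version(candidate))
```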

ServiceNotFoundException

This error occurs when the ECS service that's specified in your command or code doesn’t exist. Verify that the service name in your command or code is correct and the service is present in the cluster.

To view all the services in the cluster, run the following command:

$ aws ecs list-services --cluster example-cluster

UnsupportedFeatureException

This error occurs when an ECS feature isn't available in a specific Region. For example, the AWS Fargate feature might not be immediately available in a newly launched Region. If a Fargate task is launched in this Region, then you get the UnsupportedFeatureException error.

Application API issues

The following are some of the most commonly seen HTTP 5xx errors that you might get when you access the application hosted inside an ECS task:

500 - Internal Server Error: You get this error when the application encounters an unexpected condition. This error might occur due to application misconfiguration or an error with the application.

503 - Service Unavailable: You get this error under the following conditions:

  • The ECS task is under heavy load and can't service the request.
  • The application running inside your task is down for maintenance.

To troubleshoot these errors, do the following:

Analyze the application logs for the ECS tasks in Amazon CloudWatch Logs. You can find information about the log group from the task definition. Each task is associated with an individual log stream that contains the application logs from the task.

To view the log group and log stream for your task, run the following command:

$ aws ecs describe-task-definition --task-definition example-taskdefinition

The output looks similar to the following:

...
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": "/ecs/example-task",
                        "awslogs-region": "ap-southeast-2",
                        "awslogs-stream-prefix": "ecs"
                    }
                }
...
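
With the awslogs log driver and a stream prefix, the log stream name takes the form prefix-name/container-name/ecs-task-id. The following sketch builds that name from the values in the preceding output. The helper name log_stream_name, the container name web, and the task ARN are hypothetical.

```python
def log_stream_name(stream_prefix, container_name, task_arn):
    """Build the awslogs stream name: <prefix>/<container name>/<task ID>."""
    # The task ID is the last path segment of the task ARN.
    task_id = task_arn.rsplit("/", 1)[-1]
    return f"{stream_prefix}/{container_name}/{task_id}"


# Hypothetical task ARN, for illustration only.
example_task_arn = (
    "arn:aws:ecs:ap-southeast-2:123456789012:task/example-cluster/"
    "0123456789abcdef0123456789abcdef"
)
print(log_stream_name("ecs", "web", example_task_arn))
```

Look for this stream in the log group named by awslogs-group (/ecs/example-task in the preceding output).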

Related information

API failure reasons

AWS OFFICIAL
Updated 2 years ago