How do I resolve 504 HTTP errors in Amazon EKS?

I get HTTP 504 (Gateway timeout) errors when I connect to a Kubernetes Service that runs in my Amazon Elastic Kubernetes Service (Amazon EKS) cluster.

Short description

You get HTTP 504 (Gateway timeout) errors when you connect to a Kubernetes Service pod that's located in an Amazon EKS cluster configured for a load balancer, and the load balancer doesn't receive a timely response from the backend target.

To resolve HTTP 503 errors, see How do I resolve HTTP 503 (Service unavailable) errors when I access a Kubernetes Service in an Amazon EKS cluster?

To resolve HTTP 504 errors, complete the following troubleshooting steps.

Resolution

Verify that your load balancer's idle timeout is set correctly

An HTTP 504 error indicates that the load balancer established a connection to the target, but the target didn't respond before the idle timeout period elapsed. By default, the idle timeout for the Classic Load Balancer and the Application Load Balancer is 60 seconds.

1.    Review the Amazon CloudWatch metrics for your Classic Load Balancer or Application Load Balancer (see the example command after the following note).

Note: At least one request has timed out when:

  • The latency data points (Latency for a Classic Load Balancer, TargetResponseTime for an Application Load Balancer) are equal to your currently configured idle timeout value.
  • There are data points in the HTTPCode_ELB_5XX (Classic Load Balancer) or HTTPCode_ELB_5XX_Count (Application Load Balancer) metric.
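
For example, the following AWS CLI command retrieves the HTTPCode_ELB_5XX_Count data points for an Application Load Balancer. The load balancer dimension value and the time range are placeholders; for a Classic Load Balancer, use the AWS/ELB namespace, the LoadBalancerName dimension, and the HTTPCode_ELB_5XX and Latency metrics instead.

# Replace the load balancer dimension value and the time range with your own values.
$ aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name HTTPCode_ELB_5XX_Count \
    --dimensions Name=LoadBalancer,Value=app/your-load-balancer/1234567890abcdef \
    --statistics Sum \
    --period 300 \
    --start-time 2024-01-01T00:00:00Z \
    --end-time 2024-01-01T01:00:00Z

You can reuse the same pattern later in this article to review the BackendConnectionErrors (Classic Load Balancer) or TargetConnectionErrorCount (Application Load Balancer) metrics.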

2.    Modify the idle timeout for your load balancer so that the HTTP request can complete within the idle timeout period, or configure your application to respond more quickly.

To modify the idle timeout for your Classic Load Balancer, update the service definition to include the service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout annotation.
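
For example, the following command sets the idle timeout to 120 seconds on a Service. The Service name and the timeout value are placeholders; if you manage the Service declaratively, add the annotation to the Service manifest instead so that the change isn't lost on the next apply.

# The Service name and timeout value are examples only.
$ kubectl annotate service your-service-name \
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout="120" --overwrite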

To modify the idle timeout for your Application Load Balancer, update the Ingress definition to include the alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds annotation.
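
Similarly, the following command sets the idle timeout to 120 seconds through the Ingress annotation. The Ingress name and the timeout value are placeholders. Because this annotation holds a comma-separated list of load balancer attributes, include any attributes that you already set in the value.

# The Ingress name and timeout value are examples only.
$ kubectl annotate ingress your-ingress-name \
    alb.ingress.kubernetes.io/load-balancer-attributes="idle_timeout.timeout_seconds=120" --overwrite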

Verify that your backend instances have no backend connection errors

If a backend instance closes a TCP connection before the load balancer's idle timeout value is reached, then the load balancer can't fulfill the request and returns an HTTP 504 error.

1.    Review the CloudWatch BackendConnectionErrors metric for your Classic Load Balancer, or the target group's TargetConnectionErrorCount metric for your Application Load Balancer.

2.    Activate keep-alive settings on your backend worker nodes or pods, and set the keep-alive timeout to a value greater than the load balancer's idle timeout (see the example after the following output).

To check whether the keep-alive timeout is less than the idle timeout, verify the keep-alive values in your pods or on your worker nodes. See the following examples for pods and nodes.

For pods:

$ kubectl exec your-pod-name -- sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes

For nodes:

$ sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes

Output:

net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
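
To raise the keep-alive values, update the kernel parameters on the worker node. The following is a minimal sketch with example values that assume a 60-second idle timeout; choose values that fit your environment, persist them (for example, in /etc/sysctl.d/ or in the node group's user data), and note that changing these sysctls from inside a pod usually requires the kubelet to allow them as unsafe sysctls.

# Example values only; make sure that the backend keeps idle connections open longer than the load balancer's idle timeout.
$ sudo sysctl -w net.ipv4.tcp_keepalive_time=300 net.ipv4.tcp_keepalive_intvl=60 net.ipv4.tcp_keepalive_probes=5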

Verify that your backend targets can receive traffic from the load balancer over the ephemeral port range

HTTP 504 errors also occur when the network access control list (ACL) for the subnet doesn't allow traffic from the targets to the load balancer nodes on the ephemeral ports (1024-65535).

You must configure security groups and network ACLs to allow traffic to flow between the load balancer and the backend targets. Depending on the load balancer type, these targets can be IP addresses or instances.

You must configure the security groups for ephemeral port access. To do so, make sure that the security group of your nodes and pods allows traffic from the security group of your load balancer over the ephemeral port range, and that the load balancer's security group allows outbound traffic to your nodes and pods. See the following example for the inbound rule. For more information, see Security groups for your Amazon Virtual Private Cloud (Amazon VPC) and Add and delete rules.
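
For example, the following AWS CLI command adds an inbound rule to the worker node security group that allows TCP traffic over the ephemeral port range from the load balancer's security group. Both security group IDs are placeholders.

# Replace sg-0123456789abcdef0 with your node security group and sg-0fedcba9876543210 with your load balancer security group.
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 1024-65535 \
    --source-group sg-0fedcba9876543210

Also confirm that the network ACLs for the load balancer and target subnets allow traffic on the ephemeral ports in both directions.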


Related information

I receive HTTP 5xx errors when connecting to web servers running on EC2 instances configured to use Classic Load Balancing. How do I troubleshoot these errors?

HTTP 504: Gateway timeout

Monitor your Classic Load Balancer

Monitor your Application Load Balancers

Troubleshoot a Classic Load Balancer: HTTP errors
