How do I troubleshoot Application Load Balancer HTTP 502 errors?

I encounter HTTP 502 errors with my Application Load Balancer.

Short description

There are several possible causes for HTTP 502 (Bad Gateway) errors. The source of the error can be either your target or your Application Load Balancer. To identify the source, use Amazon CloudWatch metrics and access logs.

Before you troubleshoot the error from your Application Load Balancer, make sure that you turn on access logging. To understand what each field means in the access log, see Access log entries.

If the target is an AWS Lambda function, then see Troubleshoot HTTP 502 errors when the target is a Lambda function in the Resolution section.

Resolution

Find the source of the HTTP 502 errors

Using CloudWatch metrics

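If data points appear under the HTTPCode_ELB_502_Count metric, then your load balancer is the source of the HTTP 502 errors. If they appear under the HTTPCode_Target_5XX_Count metric, then your target is the source of the errors. Because HTTPCode_Target_5XX_Count counts every 5XX response from the targets, use the access logs to confirm that the responses are 502 errors.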
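As a rough sketch of how you might pull both metrics for comparison, the following Python builds the request parameters for the CloudWatch GetMetricStatistics API. The load balancer dimension value reuses the example name from this article, and the one-hour window and 5-minute period are assumptions. The boto3 call appears only as a comment because it needs AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def metric_query(metric_name, load_balancer, hours=1):
    """Build GetMetricStatistics parameters for one Application Load Balancer metric."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ApplicationELB",
        "MetricName": metric_name,
        # The dimension value is the final portion of the load balancer ARN.
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "StartTime": end - timedelta(hours=hours),  # assumed look-back window
        "EndTime": end,
        "Period": 300,  # assumed 5-minute granularity
        "Statistics": ["Sum"],
    }


queries = [
    metric_query(name, "app/my-loadbalancer/50dc6c495c0c9188")
    for name in ("HTTPCode_ELB_502_Count", "HTTPCode_Target_5XX_Count")
]

# With boto3 installed and credentials configured (not run here):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   for query in queries:
#       datapoints = cloudwatch.get_metric_statistics(**query)["Datapoints"]
#       print(query["MetricName"], sum(point["Sum"] for point in datapoints))
```

A nonzero sum for the first metric points to the load balancer, and a nonzero sum for the second points to the targets.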

Using access logs

If the elb_status_code is "502" and the target_status_code is "-", then your load balancer is the source of the HTTP 502 errors. If the elb_status_code is "502" and the target_status_code is "502", then your target is the source of the errors.
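The rule above can be sketched in Python for a single access log entry. This is a minimal illustration, not an official parser: it assumes the documented field order (elb_status_code is the ninth field and target_status_code the tenth) and uses shlex so that quoted fields count as one token. The function name source_of_502 is hypothetical.

```python
import shlex

# Zero-based positions in an access log entry, following the documented field order.
ELB_STATUS_CODE = 8
TARGET_STATUS_CODE = 9


def source_of_502(log_line):
    """Return 'load balancer', 'target', or None for one access log entry."""
    fields = shlex.split(log_line)  # keeps quoted fields such as the request together
    if fields[ELB_STATUS_CODE] != "502":
        return None  # not a 502 entry
    if fields[TARGET_STATUS_CODE] == "-":
        return "load balancer"
    if fields[TARGET_STATUS_CODE] == "502":
        return "target"
    return None  # another combination that this rule doesn't cover


entry = (
    "http 2022-04-15T16:52:50.757968Z app/my-loadbalancer/50dc6c495c0c9188 "
    "192.168.131.39:2817 10.0.0.1:80 -1 -1 -1 502 - 86 155 "
    '"GET http://example.com:80/ HTTP/1.1" "curl/7.51.0" - - '
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 "
    '"Root=1-58337262-36d228ad5d99923122bbe354"'
)
print(source_of_502(entry))  # prints: load balancer
```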

Troubleshoot HTTP 502 errors

Note: Filter the access logs for entries where elb_status_code is "502", and then check the target_status_code of each entry to help you determine the cause. Then, complete the relevant steps for your use case.

The load balancer received a TCP RST from the target when attempting to establish a connection

The load balancer might receive a TCP RST from the target when it tries to establish a connection. This occurs when the load balancer can't complete a TCP three-way handshake with the target. As a result, the load balancer can't forward the user request to the target.

Verify that the request_processing_time, target_processing_time, and response_processing_time fields in the access logs are each set to -1. This value means that the load balancer couldn't dispatch the request to the target because it never established a connection.

The following is an example of an access log entry:

http 2022-04-15T16:52:50.757968Z app/my-loadbalancer/50dc6c495c0c9188 192.168.131.39:2817 10.0.0.1:80 -1 -1 -1 502 - 86 155 "GET http://example.com:80/ HTTP/1.1" "curl/7.51.0" - - arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 "Root=1-58337262-36d228ad5d99923122bbe354"

Note: In this access log entry, the request_processing_time, target_processing_time and response_processing_time are each set to -1.

The load balancer received an unexpected response from the target, such as "ICMP Destination unreachable (Host unreachable)", when attempting to establish a connection

  • Verify that the request_processing_time, target_processing_time and response_processing_time fields in the access logs are all set to value -1.
  • Verify that traffic is allowed from the load balancer subnets to the targets on the target port.

The target closed the connection with a TCP RST or a TCP FIN while the load balancer had an outstanding request to the target

The load balancer receives a request and forwards it to the target. The target receives the request and starts to process it, but closes the connection to the load balancer too early. This usually occurs when the keep-alive timeout of the target is shorter than the idle timeout of the load balancer. Make sure that the target's keep-alive timeout is greater than the load balancer's idle timeout.
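The timeout relationship can be expressed as a one-line check. In this sketch, the 60-second default is the default idle timeout for an Application Load Balancer, and the keep-alive values passed in are hypothetical numbers that you would read from your own web server configuration.

```python
ALB_DEFAULT_IDLE_TIMEOUT = 60  # seconds; default idle timeout for an Application Load Balancer


def keep_alive_is_safe(target_keep_alive, lb_idle_timeout=ALB_DEFAULT_IDLE_TIMEOUT):
    """Return True when the target keeps idle connections open longer than the load balancer."""
    return target_keep_alive > lb_idle_timeout


print(keep_alive_is_safe(5))   # prints: False (hypothetical 5-second keep-alive)
print(keep_alive_is_safe(75))  # prints: True (hypothetical 75-second keep-alive)
```

An equal value also fails the check, because the target might close the connection at the same moment that the load balancer reuses it.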

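Check the values for the request_processing_time, target_processing_time, and response_processing_time fields. As in the following example, a response_processing_time of -1 together with non-negative values for the other two fields indicates that the load balancer dispatched the request but didn't receive a complete response.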

See the following example access log entry:

http 2022-04-15T16:52:50.757968Z app/my-loadbalancer/50dc6c495c0c9188 192.168.131.39:2817 10.0.0.1:80 0.001 4.205 -1 502 - 94 326 "GET http://example.com:80 HTTP/1.1" "curl/7.51.0" - - arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 "Root=1-58337262-36d228ad5d99923122bbe354"

Note: In this access log entry, the request_processing_time is 0.001, the target_processing_time is 4.205, and the response_processing_time is -1.
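The timing patterns in the two example entries so far can be told apart programmatically. The following sketch assumes the documented field order, where the three processing times are the sixth through eighth fields. The function name and the returned labels are hypothetical, and a pattern is consistent with a cause rather than proof of it. For example, an SSL handshake failure also produces three -1 values.

```python
import shlex


def timing_pattern(log_line):
    """Label an access log entry by its three processing-time fields."""
    fields = shlex.split(log_line)
    request_t, target_t, response_t = (float(value) for value in fields[5:8])
    if request_t == target_t == response_t == -1:
        return "request never dispatched to the target"
    if request_t >= 0 and target_t >= 0 and response_t == -1:
        return "request dispatched, but no complete response received"
    return "other"


TAIL = (
    '"curl/7.51.0" - - '
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 "
    '"Root=1-58337262-36d228ad5d99923122bbe354"'
)
PREFIX = (
    "http 2022-04-15T16:52:50.757968Z app/my-loadbalancer/50dc6c495c0c9188 "
    "192.168.131.39:2817 10.0.0.1:80 "
)
never_connected = PREFIX + '-1 -1 -1 502 - 86 155 "GET http://example.com:80/ HTTP/1.1" ' + TAIL
closed_early = PREFIX + '0.001 4.205 -1 502 - 94 326 "GET http://example.com:80 HTTP/1.1" ' + TAIL

print(timing_pattern(never_connected))  # prints: request never dispatched to the target
print(timing_pattern(closed_early))     # prints: request dispatched, but no complete response received
```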

The target response is malformed or contains HTTP headers that aren't valid

Perform a packet capture on the target for the timeframe of the issue to understand the target response.

The load balancer encountered an SSL handshake error or SSL handshake timeout (10 seconds) when connecting to a target

The TCP connection from the load balancer to the target's HTTPS listener is successful, but the subsequent SSL handshake times out. As a result, the load balancer can't forward the request to the target.

Verify that the target group uses the HTTPS protocol. If it doesn't use the HTTPS protocol, then the SSL handshake timeout isn't the cause of the issue. If the target group is using the HTTPS protocol, then check the following points:

  • Verify that the request_processing_time, target_processing_time and response_processing_time fields in the access logs are all set to value -1.
  • Verify that there are data points for the TargetTLSNegotiationErrorCount metric.
  • Perform a packet capture on the target for the timeframe of the issue to confirm that the errors are related to the SSL handshake. For the commands, see the Perform a packet capture section.
  • Verify that the load balancer and the target share at least one supported cipher and protocol.

The deregistration delay period elapsed for a request that's handled by a target that was deregistered

In your AWS CloudTrail events, check for an API call with the DeregisterTargets action during the timeframe of the issue. If you find one, then the target was deregistered while it still had requests in flight, and those requests failed when the deregistration delay period elapsed. To resolve the issue, increase the deregistration delay period so that lengthy operations can complete without failing.
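If you export CloudTrail records as JSON, a short filter can surface the relevant calls. This sketch assumes that each record is a dictionary with the standard eventName and eventTime fields. The sample records and the time window are made up for illustration.

```python
from datetime import datetime


def parse_time(value):
    """Parse an ISO 8601 UTC timestamp that ends in 'Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def deregistrations_in_window(events, start, end):
    """Return the DeregisterTargets records whose eventTime falls within [start, end]."""
    window_start, window_end = parse_time(start), parse_time(end)
    return [
        event
        for event in events
        if event.get("eventName") == "DeregisterTargets"
        and window_start <= parse_time(event["eventTime"]) <= window_end
    ]


# Hypothetical records; real CloudTrail records carry many more fields.
sample_events = [
    {"eventName": "DeregisterTargets", "eventTime": "2022-04-15T16:50:12Z"},
    {"eventName": "RegisterTargets", "eventTime": "2022-04-15T16:51:00Z"},
    {"eventName": "DeregisterTargets", "eventTime": "2022-04-14T09:00:00Z"},
]

matches = deregistrations_in_window(
    sample_events, "2022-04-15T16:45:00Z", "2022-04-15T16:55:00Z"
)
print(len(matches))  # prints: 1
```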

Troubleshoot the HTTP 502 errors when the target is a Lambda function

Note: For requests to a Lambda function that fail, the load balancer stores Lambda-specific error reason codes in the error_reason field of the access logs.

The target is a Lambda function, and the response body exceeds 1 MB

  • Verify that there's a data point for the LambdaUserError metric.
  • Verify that the error_reason field in the load balancer access log is set to LambdaResponseTooLarge.

The target is a Lambda function that didn't respond before its configured timeout was reached

  • Verify the Lambda function timeout configuration.
  • Verify that there's a data point for the LambdaUserError metric.
  • Verify that the error_reason field in the load balancer access log is set to LambdaUnhandled.
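The two error_reason values described so far can be summarized in a small lookup when you review many log entries. This is only a sketch: the mapping restates the causes given in this article, the function name is hypothetical, and other reason codes exist that the lookup doesn't cover.

```python
from collections import Counter

# Causes as described in this article for each Lambda-specific reason code.
LAMBDA_ERROR_REASONS = {
    "LambdaResponseTooLarge": "the response body exceeded 1 MB",
    "LambdaUnhandled": "the function didn't respond before its configured timeout",
}


def summarize_error_reasons(error_reasons):
    """Count how often each described cause appears in a list of error_reason values."""
    return Counter(
        LAMBDA_ERROR_REASONS.get(reason, "not covered here: " + reason)
        for reason in error_reasons
    )


# Hypothetical values taken from the error_reason field of several log entries.
summary = summarize_error_reasons(
    ["LambdaUnhandled", "LambdaResponseTooLarge", "LambdaUnhandled"]
)
for cause, count in summary.items():
    print(count, cause)
```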

The target is a Lambda function that returned an error, or the function was throttled by the Lambda service

  • Verify that there's a data point for the Throttles metric.
  • Contact AWS Support for guidance on service throttling.

Perform a packet capture

For Linux, use the following command:

sudo tcpdump -i any -w filename.pcap

For Windows, download and use the Wireshark application (from the Wireshark website).

For instructions on how to test packet capture samples with tcpdump or take a packet capture, see How do I troubleshoot network performance issues between EC2 Linux or Windows instances in an Amazon Virtual Private Cloud and an on-premises host over the internet gateway?

AWS OFFICIAL
Updated 2 days ago