Why did DB connections drop on my RDS DB instance?


My Amazon Relational Database Service (Amazon RDS) database connections dropped suddenly, which caused unexpected downtime. Why did my DB connections drop?

Resolution

Amazon RDS DB connections can drop for various reasons. To understand the cause of the drop in your DB connections, determine whether the DB connections dropped during or outside of the maintenance window for your RDS DB instance.
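As a first check, you can compare the time of the connection drop against the instance's preferred maintenance window. The following is a minimal sketch, assuming the window string uses the ddd:hh24:mi-ddd:hh24:mi format in UTC (for example, sun:05:00-sun:06:00) and that the timestamp you pass in is timezone-aware. The function name is hypothetical.

```python
from datetime import datetime, timezone

_DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def _minute_of_week(day: str, hhmm: str) -> int:
    """Convert a day name and an 'hh:mm' string to minutes since Monday 00:00."""
    hours, minutes = hhmm.split(":")
    return _DAYS.index(day.lower()) * 24 * 60 + int(hours) * 60 + int(minutes)


def in_maintenance_window(window: str, when: datetime) -> bool:
    """Return True if the timezone-aware `when` falls inside `window` (UTC)."""
    start_text, end_text = window.split("-")
    start = _minute_of_week(*start_text.split(":", 1))
    end = _minute_of_week(*end_text.split(":", 1))
    when = when.astimezone(timezone.utc)
    now = when.weekday() * 24 * 60 + when.hour * 60 + when.minute
    if start <= end:
        return start <= now < end
    # The window wraps past the end of the week, for example sun:23:30-mon:00:30.
    return now >= start or now < end
```

If the function returns True for the time of the drop, review the maintenance causes first. Otherwise, start with the timeout and failover causes.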

If DB connections drop during the RDS maintenance window

During the DB instance maintenance window, AWS performs maintenance activities that can lead to dropped DB connections.

Auto Minor Version Upgrade (if enabled on Amazon RDS)

If the Auto Minor Version Upgrade feature is turned on and Amazon RDS designates a new preferred minor engine version, then Amazon RDS upgrades DB instances that run an earlier version during the scheduled maintenance window. Because any engine version upgrade involves downtime, DB connections can drop during the minor version upgrade.

Hardware maintenance

Amazon RDS schedules hardware maintenance when the underlying hosts of your DB instances are running on degraded hardware. The hardware maintenance is performed during the maintenance window configured for the DB instances. Before the maintenance takes place, you receive an email notification that includes the time of the maintenance and the Availability Zones that are affected.

Operating system maintenance

Amazon RDS periodically updates the underlying operating system during the maintenance window configured for your DB instance. If the operating system update involves downtime, then Amazon RDS schedules the update for the next maintenance window. If the update isn't required maintenance, then you can postpone it by adjusting the preferred maintenance window. If the update is required, then it can't be postponed and is applied during the subsequent maintenance window.
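To see which maintenance actions are pending and when they're expected to be applied, you can query the RDS API. The following is a minimal sketch that assumes boto3 is installed and that credentials and a Region are already configured. The helper names are hypothetical, and pagination is not handled.

```python
def summarize_pending_actions(response: dict) -> list[dict]:
    """Flatten a DescribePendingMaintenanceActions response into one row per action."""
    rows = []
    for resource in response.get("PendingMaintenanceActions", []):
        for detail in resource.get("PendingMaintenanceActionDetails", []):
            rows.append({
                "resource": resource.get("ResourceIdentifier"),
                "action": detail.get("Action"),
                "auto_applied_after": detail.get("AutoAppliedAfterDate"),
                "forced_apply": detail.get("ForcedApplyDate"),
                "description": detail.get("Description"),
            })
    return rows


def fetch_pending_actions() -> list[dict]:
    """Call the RDS API (first page only) and summarize the result."""
    import boto3  # imported lazily so the parsing helper works without AWS access

    client = boto3.client("rds")
    return summarize_pending_actions(client.describe_pending_maintenance_actions())
```

An action that reports a forced apply date is one to plan for, because it's applied by that date whether or not you opt in earlier.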

Modifications performed on Amazon RDS by selecting "Apply in Next Maintenance Window"

When you modify your RDS configuration, you can choose to apply the modifications immediately or in the next maintenance window. If you choose the next maintenance window, then there is no immediate downtime. The following modifications might cause downtime when they're applied during the next maintenance window:

  • Renaming DB instance identifiers
  • Modifying DB instance classes
  • Changing the backup retention periods
  • Modifying DB ports
  • Changing the DB engine version
  • Attaching a new subnet group

For the settings that you can modify, along with the impact and downtime of each change, refer to the DB instance settings information.
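To find out whether a deferred modification is still waiting for the next maintenance window, you can inspect the PendingModifiedValues field that DescribeDBInstances returns for each instance. The following is a minimal sketch that assumes boto3 is installed and configured. The helper names and the instance identifier that you pass in are hypothetical.

```python
def pending_modifications(instance: dict) -> dict:
    """Return the non-empty entries of PendingModifiedValues for one DBInstances item."""
    pending = instance.get("PendingModifiedValues") or {}
    # Keep values such as 0 or False, and drop only missing or empty entries.
    return {key: value for key, value in pending.items() if value not in (None, {}, [])}


def fetch_pending_modifications(instance_id: str) -> dict:
    """Look up one DB instance and report the modifications that are still pending."""
    import boto3  # imported lazily so the parsing helper works without AWS access

    client = boto3.client("rds")
    response = client.describe_db_instances(DBInstanceIdentifier=instance_id)
    return pending_modifications(response["DBInstances"][0])
```

An empty result means that no modification is queued for the instance. A result such as a new DBInstanceClass or Port indicates a change that might interrupt connections when it's applied.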

If DB connections dropped outside the RDS maintenance window

DB connections might drop when they reach a timeout that's configured on the client side or the server side.

Client timeout parameters configured at the application end

Timeout parameters that are configured in the application can cause DB connections to drop. If a query takes longer to process than the client timeout allows, then the client might end the session before the query finishes. To resolve this issue, increase the client's timeout setting.
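Where the client timeout is set depends on your driver. As one illustration, the following sketch assumes a PostgreSQL client that accepts a libpq-style connection string, and it sets connect_timeout (in seconds) and a per-session statement_timeout (in milliseconds). The function name, the default values, and the host, database, and user that you pass in are all hypothetical.

```python
def build_dsn(host: str, dbname: str, user: str,
              connect_timeout_s: int = 10,
              statement_timeout_ms: int = 300_000) -> str:
    """Build a libpq keyword/value connection string with explicit timeouts."""
    parts = {
        "host": host,
        "dbname": dbname,
        "user": user,
        # Maximum time to wait while establishing the connection, in seconds.
        "connect_timeout": str(connect_timeout_s),
        # Sent to the server at session start. The value is quoted because it contains a space.
        "options": f"'-c statement_timeout={statement_timeout_ms}'",
    }
    return " ".join(f"{key}={value}" for key, value in parts.items())
```

You can pass the resulting string to a libpq-based driver. Choose a statement timeout that's comfortably longer than your slowest expected query.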

Server timeout parameters configured in the custom parameter group attached to Amazon RDS

Aggressively set TCP keepalive parameters can lead to client connection timeouts. The connection is considered dead when the client stays idle for the amount of time set in tcp_keepalives_idle and then fails to acknowledge the number of keepalive messages set in tcp_keepalives_count. A timeout can also occur when a connection is waiting for a server response while long-running queries are running on the DB instance.
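To judge whether your keepalive settings are too aggressive, you can estimate how long an unresponsive connection survives before it's dropped. The following sketch also assumes the PostgreSQL parameter tcp_keepalives_interval (the number of seconds between unacknowledged keepalive messages), which isn't discussed above. The estimate is approximate, and the function name is hypothetical.

```python
def keepalive_detection_seconds(idle_s: int, interval_s: int, count: int) -> int:
    """Approximate seconds before an unresponsive peer is declared dead.

    idle_s     -> tcp_keepalives_idle (idle time before the first keepalive message)
    interval_s -> tcp_keepalives_interval (assumed; time between unacknowledged messages)
    count      -> tcp_keepalives_count (unacknowledged messages tolerated)
    """
    return idle_s + interval_s * count
```

For example, with an idle time of 60 seconds, an interval of 10 seconds, and a count of 3, the estimate is 60 + 10 × 3 = 90 seconds. If your network path regularly stalls for longer than that, then the settings are likely too aggressive.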

If idle_in_transaction_session_timeout is set to a lower value than the default of 24 hours, then any session that stays idle inside an open transaction for longer than the configured value is ended. If you set this value too aggressively, then the connection drops as soon as the session is idle for longer than the timeout, even when the application needs more time to complete its work.
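To spot sessions that are close to being ended by this timeout, you can compare how long each session has been idle in a transaction against the configured value. The following sketch assumes that you run the query shown against pg_stat_activity with your own driver and pass the rows in as (pid, usename, state_change) tuples with timezone-aware timestamps. The function name and the 80 percent warning threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Query to run with your own driver. It returns (pid, usename, state_change) rows.
IDLE_IN_TXN_QUERY = """
SELECT pid, usename, state_change
FROM pg_stat_activity
WHERE state = 'idle in transaction'
"""


def sessions_at_risk(rows, timeout_ms: int, now: datetime, warn_fraction: float = 0.8):
    """Return the pids whose idle time has reached warn_fraction of the timeout."""
    threshold = timedelta(milliseconds=timeout_ms * warn_fraction)
    return [pid for pid, _user, state_change in rows if now - state_change >= threshold]
```

Sessions that appear in the result repeatedly usually point to application code that opens a transaction and then waits on other work before it commits.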

Unplanned DB restart/failover

A transient issue with the underlying hardware might lead to the loss of communication with the DB instance. A hardware issue might initiate a failover in a Multi-AZ deployment, or a recovery in a Single-AZ deployment that replaces the underlying host. The DB instance might be marked unhealthy because the RDS monitoring system can't communicate with it to perform health checks.

A transient network issue might affect the underlying host of your DB instance. The internal monitoring system detects this issue and proactively initiates a recovery for a Single-AZ deployment or a failover for a Multi-AZ deployment.

The DB instance might become unresponsive when a high DB load leads to a memory crunch in the database that prevents the RDS monitoring system from contacting the underlying host. To avoid a failover and restart of your DB instance because of database overload, configure the memory parameters on the DB instance appropriately.

A transient issue with the underlying storage subsystem can lead to elevated latency for an Amazon Elastic Block Store (Amazon EBS) volume, which an internal monitoring system identifies. As a proactive measure, the monitoring system initiates a recovery for a Single-AZ deployment or a failover to the secondary for a Multi-AZ deployment.
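To confirm whether a restart, recovery, or failover coincided with the dropped connections, you can review the recent RDS events for the instance. The following is a minimal sketch that assumes boto3 is installed and configured. The helper names and the set of event categories treated as disruptive are illustrative choices, and pagination is not handled.

```python
DISRUPTIVE_CATEGORIES = {"availability", "failover", "failure", "recovery", "maintenance"}


def disruptive_events(response: dict) -> list[tuple]:
    """Pick events in disruptive categories from a DescribeEvents response, oldest first."""
    hits = []
    for event in response.get("Events", []):
        if DISRUPTIVE_CATEGORIES & set(event.get("EventCategories", [])):
            hits.append((event.get("Date"), event.get("SourceIdentifier"), event.get("Message")))
    return sorted(hits, key=lambda item: item[0])


def fetch_recent_events(instance_id: str, minutes: int = 1440) -> list[tuple]:
    """Fetch events for one DB instance over the last `minutes` (first page only)."""
    import boto3  # imported lazily so the filtering helper works without AWS access

    client = boto3.client("rds")
    response = client.describe_events(
        SourceIdentifier=instance_id,
        SourceType="db-instance",
        Duration=minutes,
    )
    return disruptive_events(response)
```

Compare the event timestamps with the time that your application logged the dropped connections to see whether the two line up.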


Related information

How do I minimize downtime during required Amazon RDS maintenance?

Working with operating system updates

How do I resolve problems when connecting to my Amazon RDS DB instance?

How do I perform the root cause analysis for a Multi-AZ failover and restart of my Amazon RDS instance?

AWS OFFICIAL, updated 2 years ago