Why do I see write latency spikes every five minutes after I upgrade my Amazon RDS for PostgreSQL instance to version 11 or higher?

I upgraded my Amazon Relational Database Service (Amazon RDS) for PostgreSQL instance to version 11 or higher. I see a Write Latency spike in Amazon CloudWatch every five minutes.

Resolution

With an idle Amazon RDS for PostgreSQL instance, you might notice a spike in the Amazon CloudWatch metric Write Latency every five minutes. This correlates with a spike in the Enhanced Monitoring metric Write Total to approximately 64 MB after you upgrade to PostgreSQL version 11 or later. The value of the wal_segment_size parameter becomes 64 MB after the upgrade. These spikes might not be noticeable with version 10 or earlier because the value of wal_segment_size is 16 MB. For more information, see the update for September 7, 2018 in Amazon RDS - Document history.
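You can confirm the WAL segment size from any client session. The endpoint, user, and database in the following command are placeholders; substitute your own connection values:

```shell
# Placeholder connection values -- replace with your own endpoint, user, and database.
psql --host=mydb.example-endpoint.rds.amazonaws.com \
     --username=postgres \
     --dbname=postgres \
     --command="SHOW wal_segment_size;"
```

On PostgreSQL version 11 or later, this reports 64MB; on version 10 and earlier, it reports 16MB.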

The archive_timeout configuration in Amazon RDS for PostgreSQL is set to 5 minutes. With this setting, the archival process copies the Write-Ahead Logging (WAL) segments to be archived to Amazon Simple Storage Service (Amazon S3) every five minutes. In an idle system, this copy process is usually the only I/O operation, and so this operation might be visibly dominant in the CloudWatch graphs. However, you might not see this pattern in a busy system. For example, suppose that you run the following workload for 30 minutes on a db.t3.small instance. This example simulates a busy system on an RDS for PostgreSQL instance with a 20 GB Amazon Elastic Block Store (Amazon EBS) volume:

pgbench --initialize --fillfactor=90 --scale=100 --port=$PORT --host=$HOST --username=$USER $DB
pgbench --host=$HOST --username=$USER --port=$PORT --protocol=simple --progress=2 --client=1 --jobs=1 $DB -T 1800

With this workload, you don't see spikes in the CloudWatch Write Latency metric.

But suppose that you have the following use cases:

  • You have an I/O request that takes 10 milliseconds to complete, and nine other I/O requests that take 1 millisecond each to complete. The average latency of these requests is calculated as follows:
    Average latency = (10 + (9 * 1)) / 10 = 1.9 milliseconds
  • You have an I/O request that takes 10 milliseconds to complete, and you have no other I/O requests. The average latency in this case is calculated as follows:
    Average latency = 10 / 1 = 10 milliseconds

Both use cases include the same I/O request that takes 10 milliseconds to complete. However, the slow request stands out in the average for the second use case because fewer I/O requests run alongside it. If an idle system has one slow I/O request because of the large block size, then the latency is calculated from only that request. In a busy system with multiple I/O requests, most with smaller block sizes, the average latency is calculated across all of these requests.
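The two averages above can be reproduced directly. This awk one-liner evaluates both calculations from the preceding bullets:

```shell
# Reproduce the two average-latency calculations:
# busy: one 10 ms request plus nine 1 ms requests; idle: one 10 ms request alone.
awk 'BEGIN {
  busy = (10 + 9 * 1) / 10;
  idle = 10 / 1;
  printf "busy average: %.1f ms\nidle average: %.1f ms\n", busy, idle
}'
```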

In these cases, you might see a Write Latency spike every five minutes in an idle Amazon RDS for PostgreSQL system. If your database instance runs a busy workload, then you might not see these spikes.
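To inspect the pattern at one-minute granularity, you can retrieve the raw Write Latency metric with the AWS Command Line Interface (AWS CLI). The instance identifier and time window below are placeholders:

```shell
# "mydb-instance" and the time window are placeholders -- substitute your own values.
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name WriteLatency \
    --dimensions Name=DBInstanceIdentifier,Value=mydb-instance \
    --start-time 2024-01-01T00:00:00Z \
    --end-time 2024-01-01T01:00:00Z \
    --period 60 \
    --statistics Average
```

On an idle instance, the returned datapoints show the Average value rising approximately every five minutes, matching the archive_timeout interval.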

Example: Suppose that you have two t2.small Amazon Elastic Compute Cloud (Amazon EC2) instances, each with an 8 GB gp2 volume.

Add the following entry to the crontab on the first Amazon EC2 instance. Every five minutes, the command writes a 64 MB file with a 64 MB block size and then deletes it:

*/5 * * * * dd if=/dev/zero of=/tmp/big_file.txt bs=64MB count=1 oflag=dsync ; rm -rf /tmp/big_file.txt

Note: Replace big_file.txt in the command with the name of your file.

Also, add the following entry to the crontab on the second Amazon EC2 instance. Every minute, the command runs dd 100 times, and each run writes 100 blocks of 8 KB and then deletes the file:

* * * * * for i in {1..100} ; do dd if=/dev/zero of=/tmp/small_file.txt bs=8k count=100 oflag=dsync ; rm -rf /tmp/small_file.txt ; done

Note: Replace small_file.txt in the command with the name of your file.

You might notice that the first EC2 instance shows higher Write Latency spikes in CloudWatch than the second EC2 instance. This is because the slow 64 MB block writes on the first instance aren't averaged out by other, faster I/O requests.

Related information

Best practices for working with PostgreSQL