Why is my MySQL DB instance showing a high number of active sessions waiting on SYNCH wait events in Performance Insights?

5 minute read
1

When I activate Performance Insights, my DB instance shows a large number of Average Active Sessions (AAS) waiting on synchronization (SYNCH) wait events. I want to improve the performance of my DB instance.

Short description

Performance Insights are activated on any of the following services:

  • Amazon Relational Database Service (Amazon RDS) for MySQL.
  • Amazon RDS for MariaDB.
  • Amazon Aurora MySQL-Compatible Edition.

If you see MySQL SYNCH wait events in Performance Insights, then a large number of sessions in the database are attempting to access the same protected objects or memory structures. Protected objects in MySQL include the following:

  • The active binary log file in a binlog source instance - contains a mutex that allows only one session to read or write it at a time.
  • The data dictionary - for writes that are usually caused by data control language (DCL) or data definition language (DDL) statements.
  • The adaptive hash index- contains a mutex that allows only one session to read or write it at a time.
  • The open table cache - only one session can add or remove a table from the cache.
  • Each single database block inside the InnoDB Buffer Pool - only one session can modify the content of a block in memory at a time.

Resolution

Make sure that the DB instance has enough CPU resources to handle the workload

If you have a high number of sessions waiting on SYNCH events, then this causes high CPU usage. If the usage hits 100%, then the number of waiting sessions increases. When troubleshooting, increase the size of your DB instance to make sure that there is enough CPU to process the extra workload.

Because these events are usually short-lived, the Amazon CloudWatch CPU utilization metric might not show the peak usage correctly. The best way to check this is to use the one-second CPU counters in RDS Enhanced Monitoring. These counters are more specific and detailed.

Increase MySQL's mutex/lock wait array

MySQL uses an internal data structure to coordinate threads. This array has a size of one, by default. This is suitable for single-CPU machines, but it can cause issues on machines with several CPUs. If your workload has a large number of waiting threads, then increase the array size. Set the MYSQL parameter innodb_sync_array_size to the amount of CPUs (or higher, up to 1024).

Note: The innodb_sync_array_size parameter applies only at database start up.

Reduce concurrency

In general, parallelism helps to improve throughput. But when a large number of sessions try to do the same or similar activities, then the sessions need access to the same protected objects. The higher the number of sessions, the more CPU you use while waiting.

Spread these activities over time, or schedule them in series. You can also bundle several operations into a single statement, such as multi-row inserts.

Examine specific wait events

Use the following examples to troubleshoot your specific wait event. For more information on Aurora MySQL wait events, see Tuning Aurora MySQL with wait events.

  • synch/rwlock/innodb/dict sys RW lock, OR
    synch/rwlock/innodb/dict_operation_lock - This indicates a high number of concurrent DCLs of DDLs are triggered at the same time. Reduce the application's dependency on using DDLs during regular application activity.
  • synch/cond/sql/MDL_context::COND_wait_status - This indicates a high number of SQLs (including selects) trying to access a table that a DCL or DDL is modifying. Avoid running DDL statements to high-traffic tables during regular application activity.
  • synch/mutex/sql/LOCK_open, OR
    synch/mutex/sql/LOCK_table_cache - This indicates that the number of tables that your sessions are opening exceeds the size of the table definition cache or the table open cache. Increase the size of these caches.
  • synch/mutex/sql/LOG - Your database might be executing a large number of statements, and the current logging methods can't support it. If you use the TABLE output method, then try to use FILE instead. If you use general log, then use Amazon Aurora's advanced auditing instead. If you use 0 or less than 1 for the long_query_time parameter, then try to increase it.
  • synch/mutex/innodb/buf_pool_mutex,OR synch/mutex/innodb/aurora_lock_thread_slot_futex,OR synch/rwlock/innodb/index_tree_rw_lock - There are a large number of similar DMLs accessing the same database object at the same time. Use multi-row statements, and use partitioning to spread the workload over different database objects.
  • synch/mutex/innodb/aurora_lock_thread_slot_futex - This occurs when one session has locked a row for an update, and then another session tries to update the same row. Your action depends on the other wait events that you see. Either find and respond to the SQL statements responsible for this wait event, or find and respond to the blocking session.
  • synch/cond/sql/MYSQL_BIN_LOG::COND_done, OR
    synch/mutex/sql/MYSQL_BIN_LOG::LOCK_commit, OR
    synch/mutex/sql/MYSQL_BIN_LOG::LOCK_log - You have turned on binary logging, and there might be one of the following:

                         - a high commit throughput.

                         - a large number of transactions committing.

                         - replicas reading binlogs.

                         - a combination of these.

Consider upgrading the database to a major version compatible with 5.7 or higher. Also, use multi-row statements, or bundle several statements into a single transaction. In Amazon Aurora, use global databases instead of binary log replication, or use the aurora_binlog parameters.


Related information

Using Amazon RDS performance insights

Working with DB parameter groups

Aurora MySQL events

AWS OFFICIAL
AWS OFFICIALUpdated a year ago