Why was my query canceled in Amazon Redshift?

6 minute read
0

My query in Amazon Redshift was canceled with an error message.

Short description

A query can be canceled in Amazon Redshift for the following reasons:

  • Setup of Amazon Redshift workload management (WLM) query monitoring rules
  • Statement timeout value
  • ABORT, CANCEL, or TERMINATE requests
  • Network issues
  • Cluster maintenance upgrades
  • Internal processing errors
  • ASSERT errors

To prevent your query from being stopped, complete the following steps:

  • Increase your timeout parameter.
  • Update your WLM QMR rules.
  • Schedule long-running operations outside of maintenance windows.

Resolution

Setup of Amazon Redshift WLM query monitoring rules

Create WLM query monitoring rules (QMRs) to define metrics-based performance boundaries for your queues. Or, specify the actions that Amazon Redshift takes when a query exceeds the WLM time limits. For example, create a rule that cancels queries that run for more than a 60-second threshold.

Example 1: Abort action specified in the query monitoring rule

If a query is canceled because of the abort action that's specified in a query monitoring rule, then the query returns the following error:

"ERROR: Query (500029) cancelled by WLM abort action of Query Monitoring Rule "testrule"."

To identify whether a query was canceled because of an "abort" action, run the following query:

select * from STL_WLM_RULE_ACTION where action = 'abort';

The query output lists all queries that are canceled by the "abort" action. If your query ID is listed in the output, then increase the time limit in the WLM QMR parameter.

Example 2: No available queues for the query to be hopped

A query can be hopped if the "hop" action is specified in the query monitoring rule. When a query is hopped, WLM attempts to route the query to the next matching queue based on the WLM queue assignment rules. If the query doesn't match a queue definition, then the query is canceled. A canceled query isn't reassigned to the default queue. For more information, see Properties for the wlm_json_configuration parameter.

Note: You can hop queries only in a manual WLM configuration.

If a query is hopped but no matching queues are available, then the canceled query returns the following error message:

"ERROR: Query (500104) canceled on user's request and ran out of wlm queues for restart."

If your query is canceled with this error message, then run the following query to check the user-defined queues:

select * from stl_wlm_query where query=<query-id>;

In your output, the service_class entries 6-13 include the user-defined queues. For example, service_class 6 might list Queue1 in the WLM configuration, and service_class 7 might list Queue2.

Run the following query for more information on the service_class to queue mapping:

select * from stv_wlm_service_class_config where service_class>5;

After you get the queue mapping information, check the WLM configuration from the Amazon Redshift console. Verify that the queues match the WLM configuration. You can only hop a query if there's a matching queue available for the user group or query group configuration. For more information, see WLM query queue hopping.

Statement timeout value

The statement_timeout value is the maximum amount of time that a query runs before Amazon Redshift terminates it. When a statement timeout is exceeded, then queries submitted during the session are canceled with the following error message:

"ERROR: Query (150) cancelled on user's request"

To verify whether a query was canceled because of a statement timeout, run the following query:

select * from SVL_STATEMENTTEXT where text ilike '%set%statement_timeout%to%' and pid in (select pid from STL_QUERY where query = <queryid>);

Statement timeouts can also be set in the cluster parameter group. Check your cluster parameter group and any statement_timeout configuration settings for additional confirmation. For more information, see Modifying a parameter group.

ABORT, CANCEL, or TERMINATE requests

To check if a particular query was stopped or canceled by a user (such as a superuser), run the following query with your query ID:

select * from SVL_STATEMENTTEXT where text ilike '%cancel%' and xid 
    in (select xid from STL_QUERY where query = <queryid>);
select * from SVL_STATEMENTTEXT where text ilike '%abort%' and xid in (select xid from STL_QUERY where query = <queryid>);

If the query appears in the output, then the query was either stopped or canceled upon user request.

Note: Users can terminate only their own session. A superuser can terminate all sessions.

Queries can also be stopped when a user cancels or terminates a corresponding process (where the query is being run). The following are examples of processes that can cancel or terminate a query:

When a process is canceled or terminated by these commands, an entry is logged in SVL_TERMINATE. To confirm whether a query was stopped because a session was terminated, check the SVL_TERMINATE logs. Run the following query to check the SVL_TERMINATE logs:

select * from SVL_TERMINATE where pid=(select pid from STL_QUERY where query=500534);

Network issues

Sometimes queries are canceled because of underlying network issues. To verify whether network issues are causing your query to cancel, run the following query to check the STL_CONNECTION_LOG entries:

select * from STL_CONNECTION_LOG where pid in (select pid from STL_QUERY where query = <query_id>);

The STL_CONNECTION_LOG records authentication attempts and network connections or disconnections. If your query appears in the output, then a network connection issue might be causing your query to cancel.

Cluster maintenance upgrades

If a scheduled maintenance occurs when a query is running, then the query is terminated and rolled back, requiring a cluster reboot. Schedule long-running operations (such as large data loads or the VACUUM operation) to avoid maintenance windows. For more information, see Schedule around maintenance windows.

To check if maintenance was performed on your Amazon Redshift cluster, choose the Events tab in your Amazon Redshift console.

Internal processing errors

The STL_ERROR table records internal processing errors generated by Amazon Redshift. The STL_ERROR table doesn't record SQL errors or messages.

To verify whether your query was canceled by an internal error, run the following query to check the STL_ERROR entries:

select * from STL_ERROR where userid=<user id>;

ASSERT errors

Sometimes queries are stopped because of an ASSERT error. The ASSERT error can occur when there's an issue with the query itself. If you get an ASSERT error after a patch upgrade, then update Amazon Redshift to the newest cluster version. Then, check the cluster version history. Or, you can roll back the cluster version.

AWS OFFICIAL
AWS OFFICIALUpdated 10 months ago
No comments