Why is my AWS Glue workflow not triggered?

I created an AWS Glue workflow, but it doesn't start. Or, some of the jobs or crawlers in my AWS Glue workflow don't run.

Short description

If your AWS Glue workflow or its components aren't triggered, then confirm the following:

  • If a scheduled trigger is used as the source trigger, then confirm that it's activated. Be sure that the schedule is specified in Coordinated Universal Time (UTC), and that the cron expression includes all of the required fields.
  • If an event-based trigger is used as the source trigger, then confirm that the Amazon EventBridge rule is correct, turned on, and linked to the workflow. Confirm that the role that's used to add the workflow as a target has the necessary permissions to pass events to AWS Glue.
  • If an external component is triggering the source trigger, then confirm that the external component isn't malfunctioning.
  • Be sure that the predicate condition used to trigger a component isn't met by a job or crawler run that was started outside the workflow.
  • The component might be part of a dependency chain. Be sure that the upstream jobs or crawlers are started as part of the same workflow by a single source trigger.

Resolution

Workflow not starting with a time-based trigger

If the workflow's source trigger is scheduled, then check the following:

  • Be sure that the trigger is in the ACTIVATED state and not in the CREATED state. If the trigger isn't in the ACTIVATED state, then activate the trigger manually.
  • Be sure that the cron expression in the trigger's schedule is written in UTC. If you planned the schedule in a local time zone, then convert each affected field (for example, the hour and the day) to its UTC equivalent. Also, check that the cron expression includes all of the required fields in the correct format. For more information, see Time-based schedules for jobs and crawlers.
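
The two checks above can be sketched in Python. This is a minimal illustration, not an official tool. It assumes that the input dictionary has the shape of the "Trigger" element that boto3's glue.get_trigger call returns, and that AWS Glue cron expressions have six fields. The function names check_scheduled_trigger and daily_cron_in_utc are hypothetical, and the conversion helper handles only daily schedules with whole-hour UTC offsets.

```python
import re
from typing import List

# Matches the cron(...) wrapper used by AWS Glue schedules.
CRON_RE = re.compile(r"^cron\((.+)\)$")


def check_scheduled_trigger(trigger: dict) -> List[str]:
    """Return a list of problems found on a scheduled trigger.

    Assumption: `trigger` looks like glue.get_trigger(Name=...)["Trigger"].
    """
    problems = []
    state = trigger.get("State")
    if state != "ACTIVATED":
        problems.append(f"state is {state}, expected ACTIVATED")

    match = CRON_RE.match(trigger.get("Schedule", "").strip())
    if match is None:
        problems.append("schedule is not of the form cron(...)")
    else:
        fields = match.group(1).split()
        # Minutes, Hours, Day-of-month, Month, Day-of-week, Year
        if len(fields) != 6:
            problems.append(f"cron expression has {len(fields)} fields, expected 6")
    return problems


def daily_cron_in_utc(local_hour: int, local_minute: int, utc_offset_hours: int) -> str:
    """Build a daily cron expression in UTC from a local time of day.

    Hypothetical helper: supports whole-hour offsets only.
    """
    utc_hour = (local_hour - utc_offset_hours) % 24
    return f"cron({local_minute} {utc_hour} * * ? *)"
```

For example, a job that should run daily at 09:30 in a UTC-5 time zone corresponds to daily_cron_in_utc(9, 30, -5), which yields cron(30 14 * * ? *). If check_scheduled_trigger reports that the trigger isn't in the ACTIVATED state, then you can activate the trigger in the console or with the StartTrigger API operation.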

Workflow not starting with an on-demand trigger

Your source trigger might be on-demand, with an upstream entity triggering it using the StartWorkflowRun API call. For this use case, be sure that the calling entity functions correctly.
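
To rule out a problem in the calling entity, you can start the workflow yourself and check the run status. The following Python sketch assumes that you pass in a boto3 AWS Glue client (for example, boto3.client("glue")) and that your credentials allow the glue:StartWorkflowRun and glue:GetWorkflowRun actions. The function name start_workflow_and_get_status is hypothetical.

```python
from typing import Tuple


def start_workflow_and_get_status(glue, workflow_name: str) -> Tuple[str, str]:
    """Start a workflow run and return (run_id, status).

    Assumption: `glue` is a boto3 Glue client or an object with the
    same start_workflow_run and get_workflow_run methods.
    """
    run_id = glue.start_workflow_run(Name=workflow_name)["RunId"]
    run = glue.get_workflow_run(Name=workflow_name, RunId=run_id)["Run"]
    return run_id, run["Status"]


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   run_id, status = start_workflow_and_get_status(boto3.client("glue"), "my-workflow")
#   print(run_id, status)
```

If this call starts the workflow but the upstream entity's call doesn't, then the issue is likely in the upstream entity, for example in its permissions, its workflow name, or its AWS Region.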

Workflow not starting with a conditional trigger

Be sure that the predicate conditions in the trigger aren't met by an agent external to the workflow. If the conditions are met by an external agent, then the trigger isn't fired. A conditional trigger is fired only if the job or crawler that it watches was itself started by a trigger in the same workflow.

For example, suppose that the following conditions are true:

  • You have a workflow with a job JOB_MAIN that's triggered by the trigger TEST_TR.
  • The trigger TEST_TR is dependent on the completion of another job JOB_DEP that's not part of the current workflow.

In this case, even if JOB_DEP completes successfully, and the trigger TEST_TR's predicate logic is met, the job JOB_MAIN isn't fired. The job isn't fired because the predicate condition is met by an agent that's not part of the same workflow.
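
One way to spot this situation is to compare the jobs and crawlers that the workflow's triggers watch with the ones that the workflow's triggers start. The following Python sketch assumes that the input has the shape of the "Graph" element that boto3's glue.get_workflow call returns when IncludeGraph=True is set. The function name watched_but_not_started is hypothetical.

```python
from typing import List, Set


def watched_but_not_started(graph: dict) -> List[str]:
    """Return names that a trigger's predicate watches but that no
    trigger in the same workflow starts.

    Assumption: `graph` looks like
    glue.get_workflow(Name=..., IncludeGraph=True)["Workflow"]["Graph"].
    """
    started: Set[str] = set()
    watched: Set[str] = set()
    for node in graph.get("Nodes", []):
        if node.get("Type") != "TRIGGER":
            continue
        trigger = node.get("TriggerDetails", {}).get("Trigger", {})
        for key in ("JobName", "CrawlerName"):
            # Jobs and crawlers that this trigger starts.
            for action in trigger.get("Actions", []):
                if key in action:
                    started.add(action[key])
            # Jobs and crawlers that this trigger's predicate watches.
            for condition in trigger.get("Predicate", {}).get("Conditions", []):
                if key in condition:
                    watched.add(condition[key])
    return sorted(watched - started)
```

For the example above, the graph contains TEST_TR with a predicate on JOB_DEP and an action for JOB_MAIN, so the function returns a list that contains only JOB_DEP. A nonempty result means that a predicate depends on a run that the workflow doesn't start.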

Workflow not starting with an event-based trigger

You might have configured EventBridge (formerly known as Amazon CloudWatch Events) to start your workflows. If you're using this setup and notice that the source trigger isn't fired, then check the following:

  • If you're using Amazon Simple Storage Service (Amazon S3) data events as the source, then confirm that you created a trail for the AWS account. Then, check that the events for the relevant S3 bucket are logged in AWS CloudTrail.
  • Review the event pattern of the EventBridge rule to confirm that the eventSource, eventName, and requestParameters values are as expected.
  • Confirm that the EventBridge rule is attached to the correct workflow target ARN.
  • Confirm that the EventBridge rule state is ENABLED.
  • Confirm that the role specified in RoleArn for the workflow target has the CloudWatchEventsBuiltInTargetExecutionAccess and CloudWatchEventsInvocationAccess policies attached. Then, confirm that the role has sufficient AWS Identity and Access Management (IAM) permissions to perform the glue:NotifyEvent action on the workflow. The following example shows the minimum required AWS Glue policy:
{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["glue:NotifyEvent"],
        "Resource": ["arn:aws:glue:<region>:<account-id>:workflow/<workflow-name>"]
    }]
}
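
The rule state, target ARN, and role checks in the preceding list can be sketched in Python. This is a minimal illustration that assumes you pass in a boto3 EventBridge client (for example, boto3.client("events")). The function name check_event_rule is hypothetical, and the sketch doesn't handle paginated target lists or verify the policies attached to the role.

```python
from typing import List


def check_event_rule(events, rule_name: str, workflow_arn: str) -> List[str]:
    """Return a list of problems found on an EventBridge rule.

    Assumption: `events` is a boto3 EventBridge client or an object with
    the same describe_rule and list_targets_by_rule methods.
    """
    problems = []

    rule = events.describe_rule(Name=rule_name)
    if rule.get("State") != "ENABLED":
        problems.append(f"rule state is {rule.get('State')}, expected ENABLED")

    targets = events.list_targets_by_rule(Rule=rule_name).get("Targets", [])
    matching = [t for t in targets if t.get("Arn") == workflow_arn]
    if not matching:
        problems.append("no target points at the workflow ARN")
    elif not any(t.get("RoleArn") for t in matching):
        problems.append("workflow target has no RoleArn")
    return problems


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   print(check_event_rule(boto3.client("events"), "my-rule",
#                          "arn:aws:glue:<region>:<account-id>:workflow/<workflow-name>"))
```

An empty list means that these three checks passed. You still must review the event pattern and the role's policies separately.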

Component job or crawler in a dependency chain not starting

Check whether the job or crawler depends on the completion of an upstream job or crawler that's also started by a trigger. A dependent job or crawler starts only if the upstream job or crawler that completed was itself started by a trigger. Make sure that all jobs or crawlers in a dependency chain are descendants of a single scheduled or on-demand trigger.

For example, suppose that these conditions are true:

  • Your workflow starts with a trigger named TEST_TR1 that starts the job named JOB_1.
  • Another trigger named TEST_TR2 depends on the completion of JOB_1 to start the job named JOB_2.

In this case, TEST_TR2 starts JOB_2 when the predicate conditions for TEST_TR2 are met.

But if JOB_1 is run on-demand, and not started by TEST_TR1, then TEST_TR2 doesn't start JOB_2 even if the predicate conditions for TEST_TR2 are met.
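
To check how the most recent run of an upstream job such as JOB_1 was started, you can inspect its job runs. The following Python sketch assumes that each item has the shape of a "JobRuns" entry from boto3's glue.get_job_runs call, with a StartedOn timestamp and a TriggerName key that is present only when a trigger started the run. The function name trigger_of_latest_run and the sample data are hypothetical.

```python
from datetime import datetime
from typing import List, Optional


def trigger_of_latest_run(job_runs: List[dict]) -> Optional[str]:
    """Return the trigger name of the most recent job run.

    Returns None if there are no runs or if the latest run has no
    TriggerName (assumed here to mean that it was started on demand).
    """
    if not job_runs:
        return None
    latest = max(job_runs, key=lambda run: run["StartedOn"])
    return latest.get("TriggerName")


# Hypothetical sample data: the newer run of JOB_1 was started on demand.
EXAMPLE_RUNS = [
    {"JobName": "JOB_1", "StartedOn": datetime(2024, 1, 1, 12, 0), "TriggerName": "TEST_TR1"},
    {"JobName": "JOB_1", "StartedOn": datetime(2024, 1, 2, 12, 0)},
]

print(trigger_of_latest_run(EXAMPLE_RUNS))
```

In this sample, the latest run has no trigger name, so the function returns None. That matches the case where TEST_TR2 doesn't start JOB_2 even though JOB_1 completed.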


Related information

AWS Glue triggers

Workflow restrictions in AWS Glue

Starting an AWS Glue workflow with an Amazon EventBridge event

AWS OFFICIAL, updated 3 years ago