Why Glue?

With AWS Glue, you pay an hourly rate, billed by the second, for crawlers (discovering data) and extract, transform, and load (ETL) jobs (processing and loading data). For the AWS Glue Data Catalog, you pay a simplified monthly fee for storing and accessing the metadata. The first million objects stored are free, and the first million accesses are free. If you provision a development endpoint to interactively develop your ETL code, you pay an hourly rate, billed per second. For AWS Glue DataBrew, the interactive sessions are billed per session, and DataBrew jobs are billed per minute. Usage of the AWS Glue Schema Registry is offered at no additional charge.

Note: Pricing can vary by AWS Region.

  • ETL jobs and interactive sessions
  • Pricing examples

    ETL job: Consider an AWS Glue Apache Spark job that runs for 15 minutes and uses 6 DPUs. The price of 1 DPU-Hour is $0.44. Because your job ran for 1/4 of an hour and used 6 DPUs, AWS will bill you 6 DPUs * 1/4 hour * $0.44, or $0.66.

    AWS Glue Studio Job Notebooks and Interactive Sessions: Suppose you use a notebook in AWS Glue Studio to interactively develop your ETL code. An Interactive Session has 5 DPUs by default. If you keep the session running for 24 minutes (2/5 of an hour), you will be billed for 5 DPUs * 2/5 hour * $0.44 per DPU-Hour, or $0.88.
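
    The two calculations above follow the same pattern: DPUs multiplied by hours multiplied by the DPU-Hour rate. A minimal sketch of that arithmetic follows. The function name `dpu_cost` is hypothetical (it is not part of any AWS SDK), the $0.44 rate is taken from the examples and varies by Region, and minimum billing durations are not modeled.

```python
# Rate used in the examples above; actual rates vary by AWS Region.
DPU_HOUR_RATE_USD = 0.44


def dpu_cost(dpus: float, minutes: float, rate: float = DPU_HOUR_RATE_USD) -> float:
    """Return the charge in USD for `dpus` DPUs running for `minutes` minutes.

    Hypothetical helper for illustration only; it ignores minimum
    billing durations.
    """
    hours = minutes / 60
    return dpus * hours * rate


# Spark ETL job example: 6 DPUs for 15 minutes.
etl_job_usd = dpu_cost(dpus=6, minutes=15)
# Interactive session example: 5 DPUs (the default) for 24 minutes.
session_usd = dpu_cost(dpus=5, minutes=24)

print(f"ETL job: ${etl_job_usd:.2f}")
print(f"Interactive session: ${session_usd:.2f}")
```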

    ML Transforms: As with AWS Glue job runs, the cost of running ML Transforms, including FindMatches, on your data varies with the size of your data, the content of your data, and the number and types of nodes that you use. In the following example, we used FindMatches to integrate points-of-interest information from multiple data sources. The dataset had ~11,000,000 rows (1.6 GB), the label data (examples of true matches or true no-matches) had ~8,000 rows (641 KB), and the job ran on 16 instances of type G.2x. With this configuration, labelset generation took 34 minutes at a cost of $8.23, metrics estimation took 11 minutes at a cost of $2.66, and the FindMatches job execution took 32 minutes at a cost of $7.75.

  • Data Catalog
  • Pricing examples

    AWS Glue Data Catalog free tier: Let’s consider that you store a million tables in your Data Catalog in a given month and make 1 million requests to access these tables. You pay $0 because your usage will be covered under the AWS Glue Data Catalog free tier. You can store the first million objects and make a million requests per month for free.

    AWS Glue Data Catalog: Now consider that your storage usage remains the same at 1 million tables per month, but your requests double to 2 million requests per month. Let’s say you also use crawlers to find new tables, and they run for 30 minutes and consume 2 DPUs.

    Your storage cost is still $0, as the storage for your first million tables is free. Your first million requests are also free. You will be billed for 1 million requests above the free tier, which is $1. Crawlers are billed at $0.44 per DPU-Hour, so you will pay for 2 DPUs * 1/2 hour at $0.44 per DPU-Hour or $0.44.

    If you generate statistics on AWS Glue tables, and the statistics run takes 10 minutes and consumes 1 DPU, you will be billed 1 DPU * 1/6 hour * $0.44/DPU-Hour, which equals $0.07.

    If you compact Apache Iceberg tables, and the compaction runs for 30 minutes and consumes 2 DPUs, you will be billed 2 DPUs * 1/2 hour * $0.44/DPU-Hour, which equals $0.44.
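
    The Data Catalog charges above can be tallied in a few lines. This is a sketch under stated assumptions: the helper names are hypothetical, the request rate of $1.00 per million is inferred from the $1 charge in the example, requests above the free tier are assumed to be prorated, and storage above the free tier is not modeled because no rate for it is given here.

```python
FREE_REQUESTS_PER_MONTH = 1_000_000
# ASSUMPTION: $1.00 per million requests, inferred from the example above.
REQUEST_RATE_PER_MILLION_USD = 1.00
DPU_HOUR_RATE_USD = 0.44  # from the examples; varies by Region


def request_cost(requests: int) -> float:
    """Charge for requests beyond the monthly free tier (assumed prorated)."""
    billable = max(0, requests - FREE_REQUESTS_PER_MONTH)
    return billable / 1_000_000 * REQUEST_RATE_PER_MILLION_USD


def dpu_cost(dpus: float, minutes: float) -> float:
    """Charge for a DPU-billed run such as a crawler or compaction."""
    return dpus * minutes / 60 * DPU_HOUR_RATE_USD


charges = {
    "requests (2 million)": request_cost(2_000_000),
    "crawler (2 DPUs, 30 min)": dpu_cost(2, 30),
    "statistics (1 DPU, 10 min)": dpu_cost(1, 10),
    "compaction (2 DPUs, 30 min)": dpu_cost(2, 30),
}

for name, usd in charges.items():
    print(f"{name}: ${usd:.2f}")
```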

  • Crawlers
  • DataBrew interactive sessions
  • Pricing examples

    AWS Glue DataBrew: The price for each 30-minute interactive session is $1.00. If you start a session at 9:00 AM, immediately leave the console, and return from 9:20 AM to 9:30 AM, this will use 1 session for a total of $1.00.

    If you start a session at 9:00 AM and interact with the DataBrew console until 9:50 AM, exit the DataBrew project space, and come back to make your final interaction at 10:15 AM, this will use 3 sessions and you will be billed $1.00 per session for a total of $3.00.
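
    One way to reason about the session counts above is to treat each session as a 30-minute window that opens at the first interaction falling outside the previous window. That rule is an assumption inferred from the two examples, not a documented billing algorithm, and the names `count_sessions` and `at` are hypothetical. The window is treated as half-open, so an interaction exactly 30 minutes after a session starts would open a new one in this sketch.

```python
from datetime import datetime, timedelta

SESSION_LENGTH = timedelta(minutes=30)
SESSION_PRICE_USD = 1.00  # per 30-minute session, from the example above


def count_sessions(interactions):
    """Count sessions under the ASSUMED rule: an interaction outside the
    current 30-minute window opens a new session."""
    sessions = 0
    window_end = None
    for ts in sorted(interactions):
        if window_end is None or ts >= window_end:
            sessions += 1
            window_end = ts + SESSION_LENGTH
    return sessions


def at(hour, minute):
    # The calendar date is arbitrary; only the time of day matters here.
    return datetime(2024, 1, 1, hour, minute)


# First example: start at 9:00, come back between 9:20 and 9:30.
first = [at(9, 0), at(9, 20), at(9, 29)]
# Second example: interact from 9:00 to 9:50 (sampled every 10 minutes),
# then make a final interaction at 10:15.
second = [at(9, 0) + timedelta(minutes=m) for m in range(0, 51, 10)] + [at(10, 15)]

for label, events in (("first", first), ("second", second)):
    n = count_sessions(events)
    print(f"{label}: {n} session(s), ${n * SESSION_PRICE_USD:.2f}")
```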

  • DataBrew jobs
  • Pricing examples

    AWS Glue DataBrew: If a DataBrew job runs for 10 minutes and consumes 5 DataBrew nodes, the price will be $0.40. Because your job ran for 1/6th of an hour and consumed 5 nodes, you will be billed 5 nodes * 1/6 hour * $0.48 per node hour for a total of $0.40.
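
    The DataBrew job charge can be sketched the same way, using node hours instead of DPU-Hours. DataBrew jobs are billed per minute; rounding a partial minute up to the next whole minute is an assumption made here for illustration, and `databrew_job_cost` is a hypothetical name.

```python
import math

NODE_HOUR_RATE_USD = 0.48  # per DataBrew node hour, from the example above


def databrew_job_cost(nodes: int, runtime_minutes: float,
                      rate: float = NODE_HOUR_RATE_USD) -> float:
    """Return the charge in USD for a DataBrew job."""
    # ASSUMPTION: partial minutes are rounded up to the next whole minute.
    billed_minutes = math.ceil(runtime_minutes)
    return nodes * billed_minutes / 60 * rate


job_usd = databrew_job_cost(nodes=5, runtime_minutes=10)
print(f"DataBrew job: ${job_usd:.2f}")
```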

  • Data Quality
  • AWS Glue Data Quality builds confidence in your data by helping you achieve high data quality. It automatically measures, monitors, and manages data quality in your data lakes and pipelines, making it easier to identify missing, stale, or bad data.

    You can access data quality features from the Data Catalog and AWS Glue Studio and through AWS Glue APIs.

    Pricing for managing data quality of datasets cataloged in the Data Catalog:

    You can choose a dataset from the Data Catalog and generate recommendations. This action creates a Recommendation Task, for which you provision data processing units (DPUs). After you get the recommendations, you can modify them or add new rules and then schedule them. These scheduled runs are called Data Quality Tasks, and you provision DPUs for them as well. You need a minimum of 2 DPUs, with a 1-minute minimum billing duration.

    Pricing for managing data quality of datasets processed on AWS Glue ETL:

    You can also add data quality checks to your ETL jobs to prevent bad data from entering data lakes. These data quality rules reside within your ETL jobs, resulting in increased runtime or increased DPU consumption. Alternatively, you can use Flexible execution for workloads that are not SLA sensitive.

    Pricing for detecting anomalies in AWS Glue ETL:

    Anomaly detection:
    You will incur 1 DPU per statistic, in addition to your ETL job DPUs, for the time it takes to detect anomalies. On average, it takes 10 to 20 seconds to detect anomalies for 1 statistic. Let’s assume that you configured two rules (Rule 1: data volume must be greater than 1,000 records; Rule 2: column count must be greater than 10) and one analyzer (Analyzer 1: monitor completeness of a column). This configuration will generate three statistics: row count, column count, and completeness percentage of a column. You will be charged 3 additional DPUs for the time it takes to detect anomalies, with a 1-second minimum. See Example 4 for more details.

    Retraining:
    You may want to exclude anomalous job runs or statistics so that the anomaly detection algorithm accurately predicts subsequent anomalies. To do this, AWS Glue allows you to exclude or include statistics. You will incur 1 DPU to retrain the model for the time it takes to retrain. On average, retraining takes 10 seconds to 20 minutes per statistic. See Example 5 for more details.

    Statistics storage:
    There is no charge to store the statistics that are gathered. There is a limit of 100,000 statistics per account, and statistics are stored for 2 years.

    Additional charges:
    AWS Glue processes data directly from Amazon Simple Storage Service (Amazon S3). There are no additional storage charges for reading your data with AWS Glue. You are charged standard Amazon S3 rates for storage, requests, and data transfer. Based on your configuration, temporary files, data quality results, and shuffle files are stored in an S3 bucket of your choice and are also billed at standard S3 rates.


    If you use the Data Catalog, you are charged standard Data Catalog rates. For details, choose the Data Catalog storage and requests tab.

    Pricing examples

    Example 1 – Get recommendations for a table in the Data Catalog

    For example, consider a recommendation task with 5 DPUs that completes in 10 minutes. You will pay 5 DPUs * 1/6 hour * $0.44, which equals $0.37.

    Example 2 – Evaluate data quality of a table in the Data Catalog

    After you review the recommendations, you can edit them if necessary and then schedule the data quality task by provisioning DPUs. For example, consider a data quality evaluation task with 5 DPUs that completes in 20 minutes.
    You will pay 5 DPUs * 1/3 hour * $0.44, which equals $0.73.
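
    Examples 1 and 2 can be reproduced with a small helper that also applies the 2-DPU minimum and the 1-minute minimum billing duration stated earlier for these tasks. This is a sketch, not an AWS API: the function name is hypothetical, and rejecting a request for fewer than 2 DPUs (rather than silently billing for 2) is a design choice made for the illustration.

```python
DPU_HOUR_RATE_USD = 0.44  # from the examples; varies by Region
MIN_DPUS = 2              # minimum DPUs for a data quality task
MIN_BILLED_MINUTES = 1    # minimum billing duration


def data_quality_task_cost(dpus: int, minutes: float,
                           rate: float = DPU_HOUR_RATE_USD) -> float:
    """Return the charge in USD for a recommendation or data quality task."""
    if dpus < MIN_DPUS:
        # Illustration choice: treat fewer than 2 DPUs as invalid input.
        raise ValueError(f"at least {MIN_DPUS} DPUs are required")
    billed_minutes = max(minutes, MIN_BILLED_MINUTES)
    return dpus * billed_minutes / 60 * rate


recommendation_usd = data_quality_task_cost(dpus=5, minutes=10)  # Example 1
evaluation_usd = data_quality_task_cost(dpus=5, minutes=20)      # Example 2

print(f"Recommendation task: ${recommendation_usd:.2f}")
print(f"Evaluation task: ${evaluation_usd:.2f}")
```

    A task that finishes in under a minute is billed for the full minute in this sketch, so `data_quality_task_cost(2, 0.5)` and `data_quality_task_cost(2, 1)` return the same amount.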

    Example 3 – Evaluate data quality in an AWS Glue ETL job

    You can also add these data quality checks to your AWS Glue ETL jobs to prevent bad data from entering your data lakes. You can do this by adding the Data Quality transform in AWS Glue Studio or by using AWS Glue APIs within the code that you author in AWS Glue Studio notebooks. Consider an AWS Glue job with data quality rules configured within the pipeline that runs for 20 minutes (1/3 hour) with 6 DPUs. You will be charged 6 DPUs * 1/3 hour * $0.44, which equals $0.88. Alternatively, you can use Flex, for which you will be charged 6 DPUs * 1/3 hour * $0.29, which equals $0.58.
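
    To compare the standard and Flex charges from Example 3 side by side, the same arithmetic can be run with both rates. The rates are the ones quoted in the example; the function name `job_cost` is hypothetical.

```python
STANDARD_RATE_USD = 0.44  # per DPU-Hour, from Example 3
FLEX_RATE_USD = 0.29      # per DPU-Hour with Flex, from Example 3


def job_cost(dpus: float, minutes: float, rate: float) -> float:
    """Return the charge in USD for a job at the given DPU-Hour rate."""
    return dpus * (minutes / 60) * rate


standard_usd = job_cost(6, 20, STANDARD_RATE_USD)
flex_usd = job_cost(6, 20, FLEX_RATE_USD)
savings_usd = standard_usd - flex_usd
savings_pct = savings_usd / standard_usd * 100

print(f"Standard: ${standard_usd:.2f}")
print(f"Flex:     ${flex_usd:.2f}")
print(f"Savings:  ${savings_usd:.2f} ({savings_pct:.0f}%)")
```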

    Example 4 – Evaluate data quality in an AWS Glue ETL job with Anomaly Detection

    Consider an AWS Glue job that reads data from Amazon S3, transforms the data, and runs data quality checks before loading it into Amazon Redshift. Assume that this pipeline has 10 rules and 10 analyzers, resulting in 20 statistics gathered. Also assume that extraction, transformation, loading, statistics gathering, and data quality evaluation take 20 minutes in total and that the job uses 6 DPUs. Without Anomaly Detection enabled, you will be charged 6 DPUs * 1/3 hour (20 minutes) * $0.44, which equals $0.88 (A). With Anomaly Detection turned on, AWS Glue adds 1 DPU for every statistic, and it takes 15 seconds on average to detect anomalies for each one. In this example, you will incur 20 statistics * 1 DPU * 15/3600 hour (about 0.0042 hour per statistic) * $0.44 per DPU-Hour = $0.037 (B). The total cost of the job will be $0.88 (A) + $0.037 (B) = $0.917.
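
    The two components of Example 4 can be computed separately and then summed. In this sketch, the function name is hypothetical and the 15-second detection time is the average quoted above, not a guaranteed duration.

```python
DPU_HOUR_RATE_USD = 0.44  # from the examples; varies by Region


def anomaly_detection_cost(statistics: int, seconds_per_statistic: float,
                           rate: float = DPU_HOUR_RATE_USD) -> float:
    """Extra charge for anomaly detection: 1 DPU per statistic for the
    time it takes to detect anomalies."""
    dpus_per_statistic = 1
    hours_per_statistic = seconds_per_statistic / 3600
    return statistics * dpus_per_statistic * hours_per_statistic * rate


# (A) Base ETL job: 6 DPUs for 20 minutes.
base_usd = 6 * (20 / 60) * DPU_HOUR_RATE_USD
# (B) Anomaly detection: 20 statistics at about 15 seconds each.
detection_usd = anomaly_detection_cost(statistics=20, seconds_per_statistic=15)
total_usd = base_usd + detection_usd

print(f"(A) ETL job:           ${base_usd:.3f}")
print(f"(B) Anomaly detection: ${detection_usd:.3f}")
print(f"Total:                 ${total_usd:.3f}")
```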

    Example 5 – Retraining

    Consider that your AWS Glue job detected an anomaly. You decide to exclude the anomaly from the model so that the anomaly detection algorithm predicts future anomalies accurately. To do this, you can retrain the model by excluding this anomalous statistic. You will incur 1 DPU per statistic for the time it takes to retrain the model. On average, this can take 15 seconds. In this example, assuming you are excluding 1 data point, you will incur 1 statistic * 1 DPU * 15/3600 hour (about 0.0042 hour per statistic) * $0.44 per DPU-Hour, which is approximately $0.0018.
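
    Because retraining time is quoted elsewhere on this page as ranging from 10 seconds to 20 minutes per statistic, it can help to see the charge at both ends of that range as well as at the 15-second figure used in Example 5. This is a sketch with a hypothetical function name. The values are computed directly from the formula, so they may differ from a hand-rounded figure in the last digit.

```python
DPU_HOUR_RATE_USD = 0.44  # from the examples; varies by Region


def retraining_cost(statistics: int, seconds_per_statistic: float,
                    rate: float = DPU_HOUR_RATE_USD) -> float:
    """Charge for retraining: 1 DPU per statistic for the retraining time."""
    return statistics * 1 * (seconds_per_statistic / 3600) * rate


# Retraining times per statistic, in seconds.
scenarios = {
    "fastest (10 s)": 10,
    "Example 5 (15 s)": 15,
    "slowest (20 min)": 20 * 60,
}

retraining_usd = {
    label: retraining_cost(statistics=1, seconds_per_statistic=seconds)
    for label, seconds in scenarios.items()
}

for label, usd in retraining_usd.items():
    print(f"{label}: ${usd:.5f}")
```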

Note: Pricing can vary by Region.

View the Global Regions table to learn more about AWS Glue availability.