    Google BigQuery Connector for AWS Glue

    Easily connect to Google BigQuery from AWS Glue

    Overview

    The Google BigQuery Connector for AWS Glue lets AWS Glue jobs extract data from BigQuery and load data into BigQuery. It provides access to BigQuery data for cloud ETL processes such as operational reporting, backup and disaster recovery, and data governance.

    Highlights

    • Connect to Google BigQuery from AWS Glue jobs
    • Simplify data extracts from Google BigQuery
    • Simplify data loads to Google BigQuery

    Details

    Delivery method

    Delivery option
    Glue 3.0
    Glue 1.0/2.0

    Latest version

    Operating system
    Linux

    Pricing

    Google BigQuery Connector for AWS Glue

    This product is free. Subscriptions have no end date and can be canceled anytime.

    Vendor refund policy

    No Refunds

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    Glue 3.0

    Supported services:
    • Amazon ECS
    • Amazon EKS
    Container image

    Containers are lightweight, portable execution environments that wrap server application software in a filesystem that includes everything it needs to run. Container applications run on supported container runtimes and orchestration services, such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). Both eliminate the need for you to install and operate your own container orchestration software by managing and scheduling containers on a scalable cluster of virtual machines.

    Version release notes

    Google BigQuery Connector for AWS Glue 0.24.2

    • This version is built with spark-bigquery-connector 0.24.2.
    • This version is compatible with AWS Glue 3.0, 2.0, and 1.0.
    • This version supports both reading from and writing to Google BigQuery.

    Additional details

    Usage instructions

    Subscribe to the product from AWS Marketplace, then activate the Glue connector from AWS Glue Studio.

    Prerequisites

    • An account in Google Cloud, specifically a service account that has permissions to Google BigQuery
    • GCP credentials (service_account_json_file)
    • GCS bucket (only for writes)
    • BigQuery dataset (only for writes)
    • AWS Secrets Manager secret (you can create the secret in the following steps)

    Create a new secret for Google BigQuery in AWS Secrets Manager

    We create a secret in AWS Secrets Manager to store the Google service account file contents as a base64-encoded string.

    1. Download the service account credentials JSON file from Google Cloud.
    2. Base64-encode the file contents. You can use an online utility or a system command; on Linux and macOS, base64 [service_account_json_file] prints the file contents as a base64-encoded string.
    3. On the Secrets Manager console, choose Store a new secret.
    4. For Secret type, select Other type of secret.
    5. Enter credentials as the key and the base64-encoded string as the value.
    6. Leave the rest of the options at their default.
    7. Choose Next.
    8. Name the secret bigquery_credentials.
    9. Follow the remaining steps to store the secret.
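The encoding and secret-creation steps above can also be scripted. The following is a minimal sketch, not part of the vendor's instructions: the helper names build_secret_string and store_secret are hypothetical, and it assumes the secret body is a JSON object with a single credentials key, which is what the console stores for a key/value secret. The boto3 call is optional and requires AWS credentials with permission to create secrets.

```python
import base64
import json
import tempfile
from pathlib import Path


def build_secret_string(service_account_json_file: str) -> str:
    """Return the secret body: {"credentials": "<base64 of the key file>"}."""
    raw = Path(service_account_json_file).read_bytes()
    encoded = base64.b64encode(raw).decode("ascii")
    return json.dumps({"credentials": encoded})


def store_secret(secret_string: str, name: str = "bigquery_credentials") -> None:
    """Optional: create the secret with boto3 instead of the console."""
    import boto3  # imported lazily so the encoding helper works without boto3

    client = boto3.client("secretsmanager")
    client.create_secret(Name=name, SecretString=secret_string)


if __name__ == "__main__":
    # Demo with a throwaway file; a real key file is downloaded from Google Cloud.
    with tempfile.TemporaryDirectory() as tmp:
        key_file = Path(tmp) / "service_account.json"
        key_file.write_text('{"type": "service_account"}')
        print(build_secret_string(str(key_file)))
```

Keeping the encoding separate from the AWS call makes it easy to check the payload locally before anything is stored.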

    Connection options

    You can pass the following options to the connector.

    • parentProject (required): The Google Cloud Project ID of the table
    • dataset (optional unless omitted in table): The BigQuery dataset containing the table.
    • table (required): The BigQuery table in the format [[project:]dataset.]table
    • temporaryGcsBucket (optional; required for writes): The GCS bucket that temporarily holds the data before it is loaded into BigQuery.

    You can see other available options here: https://github.com/GoogleCloudDataproc/spark-bigquery-connector/tree/0.24.2 
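As an illustration of how the options above fit together, here is a minimal sketch of a helper that assembles and validates them. The function build_connection_options and its for_write flag are hypothetical and not part of the connector; the validation simply mirrors the required/optional rules listed above.

```python
from typing import Dict, Optional


def build_connection_options(
    parent_project: str,
    table: str,
    dataset: Optional[str] = None,
    temporary_gcs_bucket: Optional[str] = None,
    for_write: bool = False,
) -> Dict[str, str]:
    """Assemble connector options, enforcing the rules from the list above."""
    if not parent_project:
        raise ValueError("parentProject is required")
    if not table:
        raise ValueError("table is required")

    # table has the form [[project:]dataset.]table; drop any "project:" prefix
    # before checking whether a dataset is already part of the name.
    table_without_project = table.rpartition(":")[2]
    if "." not in table_without_project and not dataset:
        raise ValueError("dataset is required when table does not include it")

    if for_write and not temporary_gcs_bucket:
        raise ValueError("temporaryGcsBucket is required for writes")

    options = {"parentProject": parent_project, "table": table}
    if dataset:
        options["dataset"] = dataset
    if temporary_gcs_bucket:
        options["temporaryGcsBucket"] = temporary_gcs_bucket
    return options


if __name__ == "__main__":
    # Placeholder names; substitute your own project, dataset, and table.
    print(build_connection_options("my-project", "my_dataset.my_table"))
```

In a Glue job, a dictionary like this would typically be passed as the connection options, together with the name of the Glue connection created when activating the connector; how exactly it is passed depends on your job script and is not specified in this listing.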

    Spark configurations

    The following Spark configurations are required only for writes into BigQuery.

    • spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
    • spark.hadoop.fs.gs.auth.service.account.enable=true

    You also need to configure credentials using one of the following sets of configurations.

    Credential file

    • spark.hadoop.fs.gs.auth.service.account.json.keyfile=credentials.json

    Upload credentials.json to your S3 bucket and set its S3 path in the job's Referenced files path.

    Private key

    • spark.hadoop.fs.gs.auth.service.account.email=[your-email-extracted-from-service_account_json_file]
    • spark.hadoop.fs.gs.auth.service.account.private.key.id=[your-private-key-id-extracted-from-service_account_json_file]
    • spark.hadoop.fs.gs.auth.service.account.private.key=[your-private-key-body-extracted-from-service_account_json_file]
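The three private-key values can be read straight from the service account JSON file. The sketch below assumes the standard Google service account key layout, where the fields are named client_email, private_key_id, and private_key; the helper name private_key_confs is hypothetical.

```python
import json
from typing import Dict

# Common prefix of the three private-key settings listed above.
PREFIX = "spark.hadoop.fs.gs.auth.service.account."


def private_key_confs(service_account_json: str) -> Dict[str, str]:
    """Map fields of a service account key file to the Spark settings above."""
    info = json.loads(service_account_json)
    return {
        PREFIX + "email": info["client_email"],
        PREFIX + "private.key.id": info["private_key_id"],
        PREFIX + "private.key": info["private_key"],
    }


if __name__ == "__main__":
    # Fake values for illustration only; never hard-code a real private key.
    sample = json.dumps(
        {
            "client_email": "glue-job@my-project.iam.gserviceaccount.com",
            "private_key_id": "abc123",
            "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        }
    )
    for key, value in private_key_confs(sample).items():
        print(key, "=", value[:20])
```

In a job script, the resulting dictionary can be applied with conf.set(key, value) on a SparkConf, one entry at a time.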

    You can set these Spark configurations in one of the following ways.

    • The --conf parameter in the Glue job parameters
    • The job script, using SparkConf

    from pyspark.conf import SparkConf

    conf = SparkConf()
    conf.set("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    conf.set("spark.hadoop.fs.gs.auth.service.account.enable", "true")
    conf.set("spark.hadoop.google.cloud.auth.service.account.json.keyfile", "credentials.json")
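For the --conf job parameter route, a Glue job takes a single --conf key, so several settings are commonly chained inside its value as "k1=v1 --conf k2=v2". This chaining is a widely used workaround rather than something this listing documents, so treat the sketch below as an assumption; the helper build_conf_job_parameter is hypothetical.

```python
from typing import Dict


def build_conf_job_parameter(confs: Dict[str, str]) -> str:
    """Join Spark settings into one value for the Glue --conf job parameter."""
    pairs = []
    for key, value in confs.items():
        # Whitespace would split the chained value, so reject it up front.
        if any(ch.isspace() for ch in key + value):
            raise ValueError(f"whitespace is not supported in {key!r}")
        pairs.append(f"{key}={value}")
    return " --conf ".join(pairs)


if __name__ == "__main__":
    value = build_conf_job_parameter(
        {
            "spark.hadoop.fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
            "spark.hadoop.fs.gs.auth.service.account.enable": "true",
        }
    )
    print(value)
```

Because values containing whitespace are rejected, a multi-line private key body does not fit this route; setting it in the job script with SparkConf avoids that limitation.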

    Support

    Vendor support

    Please allow 24 hours

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

    Customer reviews

    Ratings and reviews

    4 out of 5 stars (1 rating)
    5 star: 0%, 4 star: 100%, 3 star: 0%, 2 star: 0%, 1 star: 0%
    1 AWS review | 2 external reviews
    External reviews are sourced from G2 and are not included in the star rating for this product.

    Aws

    Reviewed on Apr 23, 2024
    Purchase verified by AWS

    This is the coolest product ever and it's so useful, and really amazing I appreciate it, so have, it guys

    Mithun M.

    No-fuss connectivity to BigQuery from AWS Glue

    Reviewed on Feb 04, 2023
    Review provided by G2
    What do you like best about the product?
    - Easy to set up the connection from Glue to GCP and get up and running with loading data from BigQuery tables into AWS Glue
    - We use it to bring in GA data amounting to multiple gigabytes and process it using PySpark in AWS Glue
    What do you dislike about the product?
    Haven't faced any issues while setting up or using the AWS Glue Connector for Google BigQuery, but more detailed and up-to-date documentation would be a good way to improve it
    What problems is the product solving and how is that benefiting you?
    Bringing data from a BigQuery table into AWS Glue, processing it with AWS Glue PySpark, pushing the processed data to an S3 location, and then writing it back to another table in BigQuery
    Education Management

    Glue Connector Integrations with BQ

    Reviewed on Apr 08, 2022
    Review provided by G2
    What do you like best about the product?
    AWS Glue has been a game changer for me. We've been using the Glue Schema Registry, which provides schema versioning that wasn't available when we were dealing with Pub/Sub schemas.
    What do you dislike about the product?
    Unfortunately, the Glue client is available only in Java, particularly for the SerDe operations on our Avro data.
    What problems is the product solving and how is that benefiting you?
    We use it mostly to provide schemas for our tables in BigQuery. The idea behind using Glue is to infer the Avro schema from the data we get from CDC and move it to BigQuery.
    Recommendations to others considering the product:
    Strong Tool to manage out your metadata!