## Prerequisites
Before you run this script, ensure that your environment meets the following requirements:

- **Python 3.6 or higher** - Make sure Python is installed on your system.
- **Google Cloud SDK** - Install and configure the Google Cloud SDK (`gcloud`).
- **BigQuery API Access** - Ensure that the BigQuery API is enabled in your Google Cloud project.
- **Google Cloud Authentication** - Set up authentication by downloading a service account key and setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to point to that key file.
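For example, in a Unix-like shell (the key path below is a placeholder):

```shell
# Point Google client libraries at your service account key file
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/service-account.json"
```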
## Setup
To run this script, you will need to install its dependencies. It is recommended to use a virtual environment at the top level of the repo.
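A typical setup from the repo root might look like this (the `requirements.txt` file name is an assumption; check the repo for the actual dependency file):

```shell
# Create and activate a virtual environment at the top level of the repo
python3 -m venv .venv
source .venv/bin/activate

# Install the script's dependencies (file name is an assumption)
pip install -r requirements.txt
```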
## Usage

### Script Parameters
The script accepts several command line arguments to specify the details required for loading data into BigQuery:

- `--project_id`: Required. The Google Cloud project ID where your BigQuery dataset resides.
- `--dataset_id`: The ID of the BigQuery dataset. Defaults to `explore_assistant`.
- `--table_id`: The ID of the BigQuery table where the data will be inserted. Defaults to `explore_assistant_examples`.
- `--explore_id`: Required. A unique identifier for the dataset rows related to a specific use case or query (used in deletion and insertion).
- `--json_file`: The path to the JSON file containing the data to be loaded. Defaults to `examples.json`.
### Running the Script
To run the script, use the following command format in your terminal to load the general examples:
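A typical invocation might look like the following; the script filename and the argument values are placeholders rather than values taken from the repo:

```shell
# Script name and argument values are illustrative placeholders
python load_examples.py \
  --project_id your-gcp-project \
  --explore_id your_explore \
  --json_file examples.json
```

The `--dataset_id` and `--table_id` flags can be omitted to use their documented defaults.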
## Description

This Python script manages data uploads from a JSON file into a Google BigQuery table, focusing on scenarios where entries identified by an `explore_id` need to be refreshed or updated in the dataset.
- **Command Line Interface (CLI):** The script uses `argparse` to define and handle command line inputs that specify the Google Cloud project, dataset, and table details, as well as the path to the JSON data file.
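The CLI can be sketched with `argparse`; the flag names and defaults mirror the Script Parameters section above, while the function name and help strings are illustrative:

```python
import argparse

def parse_args(argv=None):
    # Flags mirror the documented parameters; accepting argv explicitly
    # keeps the function usable outside a real CLI invocation.
    parser = argparse.ArgumentParser(
        description="Load example data into a BigQuery table."
    )
    parser.add_argument("--project_id", required=True,
                        help="Google Cloud project ID")
    parser.add_argument("--dataset_id", default="explore_assistant",
                        help="BigQuery dataset ID")
    parser.add_argument("--table_id", default="explore_assistant_examples",
                        help="BigQuery table ID")
    parser.add_argument("--explore_id", required=True,
                        help="Identifier for the rows to refresh")
    parser.add_argument("--json_file", default="examples.json",
                        help="Path to the JSON data file")
    return parser.parse_args(argv)
```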
- **BigQuery Client Initialization:** The script initializes a BigQuery client using the Google Cloud project ID provided through the CLI. This client facilitates interactions with BigQuery, such as running queries and managing data.
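With the official `google-cloud-bigquery` library, client initialization is a one-liner; the library's installation and the credentials setup from Prerequisites are assumed here, and the helper name is illustrative:

```python
from google.cloud import bigquery

def make_client(project_id):
    # Authenticates via GOOGLE_APPLICATION_CREDENTIALS (see Prerequisites);
    # project_id comes from the --project_id CLI flag.
    return bigquery.Client(project=project_id)
```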
- **Data Deletion:** Before inserting new data, the script deletes existing rows in the specified BigQuery table that match the given `explore_id`. This is crucial for use cases where the data associated with an `explore_id` needs to be refreshed or updated without duplication.
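The deletion step can be sketched as a parameterized query; the `explore_id` column name and the helper function are assumptions for illustration:

```python
def build_delete_query(project_id, dataset_id, table_id):
    # The explore_id value is bound as a query parameter (@explore_id)
    # rather than interpolated into the SQL string.
    return (
        f"DELETE FROM `{project_id}.{dataset_id}.{table_id}` "
        "WHERE explore_id = @explore_id"
    )
```

With the BigQuery client, the value would then be supplied through `QueryJobConfig(query_parameters=[ScalarQueryParameter("explore_id", "STRING", explore_id)])`.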
- **Data Loading from JSON:** The script reads data from a specified JSON file. This data is expected to be in a format that BigQuery can ingest.
- **Data Insertion into BigQuery:** After deleting the old data, the script inserts the new data from the JSON file into the BigQuery table. It constructs a SQL `INSERT` statement and executes it using the BigQuery client. Proper parameterization of the query safeguards against SQL injection.
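The insertion can be sketched the same way as the deletion; the column names below are assumptions for illustration, and values would again be bound as query parameters rather than interpolated:

```python
def build_insert_query(project_id, dataset_id, table_id):
    # Column names are illustrative; values are bound via
    # @explore_id and @examples query parameters.
    return (
        f"INSERT INTO `{project_id}.{dataset_id}.{table_id}` "
        "(explore_id, examples) VALUES (@explore_id, @examples)"
    )
```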
- **Error Handling:** Throughout the data deletion and insertion processes, the script checks for and reports any errors that occur. This is vital for debugging and ensuring data integrity.
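A sketch of this pattern, with the query execution injected so the reporting logic stands on its own (the wrapper and its naming are illustrative, not the script's actual code):

```python
def run_with_error_report(execute, sql):
    # `execute` stands in for the BigQuery client's query call.
    # Returns None on success, or an error message for reporting.
    try:
        execute(sql)
        return None
    except Exception as exc:
        return f"Query failed: {exc}"
```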

