This script facilitates the loading of JSON data into Google BigQuery while managing data freshness by ensuring existing rows related to an `explore_id` are deleted before new data is inserted.
Authentication uses a Google Cloud service account key, with the `GOOGLE_APPLICATION_CREDENTIALS` environment variable pointing to that key file.

The script accepts the following command line arguments:

- `--project_id`: Required. The Google Cloud project ID where your BigQuery dataset resides.
- `--dataset_id`: The ID of the BigQuery dataset. Defaults to `explore_assistant`.
- `--table_id`: The ID of the BigQuery table where the data will be inserted. Defaults to `explore_assistant_examples`.
- `--explore_id`: Required. A unique identifier for the dataset rows related to a specific use case or query (used in deletion and insertion).
- `--json_file`: The path to the JSON file containing the data to be loaded. Defaults to `examples.json`.

This workflow is intended for cases where the rows tied to a given `explore_id` need to be refreshed or updated in the dataset; an example invocation appears in the sketch below.
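As an illustration only, the flags above might be declared with `argparse` roughly as follows. The script filename in the example invocation and the sample flag values are assumptions, not taken from the source.

```python
import argparse


def parse_args() -> argparse.Namespace:
    # Example invocation (script name and values are assumed):
    #   python load_examples.py --project_id my-gcp-project --explore_id mymodel:myexplore
    parser = argparse.ArgumentParser(
        description="Load JSON examples into BigQuery, replacing rows for an explore_id."
    )
    parser.add_argument("--project_id", required=True,
                        help="Google Cloud project ID where the BigQuery dataset resides.")
    parser.add_argument("--dataset_id", default="explore_assistant",
                        help="BigQuery dataset ID.")
    parser.add_argument("--table_id", default="explore_assistant_examples",
                        help="BigQuery table that receives the data.")
    parser.add_argument("--explore_id", required=True,
                        help="Identifier for the rows to delete and re-insert.")
    parser.add_argument("--json_file", default="examples.json",
                        help="Path to the JSON file containing the data to load.")
    return parser.parse_args()
```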
The script works in three steps:

1. Argument parsing: uses `argparse` to define and handle command line inputs that specify the Google Cloud project, dataset, and table details, as well as the path to the JSON data file.
2. Deletion of existing rows: before inserting, the script deletes any rows in the target table that match the provided `explore_id`. This is crucial for use cases where the data associated with an `explore_id` needs to be refreshed or updated without duplication. A sketch of this step follows below.
3. Insertion of new data: the script constructs an `INSERT` statement and executes it using the BigQuery client. Proper parameterization of the query is utilized to safeguard against SQL injection. A sketch of this step also follows below.
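A minimal sketch of the deletion step, assuming the `google-cloud-bigquery` client library and an `explore_id` column in the target table; the function and variable names are illustrative, not taken from the source. BigQuery does not allow table names as query parameters, so the table reference is formatted into the SQL while the `explore_id` value is bound as a parameter.

```python
from google.cloud import bigquery


def delete_existing_rows(client: bigquery.Client, project_id: str, dataset_id: str,
                         table_id: str, explore_id: str) -> None:
    # Parameterized query: the explore_id value is bound, never interpolated into the SQL string.
    query = f"""
        DELETE FROM `{project_id}.{dataset_id}.{table_id}`
        WHERE explore_id = @explore_id
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("explore_id", "STRING", explore_id)]
    )
    client.query(query, job_config=job_config).result()  # wait for the DELETE to finish
```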
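A companion sketch of the parameterized insertion step. The table schema used here (an `explore_id` string column plus an `examples` string column holding the serialized JSON) is an assumption made for illustration; the actual script's schema may differ.

```python
import json

from google.cloud import bigquery


def insert_examples(client: bigquery.Client, project_id: str, dataset_id: str,
                    table_id: str, explore_id: str, json_file: str) -> None:
    # Read the JSON payload from disk; storing it as a single string column is an
    # assumption about the table schema, made only for illustration.
    with open(json_file, "r") as f:
        examples = json.dumps(json.load(f))

    query = f"""
        INSERT INTO `{project_id}.{dataset_id}.{table_id}` (explore_id, examples)
        VALUES (@explore_id, @examples)
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("explore_id", "STRING", explore_id),
            bigquery.ScalarQueryParameter("examples", "STRING", examples),
        ]
    )
    client.query(query, job_config=job_config).result()  # wait for the INSERT to complete
```

Running the delete before the insert keeps the operation repeatable: re-running the script for the same `explore_id` replaces the existing rows rather than duplicating them.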