# README
bigquery
Usage
source:
name: bigquery
config:
project_id: google-project-id
table_pattern: gofood.fact_
max_preview_rows: 3
exclude:
datasets:
- dataset_a
- dataset_b
tables:
- dataset_c.table_a
max_page_size: 100
profile_column: true
build_view_lineage: true
# Only one of service_account_base64 / service_account_json is needed.
# If both are present, service_account_base64 takes precedence
service_account_base64: _________BASE64_ENCODED_SERVICE_ACCOUNT_________________
service_account_json:
{
"type": "service_account",
"private_key_id": "xxxxxxx",
"private_key": "xxxxxxx",
"client_email": "xxxxxxx",
"client_id": "xxxxxxx",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "xxxxxxx",
"client_x509_cert_url": "xxxxxxx",
}
collect_table_usage: false
usage_period_in_day: 7
usage_project_ids:
- google-project-id
- other-google-project-id
Inputs
Key | Value | Example | Description | |
---|---|---|---|---|
project_id | string | my-project | BigQuery Project ID | required |
service_account_base64 | string | ____BASE64_ENCODED_SERVICE_ACCOUNT____ | Service Account in base64 encoded string. Takes precedence over service_account_json value | optional |
service_account_json | string | {"private_key": .., "private_id": ...} | Service Account in JSON string | optional |
table_pattern | string | gofood.fact_ | Regex pattern to filter which bigquery table to scan (whitelist) | optional |
max_page_size | int | 100 | max page size hint used for fetching datasets/tables/rows from bigquery | optional |
include_column_profile | bool | true | true if you want to profile the column value such min, max, med, avg, top, and freq | optional |
max_preview_rows | int | 30 | max number of preview rows to fetch, 0 will skip preview fetching, -1 will restrict adding preview_rows key in asset data . Default to 30 . | optional |
mix_values | bool | false | true if you want to mix the column values with the preview rows. Default to false . | optional |
collect_table_usage | boolean | false | toggle feature to collect table usage, true will enable collecting table usage. Default to false . | optional |
usage_period_in_day | int | 7 | collecting log from (now - usage_period_in_day) until now . only matter if collect_table_usage is true. Default to 7 . | optional |
usage_project_ids | []string | [google-project-id, other-google-project-id] | collecting log from defined GCP Project IDs. Default to BigQuery Project ID. | optional |
Notes
- Leaving
service_account_json
andservice_account_base64
blank will default to Google's default authentication. It is recommended if Meteor instance runs inside the same Google Cloud environment as the BigQuery project. - Service account needs to have
bigquery.privateLogsViewer
role to be able to collect bigquery audit logs. - Setting
max_preview_rows
to-1
will restrict adding preview_rows key in asset data
Outputs
Field | Sample Value | Description |
---|---|---|
resource.urn | project_id.dataset_name.table_name | |
resource.name | table_name | |
resource.service | bigquery | |
description | table description | |
profile.total_rows | 2100 | |
profile.usage_count | 15 | |
profile.joins | []Join | |
profile.filters | ["WHERE t.param_3 = 'the_param' AND t.column_1 = \"xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx\"" ,"WHERE event_timestamp >= TIMESTAMP(\"2021-10-29\", \"UTC\") AND event_timestamp < TIMESTAMP(\"2021-11-22T02:01:06Z\")" ] | |
schema | []Column | |
properties.partition_data | "partition_data": {"partition_field": "data_date", "require_partition_filter": false, "time_partition": {"partition_by": "DAY","partition_expire": 0 } } | partition related data for time and range partitioning. |
properties.clustering_fields | ['created_at', 'updated_at'] | list of fields on which the table is clustered |
properties.partition_field | created_at | returns the field on which table is time partitioned |
Partition Data
Field | Sample Value | Description |
---|---|---|
partition_field | created_at | field on which the table is partitioned either by TimePartitioning or RangePartitioning. In case field is empty for TimePartitioning _PARTITIONTIME is returned instead of empty. |
require_partition_filter | true | boolean value which denotes if every query on the bigquery table must include at least one predicate that only references the partitioning column |
time_partition.partition_by | HOUR | returns partition type HOUR/DAY/MONTH/YEAR |
time_partition.partition_expire_seconds | 0 | time in which data will expire from this partition. If 0 it will not expire. |
range_partition.interval | 10 | width of a interval range |
range_partition.start | 0 | start value for partition inclusive of this value |
range_partition.end | 100 | end value for partition exclusive of this value |
Column
Field | Sample Value |
---|---|
name | total_price |
description | item's total price |
data_type | decimal |
is_nullable | true |
length | 12,2 |
profile | {"min":...,"max": ...,"unique": ...} |
Join
Field | Sample Value |
---|---|
urn | project_id.dataset_name.table_name |
count | 3 |
conditions | ["ON target.column_1 = source.column_1 and target.param_name = source.param_name" ,"ON DATE(target.event_timestamp) = DATE(source.event_timestamp)" ] |
Contributing
Refer to the contribution guidelines for information on contributing to this module.
# Functions
CreateClient creates a bigquery client.
No description provided by the author
No description provided by the author
No description provided by the author
# Type aliases
No description provided by the author