package
0.11.0
Repository: https://github.com/raystack/meteor.git
Documentation: pkg.go.dev

# README

bigquery

Usage

source:
  name: bigquery
  config:
    project_id: google-project-id
    table_pattern: gofood.fact_
    max_preview_rows: 3
    exclude:
      datasets:
        - dataset_a
        - dataset_b
      tables:
        - dataset_c.table_a
    max_page_size: 100
    profile_column: true
    build_view_lineage: true
    # Only one of service_account_base64 / service_account_json is needed.
    # If both are present, service_account_base64 takes precedence.
    service_account_base64: _________BASE64_ENCODED_SERVICE_ACCOUNT_________________
    service_account_json:
      {
        "type": "service_account",
        "private_key_id": "xxxxxxx",
        "private_key": "xxxxxxx",
        "client_email": "xxxxxxx",
        "client_id": "xxxxxxx",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "xxxxxxx",
        "client_x509_cert_url": "xxxxxxx",
      }
    collect_table_usage: false
    usage_period_in_day: 7
    usage_project_ids:
      - google-project-id
      - other-google-project-id

Inputs

| Key | Value | Example | Description | Required? |
| :-- | :---- | :------ | :---------- | :-------- |
| `project_id` | `string` | `my-project` | BigQuery project ID | required |
| `service_account_base64` | `string` | `____BASE64_ENCODED_SERVICE_ACCOUNT____` | Service account as a base64-encoded string. Takes precedence over the `service_account_json` value | optional |
| `service_account_json` | `string` | `{"private_key": .., "private_id": ...}` | Service account as a JSON string | optional |
| `table_pattern` | `string` | `gofood.fact_` | Regex pattern to filter which BigQuery tables to scan (whitelist) | optional |
| `max_page_size` | `int` | `100` | Maximum page size hint used for fetching datasets/tables/rows from BigQuery | optional |
| `include_column_profile` | `bool` | `true` | `true` to profile column values such as min, max, med, avg, top, and freq | optional |
| `max_preview_rows` | `int` | `30` | Maximum number of preview rows to fetch; `0` skips preview fetching and `-1` omits the `preview_rows` key from asset data. Defaults to `30` | optional |
| `mix_values` | `bool` | `false` | `true` to mix the column values with the preview rows. Defaults to `false` | optional |
| `collect_table_usage` | `bool` | `false` | Toggles table usage collection; `true` enables collecting table usage. Defaults to `false` | optional |
| `usage_period_in_day` | `int` | `7` | Collects logs from `(now - usage_period_in_day)` until now. Only applies when `collect_table_usage` is `true`. Defaults to `7` | optional |
| `usage_project_ids` | `[]string` | `[google-project-id, other-google-project-id]` | Collects logs from the listed GCP project IDs. Defaults to the BigQuery project ID | optional |

Notes

  • Leaving service_account_json and service_account_base64 blank falls back to Google's default authentication (Application Default Credentials). This is recommended when the Meteor instance runs inside the same Google Cloud environment as the BigQuery project.
  • The service account needs the bigquery.privateLogsViewer role to be able to collect BigQuery audit logs.
  • Setting max_preview_rows to -1 prevents the preview_rows key from being added to the asset data.
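
The service_account_base64 value is just the raw service account key JSON encoded as base64. As a minimal sketch of producing it (the `service-account.json` path is a hypothetical example, not something defined by this package):

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
)

func main() {
	// Read the downloaded service account key file.
	// "service-account.json" is a hypothetical path.
	raw, err := os.ReadFile("service-account.json")
	if err != nil {
		panic(err)
	}

	// Base64-encode the JSON so it can be pasted into service_account_base64.
	fmt.Println(base64.StdEncoding.EncodeToString(raw))
}
```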

Outputs

| Field | Sample Value | Description |
| :---- | :----------- | :---------- |
| `resource.urn` | `project_id.dataset_name.table_name` | |
| `resource.name` | `table_name` | |
| `resource.service` | `bigquery` | |
| `description` | `table description` | |
| `profile.total_rows` | `2100` | |
| `profile.usage_count` | `15` | |
| `profile.joins` | `[]Join` | |
| `profile.filters` | `["WHERE t.param_3 = 'the_param' AND t.column_1 = \"xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx\"","WHERE event_timestamp >= TIMESTAMP(\"2021-10-29\", \"UTC\") AND event_timestamp < TIMESTAMP(\"2021-11-22T02:01:06Z\")"]` | |
| `schema` | `[]Column` | |
| `properties.partition_data` | `"partition_data": {"partition_field": "data_date", "require_partition_filter": false, "time_partition": {"partition_by": "DAY","partition_expire": 0 } }` | Partition-related data for time and range partitioning |
| `properties.clustering_fields` | `['created_at', 'updated_at']` | List of fields on which the table is clustered |
| `properties.partition_field` | `created_at` | The field on which the table is time partitioned |

Partition Data

| Field | Sample Value | Description |
| :---- | :----------- | :---------- |
| `partition_field` | `created_at` | Field on which the table is partitioned, either by TimePartitioning or RangePartitioning. If the field is empty for TimePartitioning, `_PARTITIONTIME` is returned instead |
| `require_partition_filter` | `true` | Boolean denoting whether every query on the table must include at least one predicate that references the partitioning column |
| `time_partition.partition_by` | `HOUR` | Partition type: HOUR/DAY/MONTH/YEAR |
| `time_partition.partition_expire_seconds` | `0` | Time after which data expires from this partition; `0` means it does not expire |
| `range_partition.interval` | `10` | Width of an interval range |
| `range_partition.start` | `0` | Start value of the partition range, inclusive |
| `range_partition.end` | `100` | End value of the partition range, exclusive |

Column

| Field | Sample Value |
| :---- | :----------- |
| `name` | `total_price` |
| `description` | `item's total price` |
| `data_type` | `decimal` |
| `is_nullable` | `true` |
| `length` | `12,2` |
| `profile` | `{"min":...,"max": ...,"unique": ...}` |

Join

| Field | Sample Value |
| :---- | :----------- |
| `urn` | `project_id.dataset_name.table_name` |
| `count` | `3` |
| `conditions` | `["ON target.column_1 = source.column_1 and target.param_name = source.param_name","ON DATE(target.event_timestamp) = DATE(source.event_timestamp)"]` |

Contributing

Refer to the contribution guidelines for information on contributing to this module.


# Functions

CreateClient creates a bigquery client.
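
As a rough illustration of what creating such a client involves (a sketch using the public BigQuery Go client, not necessarily the exact signature this package exposes), a client can be built from explicit service account credentials or fall back to Application Default Credentials:

```go
package main

import (
	"context"
	"log"
	"os"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()

	// Hypothetical inputs: the project ID and an optional service account key file.
	projectID := "google-project-id"
	keyJSON, _ := os.ReadFile("service-account.json")

	var opts []option.ClientOption
	if len(keyJSON) > 0 {
		// Explicit credentials, analogous to service_account_json / service_account_base64.
		opts = append(opts, option.WithCredentialsJSON(keyJSON))
	}
	// With no options, the client falls back to Application Default Credentials.

	client, err := bigquery.NewClient(ctx, projectID, opts...)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	log.Printf("connected to BigQuery project %s", client.Project())
}
```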

# Structs

Config holds the set of configuration for the bigquery extractor.
Extractor manages the communication with the bigquery service.
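
The package does not document the fields of Config here; the following is a hypothetical sketch based solely on the keys in the Inputs table above. Field names, tags, and types are assumptions, not the package's real definition:

```go
// Config mirrors the recipe keys from the Inputs table.
// Hypothetical sketch; the real struct in this package may differ.
type Config struct {
	ProjectID            string   `mapstructure:"project_id"`
	ServiceAccountBase64 string   `mapstructure:"service_account_base64"`
	ServiceAccountJSON   string   `mapstructure:"service_account_json"`
	TablePattern         string   `mapstructure:"table_pattern"`
	MaxPageSize          int      `mapstructure:"max_page_size"`
	IncludeColumnProfile bool     `mapstructure:"include_column_profile"`
	MaxPreviewRows       int      `mapstructure:"max_preview_rows"`
	MixValues            bool     `mapstructure:"mix_values"`
	CollectTableUsage    bool     `mapstructure:"collect_table_usage"`
	UsagePeriodInDay     int      `mapstructure:"usage_period_in_day"`
	UsageProjectIDs      []string `mapstructure:"usage_project_ids"`
}
```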
