package
2.10.0+incompatible
Repository: https://github.com/m-lab/etl.git
Documentation: pkg.go.dev

# README

The schema directory contains BigQuery schema (JSON) files and the code associated with populating BigQuery entities.

Parsers in mlab-sandbox and mlab-staging target datasets in the same project that they run in. Parsers in mlab-oti target datasets in the measurement-lab project.

Today, all parsers write to the base_tables dataset. Soon, they will be changed to write to the incoming dataset, and deduplicated data will be written to base_tables. See https://github.com/m-lab/etl/issues/387 for updates.
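The targeting rules above can be summarized in a few lines of Go. This is only a sketch: the `targetFor` function and its signature are hypothetical and not part of this package. It assumes that any project other than mlab-oti writes to its own project, which matches the two cases named above.

```go
package main

import "fmt"

// targetFor returns the project and dataset that a parser running in
// parserProject writes to, following the rules described in this README.
// The function is hypothetical and exists only for illustration.
func targetFor(parserProject string) (project, dataset string) {
	// Today, all parsers write to the base_tables dataset.
	dataset = "base_tables"
	switch parserProject {
	case "mlab-oti":
		// Parsers in mlab-oti write to the measurement-lab project.
		return "measurement-lab", dataset
	default:
		// mlab-sandbox and mlab-staging write to their own project.
		return parserProject, dataset
	}
}

func main() {
	for _, p := range []string{"mlab-sandbox", "mlab-staging", "mlab-oti"} {
		project, dataset := targetFor(p)
		fmt.Printf("%s -> %s.%s\n", p, project, dataset)
	}
}
```

If parsers move to the incoming dataset as planned, only the dataset assignment in this sketch would need to change.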

## NDT

legacy.json contains the schema downloaded from the existing ndt.all tables. It has been ordered for easy comparison against the new schema.

ndt.json contains the initial schema for the NDT tables. It can be used to create a new table in the mlab-sandbox project by invoking:

bq --project_id mlab-sandbox mk --time_partitioning_type=DAY \
    --schema schema/ndt.json -t base_tables.ndt

ndt_delta.json contains another NDT schema, including a repeated "delta" field, intended to contain snapshot deltas. To create a new table:

bq --project_id mlab-sandbox mk --time_partitioning_type=DAY \
    --schema schema/ndt_delta.json -t base_tables.ndt_delta
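For readers unfamiliar with repeated fields, the sketch below parses a BigQuery JSON schema and lists the top-level fields whose mode is REPEATED. The schema fragment is made up for illustration and is not the content of ndt_delta.json; the sub-field name `snapshot_num` and the helper `repeatedFields` are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// field mirrors one entry of a BigQuery JSON schema file.
type field struct {
	Name   string  `json:"name"`
	Type   string  `json:"type"`
	Mode   string  `json:"mode,omitempty"`
	Fields []field `json:"fields,omitempty"`
}

// exampleSchema is an invented fragment, NOT the real ndt_delta.json.
const exampleSchema = `[
  {"name": "test_id", "type": "STRING", "mode": "REQUIRED"},
  {"name": "delta", "type": "RECORD", "mode": "REPEATED", "fields": [
    {"name": "snapshot_num", "type": "INTEGER", "mode": "NULLABLE"}
  ]}
]`

// repeatedFields returns the names of the top-level fields whose mode
// is REPEATED.
func repeatedFields(schemaJSON []byte) ([]string, error) {
	var fields []field
	if err := json.Unmarshal(schemaJSON, &fields); err != nil {
		return nil, err
	}
	var names []string
	for _, f := range fields {
		if f.Mode == "REPEATED" {
			names = append(names, f.Name)
		}
	}
	return names, nil
}

func main() {
	names, err := repeatedFields([]byte(exampleSchema))
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // prints [delta]
}
```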

As of May 2017, there are still differences between the legacy and NDT schemas that may need to be addressed.

## Paris-traceroute

pt.json contains the schema for Paris traceroute tables. To create a new table:

bq --project_id mlab-sandbox mk --time_partitioning_type=DAY \
    --schema schema/pt.json -t base_tables.traceroute

## Sidestream

ss.json contains the schema for sidestream tables. To create a new table:

bq --project_id mlab-sandbox mk --time_partitioning_type=DAY \
    --schema schema/ss.json -t base_tables.sidestream

## Switch - DISCO

switch.json contains the schema for DISCO tables. To create a new table:

bq --project_id mlab-sandbox mk --time_partitioning_type=DAY \
    --schema schema/switch.json -t base_tables.switch

# Functions

FindSchemaDocsFor should be used by parser row types to associate bigquery field descriptions with a schema generated from a row type.
NewWeb100FullRecord creates a web100 value map with all supported fields.
NewWeb100MinimalRecord creates a web100 value map with only the given fields.
NewWeb100Skeleton creates the tree structure, with no leaf fields.
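The idea behind associating field descriptions with a generated schema can be sketched as follows. This is not the implementation of FindSchemaDocsFor, whose signature is not shown here: the `field` type is a simplified stand-in for a BigQuery schema field, and `applyDocs` and the dotted-path keys are assumptions made for illustration.

```go
package main

import "fmt"

// field is a simplified, hypothetical stand-in for a BigQuery schema field.
type field struct {
	Name        string
	Description string
	Fields      []*field // nested fields of a record
}

// applyDocs walks the schema tree and sets the description of every field
// whose dotted path (for example "parser.Version") appears in docs.
func applyDocs(fields []*field, prefix string, docs map[string]string) {
	for _, f := range fields {
		path := f.Name
		if prefix != "" {
			path = prefix + "." + f.Name
		}
		if d, ok := docs[path]; ok {
			f.Description = d
		}
		applyDocs(f.Fields, path, docs)
	}
}

func main() {
	// Invented schema and descriptions, for illustration only.
	schema := []*field{
		{Name: "id"},
		{Name: "parser", Fields: []*field{{Name: "Version"}}},
	}
	docs := map[string]string{
		"id":             "Unique identifier for the row.",
		"parser.Version": "Version of the parser that produced the row.",
	}
	applyDocs(schema, "", docs)
	fmt.Println(schema[1].Fields[0].Description)
}
```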

# Structs

AnnotationRow defines the BQ schema using 'Standard Columns' conventions for the annotation datatype produced by the uuid-annotator.
BQScamperLinkArray defines an array of ScamperLinks.
BQScamperNode describes a layer of links.
BQScamperOutput encapsulates the four lines of a traceroute: {"UUID":...} {"type":"cycle-start"...} {"type":"tracelb"...} {"type":"cycle-stop"...}.
BQTracelbLine contains the actual scamper trace details.
ClientInfo details various information about the client.
HopAnnotation1Row describes a single BQ row of HopAnnotation1 data.
NDT5ResultRow defines the BQ schema for the data.NDT5Result produced by the ndt-server for NDT client measurements.
NDT5ResultRowStandardColumns defines the BQ schema for the data.NDT5Result produced by the ndt-server for NDT client measurements.
NDT5Summary contains fields summarizing or derived from the raw data.
NDT7ResultRow defines the BQ schema using 'Standard Columns' conventions for the data.NDT7Result produced by the ndt-server for NDT7 client measurements.
NDT7Summary contains fields summarizing or derived from the raw data.
NDTWeb100 is a mirror struct of the BQ schema.
ParseInfo provides details about the parsed row.
ParseInfoV0 provides details about the parsing of this row.
PCAPRow describes a single BQ row of pcap (packet capture) data.
Sample is an individual measurement taken by DISCO.
Scamper1Row defines the BQ schema using 'Standard Columns' conventions for the scamper1 datatype produced by traceroute-caller.
ServerInfo details various information about the server.
SwitchStats represents a row of data taken from the raw DISCO export file.
TCPRow describes a single BQ row of TCPInfo data.

# Type aliases

Web100ValueMap implements the web100.Saver interface for recording web100 values.
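A map type can satisfy a saver-style interface by storing each value under its field name. The sketch below illustrates that pattern only: the `saver` interface and `valueMap` type are stand-ins written for this example, and the real web100.Saver and Web100ValueMap definitions may differ. The field names used in `main` are illustrative.

```go
package main

import "fmt"

// saver is a hypothetical stand-in for an interface that records
// named values of a few basic types.
type saver interface {
	SetInt64(name string, value int64)
	SetString(name string, value string)
	SetBool(name string, value bool)
}

// valueMap stores recorded values keyed by field name.
type valueMap map[string]interface{}

func (m valueMap) SetInt64(name string, value int64)   { m[name] = value }
func (m valueMap) SetString(name string, value string) { m[name] = value }
func (m valueMap) SetBool(name string, value bool)     { m[name] = value }

// Compile-time check that valueMap satisfies saver.
var _ saver = valueMap{}

func main() {
	m := valueMap{}
	var s saver = m
	s.SetInt64("MinRTT", 12)
	s.SetString("LocalAddress", "192.0.2.1")
	s.SetBool("ExampleFlag", true)
	fmt.Println(len(m)) // prints 3
}
```

Because maps are reference types in Go, the methods can use a value receiver and still modify the underlying map.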