github.com/ONSdigital/dp-hierarchy-builder

# README

dp-hierarchy-builder

The hierarchy builder is a service that forms part of the dataset import process. It requires a 'full' hierarchy to be available for the dataset you are importing.

Getting started

Run the dp-hierarchy-builder service:

make debug

Hierarchy import scripts

The import scripts rely on the CLI tool for the database you're targeting:

  • cypher-shell for Neo4j / Cypher imports
  • gremlin.sh for Neptune / Gremlin imports

You may also need an SSH tunnel to the database if it's not running locally.

Example import for the CPIH full hierarchy (replace the filename for other hierarchy import files):

Cypher: cypher-shell < import-scripts/cypher/cpih1dim1aggid.cypher

Gremlin: ./gremlin-import.sh import-scripts/gremlin/cpih1dim1aggid.grm

For Cypher imports, you can use additional flags if running against an environment other than localhost: cypher-shell -u USER -p PASSWORD -a bolt://localhost:7687 < .....

Kafka scripts

Scripts for updating and debugging Kafka can be found in dp-data-tools.

Configuration

| Environment variable | Default | Description |
| --- | --- | --- |
| BIND_ADDR | :22700 | The host and port to bind to |
| KAFKA_ADDR | localhost:9092 | A list of Kafka host addresses |
| KAFKA_VERSION | 1.0.2 | The Kafka version that this service expects to connect to |
| KAFKA_OFFSET_OLDEST | true | Sets the Kafka offset to oldest if true |
| KAFKA_SEC_PROTO | unset | If set to TLS, Kafka connections will use TLS [1] |
| KAFKA_SEC_CLIENT_KEY | unset | PEM for the client key [1] |
| KAFKA_SEC_CLIENT_CERT | unset | PEM for the client certificate [1] |
| KAFKA_SEC_CA_CERTS | unset | CA cert chain for the server cert [1] |
| KAFKA_SEC_SKIP_VERIFY | false | Ignores server certificate issues if true [1] |
| CONSUMER_GROUP | dp-hierarchy-builder | The name of the Kafka consumer group |
| CONSUMER_TOPIC | observations-imported | The name of the topic to consume messages from |
| PRODUCER_TOPIC | hierarchy-built | The name of the topic to produce messages to |
| ERROR_PRODUCER_TOPIC | import-error | The name of the topic to send error messages to |
| GRACEFUL_SHUTDOWN_TIMEOUT | time.Second * 10 | The time to wait when gracefully shutting down before closing |
| HEALTHCHECK_INTERVAL | 30s | The time between health checks |
| HEALTHCHECK_CRITICAL_TIMEOUT | 90s | The time taken for the health to change from warning state to critical due to subsystem check failures |

Plus the graph database variables from dp-graph, namely GRAPH_DRIVER_TYPE and GRAPH_ADDR.

Notes:

  1. For more info, see the kafka TLS examples documentation
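As a rough illustration of how the defaults listed above behave, the following is a minimal sketch that reads a handful of the variables using only the Go standard library. The `Config` struct, `envOr`, and `loadConfig` are hypothetical names made up for this example; the service's own configuration loading is not shown in this README and may well use a dedicated library instead.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Config holds a small subset of the documented settings.
// The struct and its field names are illustrative, not the service's own type.
type Config struct {
	BindAddr            string
	KafkaAddr           string
	ConsumerGroup       string
	ConsumerTopic       string
	KafkaOffsetOldest   bool
	HealthCheckInterval time.Duration
}

// envOr returns the value of the environment variable key,
// or def when the variable is not set at all.
func envOr(key, def string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return def
}

// loadConfig reads the environment and falls back to the documented defaults.
func loadConfig() (Config, error) {
	oldest, err := strconv.ParseBool(envOr("KAFKA_OFFSET_OLDEST", "true"))
	if err != nil {
		return Config{}, fmt.Errorf("KAFKA_OFFSET_OLDEST: %w", err)
	}
	interval, err := time.ParseDuration(envOr("HEALTHCHECK_INTERVAL", "30s"))
	if err != nil {
		return Config{}, fmt.Errorf("HEALTHCHECK_INTERVAL: %w", err)
	}
	return Config{
		BindAddr:            envOr("BIND_ADDR", ":22700"),
		KafkaAddr:           envOr("KAFKA_ADDR", "localhost:9092"),
		ConsumerGroup:       envOr("CONSUMER_GROUP", "dp-hierarchy-builder"),
		ConsumerTopic:       envOr("CONSUMER_TOPIC", "observations-imported"),
		KafkaOffsetOldest:   oldest,
		HealthCheckInterval: interval,
	}, nil
}

func main() {
	cfg, err := loadConfig()
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Running it with, say, `BIND_ADDR=:8080` set in the environment would show the override taking precedence over the default.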

Graph / Neptune Configuration

| Environment variable | Default | Description |
| --- | --- | --- |
| GRAPH_DRIVER_TYPE | "" | String identifier for the implementation to be used (e.g. 'neptune' or 'mock') |
| GRAPH_ADDR | "" | Address of the database matching the chosen driver type (web socket) |
| NEPTUNE_TLS_SKIP_VERIFY | false | Flag to skip TLS certificate verification, should only be true when run locally |

Warning: to connect to a remote Neptune environment on macOS using Go 1.18 or higher, you must set NEPTUNE_TLS_SKIP_VERIFY to true. See our Neptune guide for more details.
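To make the effect of NEPTUNE_TLS_SKIP_VERIFY more concrete, here is a hedged sketch of how a boolean flag like this is commonly mapped onto Go's `crypto/tls` configuration. How dp-graph actually consumes the variable is not shown in this README, so `tlsConfigFromFlag` is a hypothetical helper written purely for illustration.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"os"
	"strconv"
)

// tlsConfigFromFlag turns the raw value of a skip-verify flag into a TLS config.
// An empty value is treated as the documented default (false).
// This helper is an assumption for illustration, not part of dp-graph.
func tlsConfigFromFlag(raw string) (*tls.Config, error) {
	skip := false
	if raw != "" {
		parsed, err := strconv.ParseBool(raw)
		if err != nil {
			return nil, fmt.Errorf("NEPTUNE_TLS_SKIP_VERIFY: %w", err)
		}
		skip = parsed
	}
	// InsecureSkipVerify disables server certificate checks, which is why
	// the flag should only be true when running locally.
	return &tls.Config{InsecureSkipVerify: skip}, nil
}

func main() {
	cfg, err := tlsConfigFromFlag(os.Getenv("NEPTUNE_TLS_SKIP_VERIFY"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("certificate verification skipped:", cfg.InsecureSkipVerify)
}
```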

Healthcheck

The /healthcheck endpoint returns the current status of the service. Dependent services are health checked on an interval defined by the HEALTHCHECK_INTERVAL environment variable.

On a development machine a request to the health check endpoint can be made by:

curl localhost:22700/healthcheck
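The same request can be made from Go, which may be handy in a smoke test. The sketch below assumes the service is listening on the default BIND_ADDR; `fetchHealth` is a hypothetical helper, and because the response body format is not described in this README, the example only prints the status code and raw body.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// fetchHealth performs a GET request against url and returns the HTTP
// status code together with the raw response body.
func fetchHealth(url string, timeout time.Duration) (int, string, error) {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(body), nil
}

func main() {
	// Assumes the service is running locally on the default port.
	status, body, err := fetchHealth("http://localhost:22700/healthcheck", 5*time.Second)
	if err != nil {
		fmt.Fprintln(os.Stderr, "healthcheck request failed:", err)
		return
	}
	fmt.Println("HTTP status:", status)
	fmt.Println(body)
}
```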

Command line tools

There are a number of utility applications for manual tasks (found under the cmd directory):

  • v4-transformer - take a V4 file and create a full hierarchy input file / cypher script
  • geography-transformer - take a geography input CSV file and output a full hierarchy input file / cypher script
  • hierarchy-transformer - take a hierarchy input CSV file and generate a Cypher script
  • builder - builds an instance hierarchy from a full hierarchy
  • producer - produces a Kafka message for the dp-hierarchy-builder process to consume

Manually building instance hierarchies

  • Manually create instance hierarchy for CPIH - note you will have to replace the value for 'instance-id'

go run cmd/builder/main.go --instance-id 27a4019f-6491-4876-bbdd-1439a40e5bb9 --dimension-name aggregate --code-list-id e44de4c4-d39e-4e2f-942b-3ca10584d078

  • Manually create instance hierarchy for mid-year-pop-est - note you will have to replace the value for 'instance-id'

go run cmd/builder/main.go --instance-id 34b8c139-a1fe-45b1-95e2-e77df3682256 --dimension-name geography --code-list-id mid-year-pop-geography

If running one of the above commands against an environment, you can specify the Neo4j URL with the following flag (replacing USER, PASSWORD, host, and port as required):

--neo-url="bolt://USER:PASSWORD@localhost:7687"
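If the credentials contain characters such as `@` or `/`, assembling the URL by hand is error-prone. The following sketch builds a URL of the same shape with Go's `net/url` package, which percent-encodes the user info. `boltURL` is a hypothetical helper, and whether the builder decodes percent-encoded credentials is an assumption that should be checked before relying on it.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// boltURL assembles a bolt:// URL from its parts. Special characters in
// the user name or password are percent-encoded by net/url.
func boltURL(user, password, host string, port int) string {
	u := url.URL{
		Scheme: "bolt",
		User:   url.UserPassword(user, password),
		Host:   net.JoinHostPort(host, strconv.Itoa(port)),
	}
	return u.String()
}

func main() {
	// Placeholder values matching the flag example above.
	fmt.Println("--neo-url=" + boltURL("USER", "PASSWORD", "localhost", 7687))
}
```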

Transform a hierarchy input file to a Cypher script (set FILE to the required input file):

make FILE=./cmd/hierarchy-transformer/hierarchy.csv generate-full

Output is written to ./cmd/hierarchy-transformer/output

Transform a geography input file to a hierarchy input file / Cypher script:

make FILE=./cmd/geography-transformer/WD16_LAD16_CTY16_OTH_UK_LU.csv generate-full-from-geography

Output is written to ./cmd/geography-transformer/output

Transform a code,label,parent format CSV to a hierarchy input file:

codelistid=cpih1dim1aggid
go run cmd/code-label-parent-transformer/main.go --file import-scripts/code-label-parent-csv/$codelistid.csv --code-list-id $codelistid --output import-scripts/$codelistid.csv
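To illustrate what a code,label,parent file encodes, the sketch below groups child codes under their parent code. It is not the transformer's implementation: it assumes three columns per row, no header row, and an empty parent column for root entries, none of which is confirmed by this README. The sample codes and labels are made up.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"os"
	"strings"
)

// childrenByParent parses code,label,parent rows and returns a map from
// parent code to the codes of its direct children, in input order.
// Rows with an empty parent are collected under the "" key.
func childrenByParent(data string) (map[string][]string, error) {
	reader := csv.NewReader(strings.NewReader(data))
	reader.FieldsPerRecord = 3 // assumed layout: code, label, parent

	children := make(map[string][]string)
	for {
		rec, err := reader.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		code := strings.TrimSpace(rec[0])
		parent := strings.TrimSpace(rec[2])
		children[parent] = append(children[parent], code)
	}
	return children, nil
}

func main() {
	// Made-up sample data, not taken from a real import file.
	const sample = "A,All items,\nA1,First group,A\nA2,Second group,A\nA1a,Leaf item,A1\n"

	children, err := childrenByParent(sample)
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse error:", err)
		os.Exit(1)
	}
	fmt.Println("roots:", children[""])
	fmt.Println("children of A:", children["A"])
	fmt.Println("children of A1:", children["A1"])
}
```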

Produce a Kafka message for the dp-hierarchy-builder process to consume:

go run cmd/producer/main.go --instance-id '58004716-a2d4-4dd1-a6c3-6accab30ad2a' --code-list-id 'cpih1dim1aggid' --dimension-name 'aggregate'

Contributing

See CONTRIBUTING for details.

License

Copyright © 2016-2021, Office for National Statistics (https://www.ons.gov.uk)

Released under MIT license, see LICENSE for details.
