github.com/ONSdigital/dp-dimension-extractor
module, package
1.20.0
Repository: https://github.com/onsdigital/dp-dimension-extractor.git
Documentation: pkg.go.dev

# README

dp-dimension-extractor

Inserts dimensions into the database once an input file becomes available, then creates an event by sending a message to a dimension-extracted kafka topic so that further processing of the input file can take place.

  1. Consumes from the INPUT_FILE_AVAILABLE_TOPIC
  2. Retrieves the file (CSV) from an AWS S3 bucket
  3. Sends a put request for each unique dimension to the dataset API, which stores it in the database
  4. Produces a message to the DIMENSIONS_EXTRACTED_TOPIC
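
The "each unique dimension" part of step 3 can be sketched as a de-duplicating pass over the CSV. This is a minimal illustration using only the standard library, not the service's own code: the `Dimension` type, the function names, and the idea that the caller names the dimension columns by index are all assumptions made for the example, and the real input file layout may differ.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"os"
	"strings"
)

// Dimension is a hypothetical name/value pair; the service's own types may differ.
type Dimension struct {
	Name  string
	Value string
}

// uniqueDimensions streams CSV rows and returns each distinct (name, value)
// pair once, in first-seen order. dimCols lists the column indexes treated as
// dimensions; that layout is an assumption, not the service's input format.
func uniqueDimensions(r io.Reader, dimCols []int) ([]Dimension, error) {
	cr := csv.NewReader(r)
	header, err := cr.Read()
	if err != nil {
		return nil, fmt.Errorf("reading header: %w", err)
	}
	for _, c := range dimCols {
		if c < 0 || c >= len(header) {
			return nil, fmt.Errorf("dimension column %d out of range", c)
		}
	}

	seen := make(map[Dimension]struct{})
	var out []Dimension
	for {
		// csv.Reader rejects rows whose field count differs from the header,
		// so indexing row[c] below is safe.
		row, err := cr.Read()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, fmt.Errorf("reading row: %w", err)
		}
		for _, c := range dimCols {
			d := Dimension{Name: header[c], Value: row[c]}
			if _, dup := seen[d]; !dup {
				seen[d] = struct{}{}
				out = append(out, d)
			}
		}
	}
}

// extractFromString is a small convenience wrapper for demos and tests.
func extractFromString(data string, dimCols []int) ([]Dimension, error) {
	return uniqueDimensions(strings.NewReader(data), dimCols)
}

func main() {
	// Made-up sample data: columns 1 and 2 are treated as dimensions.
	const sample = "observation,time,geography\n10,2020,K02000001\n12,2021,K02000001\n"
	dims, err := extractFromString(sample, []int{1, 2})
	if err != nil {
		fmt.Fprintln(os.Stderr, "extract failed:", err)
		os.Exit(1)
	}
	for _, d := range dims {
		fmt.Printf("%s=%s\n", d.Name, d.Value)
	}
}
```

Streaming row by row keeps memory proportional to the number of distinct dimension values rather than to the file size, which matters when the input file is large. Each returned pair would then correspond to one request to the dataset API.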

Requirements

In order to run the service locally you will need the following:

To run vault:

  • Run brew install vault
  • Run vault server -dev

Getting started

  • Clone the repo: go get github.com/ONSdigital/dp-dimension-extractor
  • Run kafka and zookeeper
  • Run a local S3 store
  • Run the dataset API, see documentation here
  • Run api auth stub, see documentation here
  • Run the application with make debug

Kafka scripts

Scripts for updating and debugging Kafka can be found in dp-data-tools.

Configuration

| Environment variable | Default | Description |
| --- | --- | --- |
| AWS_REGION | eu-west-1 | The AWS region to use |
| BIND_ADDR | :21400 | The host and port to bind to |
| DATASET_API_URL | http://localhost:22000 | The dataset API url |
| DATASET_API_AUTH_TOKEN | FD0108EA-825D-411C-9B1D-41EF7727F465 | Authentication token for access to the dataset API |
| DIMENSIONS_EXTRACTED_TOPIC | dimensions-extracted | The kafka topic to write messages to |
| DIMENSION_EXTRACTOR_URL | http://localhost:21400 | The dimension extractor url |
| ENCRYPTION_DISABLED | true | A boolean flag indicating whether file encryption is disabled |
| EVENT_REPORTER_TOPIC | report-events | The kafka topic to send errors to |
| GRACEFUL_SHUTDOWN_TIMEOUT | 5s | The graceful shutdown timeout, as a duration |
| INPUT_FILE_AVAILABLE_GROUP | input-file-available | The kafka consumer group to consume messages from |
| INPUT_FILE_AVAILABLE_TOPIC | input-file-available | The kafka topic to consume messages from |
| KAFKA_ADDR | localhost:9092 | The kafka broker addresses (can be comma separated) |
| KAFKA_MAX_BYTES | 2000000 | The maximum permitted size of a message. Should be set equal to or smaller than the broker's message.max.bytes |
| KAFKA_VERSION | "1.0.2" | The kafka version that this service expects to connect to |
| KAFKA_SEC_PROTO | unset | If set to TLS, kafka connections will use TLS [1] |
| KAFKA_SEC_CLIENT_KEY | unset | PEM for the client key [1] |
| KAFKA_SEC_CLIENT_CERT | unset | PEM for the client certificate [1] |
| KAFKA_SEC_CA_CERTS | unset | CA cert chain for the server cert [1] |
| KAFKA_SEC_SKIP_VERIFY | false | Ignores server certificate issues if true [1] |
| LOCALSTACK_HOST | "" | Host for localstack for S3 usage (local use only) |
| REQUEST_MAX_RETRIES | 3 | The maximum number of attempts for a single http request due to external service failure |
| VAULT_ADDR | http://localhost:8200 | The vault address |
| VAULT_TOKEN | - | Vault token required for the client to talk to vault. (Use make debug to create a vault token) |
| VAULT_PATH | secret/shared/psk | The vault path where the PSKs are stored |
| SERVICE_AUTH_TOKEN | E45F9BFC-3854-46AE-8187-11326A4E00F4 | The service authorization token |
| ZEBEDEE_URL | http://localhost:8082 | The host name for Zebedee |
| AWS_ACCESS_KEY_ID | - | The AWS access key credential for the dimension extractor |
| AWS_SECRET_ACCESS_KEY | - | The AWS secret key credential for the dimension extractor |
| HEALTHCHECK_INTERVAL | 30s | The period of time between health checks |
| HEALTHCHECK_CRITICAL_TIMEOUT | 90s | The period of time after which failing checks will result in critical global check |

Notes:

  1. For more info, see the kafka TLS examples documentation
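
To illustrate how a few of the settings above behave (string defaults, a comma-separated broker list, and a duration value such as 5s), here is a minimal sketch that reads them with the Go standard library. It is not the service's own configuration code, which may use a different loader. The `Config` struct, its field names, and the helper functions are hypothetical; only the variable names and default values come from the table.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// Config holds an illustrative subset of the documented settings.
// The struct and its field names are hypothetical, not the service's own type.
type Config struct {
	BindAddr                string
	KafkaAddrs              []string
	KafkaMaxBytes           int
	EncryptionDisabled      bool
	GracefulShutdownTimeout time.Duration
}

// lookupFunc matches the signature of os.LookupEnv so callers can swap in a map.
type lookupFunc func(key string) (string, bool)

// getOr returns the value for key, or def when the variable is unset or empty.
func getOr(lookup lookupFunc, key, def string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return def
}

// splitAddrs turns a comma-separated broker list into trimmed, non-empty entries.
func splitAddrs(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// loadConfig applies the documented defaults and parses the typed values.
func loadConfig(lookup lookupFunc) (*Config, error) {
	maxBytes, err := strconv.Atoi(getOr(lookup, "KAFKA_MAX_BYTES", "2000000"))
	if err != nil {
		return nil, fmt.Errorf("KAFKA_MAX_BYTES: %w", err)
	}
	encOff, err := strconv.ParseBool(getOr(lookup, "ENCRYPTION_DISABLED", "true"))
	if err != nil {
		return nil, fmt.Errorf("ENCRYPTION_DISABLED: %w", err)
	}
	timeout, err := time.ParseDuration(getOr(lookup, "GRACEFUL_SHUTDOWN_TIMEOUT", "5s"))
	if err != nil {
		return nil, fmt.Errorf("GRACEFUL_SHUTDOWN_TIMEOUT: %w", err)
	}
	return &Config{
		BindAddr:                getOr(lookup, "BIND_ADDR", ":21400"),
		KafkaAddrs:              splitAddrs(getOr(lookup, "KAFKA_ADDR", "localhost:9092")),
		KafkaMaxBytes:           maxBytes,
		EncryptionDisabled:      encOff,
		GracefulShutdownTimeout: timeout,
	}, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", *cfg)
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, lets the parsing be exercised with a plain map and without touching the process environment.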

Contributing

See CONTRIBUTING for details.

License

Copyright © 2016-2021, Office for National Statistics (https://www.ons.gov.uk)

Released under MIT license, see LICENSE for details.

# Variables

BuildTime represents the time at which the service was built.
GitCommit represents the commit (SHA-1) hash of the service that is running.
Version represents the version of the service that is running.
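
Package-level string variables like these are commonly filled in at build time with the Go linker's `-X` flag. Whether this repository does so is an assumption, as this page does not show its build script. The sketch below declares three such variables in package `main` with placeholder defaults and formats them for a startup log line; `versionString` is a hypothetical helper.

```go
package main

import "fmt"

// These placeholders stay as "unknown" unless overridden at link time, e.g.
//   go build -ldflags "-X main.Version=1.20.0 -X main.GitCommit=abc123"
// The import path "main" and the values shown are illustrative only.
var (
	BuildTime = "unknown"
	GitCommit = "unknown"
	Version   = "unknown"
)

// versionString formats the build metadata for logging at startup.
func versionString() string {
	return fmt.Sprintf("version=%s commit=%s built=%s", Version, GitCommit, BuildTime)
}

func main() {
	fmt.Println(versionString())
}
```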