Module: github.com/ONSdigital/dp-dataset-api
Version: 1.70.0
Repository: https://github.com/onsdigital/dp-dataset-api.git
Documentation: pkg.go.dev

# README

dp-dataset-api

An ONS API used to navigate published datasets, editions and versions.

Installation

Database

  • Run brew install mongo

  • Run brew services start mongodb

  • Run brew install neo4j

  • Configure neo4j: edit /usr/local/Cellar/neo4j/3.2.0/libexec/conf/neo4j.conf and set dbms.security.auth_enabled=false

  • Run brew services restart neo4j

Getting started

To run make lint-api-spec you need Node v20.x and redocly/cli, which can be installed with:

   npm install -g @redocly/cli

State changes

Normal sequential order of states:

  1. created (only on instance)
  2. submitted (only on instance)
  3. completed (only on instance)
  4. edition-confirmed (only on instance). This creates an edition and a version, so the instance becomes accessible through the version endpoints. The dataset next sub-document and the edition are also updated at this point (authorised users will see a different latest version link from unauthorised users)
  5. associated (only on version). The dataset next sub-document and the edition are updated again
  6. published (only on version). Both the edition and the dataset are updated; once published, it must not be changed

It is possible to roll back from associated to edition-confirmed. This covers the case where a PST user has attached the version to the wrong collection: the collection_id needs to be updated with the new one (or removed altogether), and the state needs to revert to edition-confirmed.

Lastly, states can be skipped: it is possible to jump from edition-confirmed to published as long as all the mandatory fields are present. The state might also change from created straight to completed, skipping submitted, because of a race condition. This is not expected to happen, because the path to completed is longer than the path to submitted.

A state machine has now been implemented on the PUT /versions endpoint. This has been implemented as part of the refactoring to bring the application in line with some of the clean architecture principles. The state machine is instantiated on start up of the application as a singleton object which ensures there is only one instance used in the application for consistency. The state machine code is held in the application package, where all validation on the data sent in the request is performed.

The state machine requires a list of allowed transitions for each state and the dataset type to run. The states of created, submitted, and completed have not been included in the state machine as these are not relevant to the PUT /versions endpoint. If the state sent as part of the PUT request is not in the allowed transitions for the current state for the dataset type then a 400 status code response is returned.
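The allowed-transitions check described above can be sketched roughly as follows. This is a minimal illustration, not the code held in the application package: the names StateMachine, NewStateMachine and Transition, the dataset type key "example-type", and the exact transition table are all assumptions made for this example, derived only from the state order described in this README.

```go
package main

import (
	"fmt"
	"net/http"
)

// State names as listed in the "State changes" section.
const (
	EditionConfirmed = "edition-confirmed"
	Associated       = "associated"
	Published        = "published"
)

// StateMachine is a hypothetical type. It maps a dataset type to the
// states that may follow each current state.
type StateMachine struct {
	allowed map[string]map[string][]string
}

// NewStateMachine builds an illustrative table for one assumed dataset
// type. The real table may differ per dataset type.
func NewStateMachine() *StateMachine {
	return &StateMachine{allowed: map[string]map[string][]string{
		"example-type": {
			EditionConfirmed: {Associated, Published}, // normal step, or skip to published
			Associated:       {EditionConfirmed, Published}, // rollback, or publish
			Published:        {}, // the README says published must not be changed
		},
	}}
}

// Transition returns the HTTP status the endpoint would answer with in
// this sketch: 200 when the requested state is allowed, 400 otherwise.
func (sm *StateMachine) Transition(datasetType, current, next string) int {
	for _, s := range sm.allowed[datasetType][current] {
		if s == next {
			return http.StatusOK
		}
	}
	return http.StatusBadRequest
}

func main() {
	sm := NewStateMachine()
	fmt.Println(sm.Transition("example-type", Associated, Published)) // prints 200
	fmt.Println(sm.Transition("example-type", Published, Associated)) // prints 400
}
```

Keeping the table as plain data keyed by dataset type means a new dataset type only needs a new map entry, and an unknown type or state falls through to the 400 response.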

Healthcheck

The endpoint /health checks the connection to the database and returns one of:

  • success (200, JSON "status": "OK")
  • failure (500, JSON "status": "error").

The /health endpoint replaces /healthcheck, which now returns a 404 Not Found response.
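A caller could interpret the two documented responses along the lines of the sketch below. The helper names checkHealth and stubServer are assumptions for illustration, and the stub server only mimics the status codes and the "status" field described above; it stands in for a running instance of the API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthResponse models only the "status" field mentioned in this README;
// the real response body may carry more fields.
type healthResponse struct {
	Status string `json:"status"`
}

// checkHealth is a hypothetical helper. It calls /health on baseURL and
// reports whether the service answered 200 with "status": "OK".
func checkHealth(baseURL string) (bool, error) {
	resp, err := http.Get(baseURL + "/health")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	var body healthResponse
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return false, err
	}
	return resp.StatusCode == http.StatusOK && body.Status == "OK", nil
}

// stubServer imitates the documented success and failure responses.
func stubServer(healthy bool) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		if healthy {
			w.WriteHeader(http.StatusOK)
			fmt.Fprint(w, `{"status": "OK"}`)
			return
		}
		w.WriteHeader(http.StatusInternalServerError)
		fmt.Fprint(w, `{"status": "error"}`)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := stubServer(true)
	defer srv.Close()

	ok, err := checkHealth(srv.URL)
	fmt.Println(ok, err)
}
```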

Kafka scripts

Scripts for updating and debugging Kafka can be found in dp-data-tools.

Configuration

| Environment variable | Default | Description |
| --- | --- | --- |
| BIND_ADDR | :22000 | The host and port to bind to |
| MONGODB_BIND_ADDR | localhost:27017 | The MongoDB bind address |
| MONGODB_USERNAME | | The MongoDB Username |
| MONGODB_PASSWORD | | The MongoDB Password |
| MONGODB_DATABASE | datasets | The MongoDB database |
| MONGODB_COLLECTIONS | DatasetsCollection:datasets, ContactsCollection:contacts, EditionsCollection:editions, InstanceCollection:instances, DimensionOptionsCollection:dimension.options, InstanceLockCollection:instances_locks, VersionsCollection:versions | The MongoDB collections |
| MONGODB_REPLICA_SET | | The name of the MongoDB replica set |
| MONGODB_ENABLE_READ_CONCERN | false | Switch to use (or not) majority read concern |
| MONGODB_ENABLE_WRITE_CONCERN | true | Switch to use (or not) majority write concern |
| MONGODB_CONNECT_TIMEOUT | 5s | The timeout when connecting to MongoDB (time.Duration format) |
| MONGODB_QUERY_TIMEOUT | 15s | The timeout for querying MongoDB (time.Duration format) |
| MONGODB_IS_SSL | false | Switch to use (or not) TLS when connecting to MongoDB |
| SECRET_KEY | FD0108EA-825D-411C-9B1D-41EF7727F465 | A secret key used for authentication |
| CODE_LIST_API_URL | http://localhost:22400 | The host name for the CodeList API |
| DATASET_API_URL | http://localhost:22000 | The host name for the Dataset API |
| DOWNLOAD_SERVICE_URL | http://localhost:23600 | The host name for the Download Service |
| IMPORT_API_URL | http://localhost:21800 | The host name for the Import API |
| GRACEFUL_SHUTDOWN_TIMEOUT | 5s | The graceful shutdown timeout in seconds |
| WEBSITE_URL | http://localhost:20000 | The host name for the website |
| KAFKA_ADDR | localhost:9092 | The address of (TLS-ready) Kafka brokers (comma-separated values) |
| KAFKA_CONSUMER_MIN_BROKERS_HEALTHY | 2 | The minimum number of consumer brokers needed |
| KAFKA_PRODUCER_MIN_BROKERS_HEALTHY | 2 | The minimum number of producer brokers needed |
| KAFKA_VERSION | 1.0.2 | The version of (TLS-ready) Kafka |
| KAFKA_SEC_PROTO | unset (only TLS) | If set to TLS, Kafka connections will use TLS |
| KAFKA_SEC_CLIENT_KEY | unset | PEM [2] for the client key (optional, used for client auth) [1] |
| KAFKA_SEC_CLIENT_CERT | unset | PEM [2] for the client certificate (optional, used for client auth) [1] |
| KAFKA_SEC_CA_CERTS | unset | PEM [2] of CA cert chain if using private CA for the server cert [1] |
| KAFKA_SEC_SKIP_VERIFY | false | Ignore server certificate issues if set to true [1] |
| GENERATE_DOWNLOADS_TOPIC | filter-job-submitted | The topic to send generate full dataset version downloads to |
| HEALTHCHECK_INTERVAL | 30s | The time between calling healthcheck endpoints for check subsystems |
| HEALTHCHECK_CRITICAL_TIMEOUT | 90s | The time taken for the health changes from warning state to critical due to subsystem check failures |
| ENABLE_PRIVATE_ENDPOINTS | false | Enable private endpoints for the API |
| DISABLE_GRAPH_DB_DEPENDENCY | false | Disables connection and health check for graph db |
| DOWNLOAD_SERVICE_SECRET_KEY | QB0108EZ-825D-412C-9B1D-41EF7747F462 | A key specific for the download service to access public/private links |
| ZEBEDEE_URL | http://localhost:8082 | The host name for Zebedee |
| ENABLE_PERMISSIONS_AUTH | false | Enable/disable user/service permissions checking for private endpoints |
| ENABLE_URL_REWRITING | false | Enable/disable URL rewriting |
| DEFAULT_MAXIMUM_LIMIT | 1000 | Default maximum limit for pagination |
| DEFAULT_LIMIT | 20 | Default limit for pagination |
| DEFAULT_OFFSET | 0 | Default offset for pagination |
| OTEL_BATCH_TIMEOUT | 5s | Interval between pushes to OT Collector |
| OTEL_EXPORTER_OTLP_ENDPOINT | http://localhost:4317 | URL for OpenTelemetry endpoint |
| OTEL_SERVICE_NAME | dp-dataset-api | Service name to report to telemetry tools |
| OTEL_ENABLED | false | Feature flag to enable OpenTelemetry |

Notes:

  1. Ignored unless using TLS (i.e. KAFKA_SEC_PROTO has a value enabling TLS)

  2. PEM values are identified as those starting with -----BEGIN and can use \n (sic) instead of newlines (they will be converted to newlines before use). Any other value will be treated as a path to the given PEM file.
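The rule in note 2 can be sketched as below. This is only an illustration of the described behaviour under stated assumptions: resolvePEM is a hypothetical helper name, not a function from this repository, and the certificate body in main is a truncated placeholder.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolvePEM is a hypothetical helper following note 2: a value starting
// with "-----BEGIN" is treated as inline PEM, with literal "\n" sequences
// converted to real newlines; any other value is treated as a file path.
func resolvePEM(value string) (string, error) {
	if strings.HasPrefix(value, "-----BEGIN") {
		return strings.ReplaceAll(value, `\n`, "\n"), nil
	}
	b, err := os.ReadFile(value)
	if err != nil {
		return "", fmt.Errorf("reading PEM file %q: %w", value, err)
	}
	return string(b), nil
}

func main() {
	// Placeholder content, as it might appear in KAFKA_SEC_CA_CERTS.
	inline := `-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----`

	pem, err := resolvePEM(inline)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(pem)
}
```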

Graph / Neptune Configuration

| Environment variable | Default | Description |
| --- | --- | --- |
| GRAPH_DRIVER_TYPE | "" | string identifier for the implementation to be used (e.g. 'neptune' or 'mock') |
| GRAPH_ADDR | "" | address of the database matching the chosen driver type (web socket) |
| NEPTUNE_TLS_SKIP_VERIFY | false | flag to skip TLS certificate verification, should only be true when run locally |

Warning: to connect to a remote Neptune environment on macOS using Go 1.18 or higher, you must set NEPTUNE_TLS_SKIP_VERIFY to true. See our Neptune guide for more details.
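As an illustration of how the environment variables and defaults listed above fit together, the sketch below loads a small subset using only the standard library. It is not how this service reads its configuration: the Config struct, the lookupFunc type, and the helpers valueOr and Load are assumptions for the example, as is the choice to treat an empty value the same as an unset one.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Config holds an illustrative subset of the documented settings.
type Config struct {
	BindAddr                string
	GracefulShutdownTimeout time.Duration
	DefaultLimit            int
	EnablePrivateEndpoints  bool
}

// lookupFunc matches the signature of os.LookupEnv so another source of
// values can be supplied, for example in tests.
type lookupFunc func(key string) (string, bool)

// valueOr returns the variable's value, or fallback when unset or empty.
func valueOr(lookup lookupFunc, key, fallback string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return fallback
}

// Load parses the subset, falling back to the defaults from the table.
func Load(lookup lookupFunc) (*Config, error) {
	timeout, err := time.ParseDuration(valueOr(lookup, "GRACEFUL_SHUTDOWN_TIMEOUT", "5s"))
	if err != nil {
		return nil, fmt.Errorf("GRACEFUL_SHUTDOWN_TIMEOUT: %w", err)
	}
	limit, err := strconv.Atoi(valueOr(lookup, "DEFAULT_LIMIT", "20"))
	if err != nil {
		return nil, fmt.Errorf("DEFAULT_LIMIT: %w", err)
	}
	private, err := strconv.ParseBool(valueOr(lookup, "ENABLE_PRIVATE_ENDPOINTS", "false"))
	if err != nil {
		return nil, fmt.Errorf("ENABLE_PRIVATE_ENDPOINTS: %w", err)
	}
	return &Config{
		BindAddr:                valueOr(lookup, "BIND_ADDR", ":22000"),
		GracefulShutdownTimeout: timeout,
		DefaultLimit:            limit,
		EnablePrivateEndpoints:  private,
	}, nil
}

func main() {
	cfg, err := Load(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", *cfg)
}
```

Running it with, say, DEFAULT_LIMIT=50 set in the environment overrides that one value and leaves the others at their documented defaults.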

Contributing

See CONTRIBUTING for details.

License

Copyright © 2016-2022, Office for National Statistics

Released under MIT license, see LICENSE for details

# Variables

BuildTime represents the time at which the service was built.
GitCommit represents the commit (SHA-1) hash of the service that is running.
Version represents the version of the service that is running.
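Variables like these are commonly populated at build time with Go linker flags. Whether this repository does so is not stated here, so the sketch below is an assumption: the "unknown" defaults, the buildInfo helper, and the example build command are illustrative only.

```go
package main

import "fmt"

// Assumed to be overridden at build time, for example:
//
//	go build -ldflags "-X main.Version=1.70.0 -X main.GitCommit=<sha> -X main.BuildTime=<timestamp>"
//
// The -X linker flag only sets package-level string variables.
var (
	BuildTime = "unknown"
	GitCommit = "unknown"
	Version   = "unknown"
)

// buildInfo is a hypothetical helper that formats the three values.
func buildInfo() string {
	return fmt.Sprintf("version=%s commit=%s built=%s", Version, GitCommit, BuildTime)
}

func main() {
	fmt.Println(buildInfo())
}
```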