github.com/ONSdigital/dp-search-data-extractor

# README

dp-search-data-extractor

Service to retrieve published data to be used to update a search index. This service calls the /publisheddata endpoint on zebedee and the metadata endpoint on the dataset API.

This service listens to the content-updated kafka topic for events of type contentUpdatedEvent (see the schemas package).

This service takes the uri, from the consumed event, and either calls ...

  1. ... /publisheddata endpoint on zebedee. It passes in the URI as a query parameter, e.g. http://localhost:8082/publisheddata?uri=businessindustryandtrade
  2. ... /datasets/{id}/editions/{edition}/versions/{version}/metadata endpoint on dataset API, e.g. http://localhost:22000/datasets/CPIH01/editions/timeseries/versions/1/metadata
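
As a minimal sketch of the two requests described above, the following builds each URL from its parts. The function names publishedDataURL and datasetMetadataURL are hypothetical and are not taken from the service's code; how the service chooses between the two calls is not covered here.

```go
package main

import (
	"fmt"
	"net/url"
)

// publishedDataURL (hypothetical helper) builds the zebedee request,
// passing the content URI as the "uri" query parameter.
func publishedDataURL(zebedeeURL, uri string) string {
	q := url.Values{}
	q.Set("uri", uri)
	return zebedeeURL + "/publisheddata?" + q.Encode()
}

// datasetMetadataURL (hypothetical helper) builds the dataset API metadata
// request from a dataset ID, edition and version.
func datasetMetadataURL(datasetAPIURL, datasetID, edition, version string) string {
	return fmt.Sprintf("%s/datasets/%s/editions/%s/versions/%s/metadata",
		datasetAPIURL,
		url.PathEscape(datasetID),
		url.PathEscape(edition),
		url.PathEscape(version))
}

func main() {
	fmt.Println(publishedDataURL("http://localhost:8082", "businessindustryandtrade"))
	fmt.Println(datasetMetadataURL("http://localhost:22000", "CPIH01", "timeseries", "1"))
}
```

Run against the default local addresses, this prints the two example URLs given in the list above.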

See the search service architecture docs.

Getting started

  • Run make debug
  • Run make help to see the full list of make targets

The service runs in the background consuming messages from Kafka. An example event can be created using the helper script, make produce.

Dependencies

  • golang 1.20.x
  • Running instance of zebedee
  • Requires running…
  • No further dependencies other than those defined in go.mod

Configuration

| Environment variable | Default | Description |
| --- | --- | --- |
| BIND_ADDR | localhost:25800 | The host and port to bind to |
| DATASET_API_URL | http://localhost:22000 | The URL for the Dataset API |
| GRACEFUL_SHUTDOWN_TIMEOUT | 5s | The graceful shutdown timeout in seconds (time.Duration format) |
| HEALTHCHECK_INTERVAL | 30s | Time between self-healthchecks (time.Duration format) |
| HEALTHCHECK_CRITICAL_TIMEOUT | 90s | Time to wait until an unhealthy dependent propagates its state to make this app unhealthy (time.Duration format) |
| KAFKA_ADDR | "localhost:9092" | The address of Kafka (accepts list) |
| KAFKA_OFFSET_OLDEST | true | Start processing Kafka messages in order from the oldest in the queue |
| KAFKA_VERSION | 1.0.2 | The version of Kafka |
| KAFKA_NUM_WORKERS | 1 | The maximum number of parallel kafka consumers |
| KAFKA_SEC_PROTO | unset (only TLS) | If set to TLS, kafka connections will use TLS |
| KAFKA_SEC_CLIENT_KEY | unset | PEM [2] for the client key (optional, used for client auth) [1] |
| KAFKA_SEC_CLIENT_CERT | unset | PEM [2] for the client certificate (optional, used for client auth) [1] |
| KAFKA_SEC_CA_CERTS | unset | PEM [2] of CA cert chain if using private CA for the server cert [1] |
| KAFKA_SEC_SKIP_VERIFY | false | Ignore server certificate issues if set to true [1] |
| KAFKA_CONTENT_UPDATED_GROUP | dp-search-data-extractor | The consumer group this application uses to consume content-updated messages |
| KAFKA_CONTENT_UPDATED_TOPIC | content-updated | The name of the topic to consume messages from |
| KAFKA_PRODUCER_TOPIC | search-data-import | The name of the topic to produce messages to |
| KEYWORDS_LIMITS | -1 | The number of keywords allowed (default: no limit) |
| SERVICE_AUTH_TOKEN | unset | The service auth token for the dp-search-data-extractor |
| STOP_CONSUMING_ON_UNHEALTHY | true | Application stops consuming kafka messages if application is in unhealthy state |
| TOPIC_TAGGING_ENABLED | false | Enable topics tagging using the topic cache |
| TOPIC_CACHE_UPDATE_INTERVAL | 30m | The time interval to update topics cache (time.Duration format) |
| TOPIC_API_URL | http://localhost:25300 | The URL for the Topic API |
| ZEBEDEE_URL | http://localhost:8082 | The URL for Zebedee |

Notes:

  1. For more info, see the kafka TLS examples documentation

Healthcheck

The /health endpoint returns the current status of the service. Dependent services are health checked on an interval defined by the HEALTHCHECK_INTERVAL environment variable.

On a development machine a request to the health check endpoint can be made by:

curl localhost:25800/health

Contributing

See CONTRIBUTING for details.

License

Copyright © 2024, Office for National Statistics (https://www.ons.gov.uk)

Released under MIT license, see LICENSE for details.

# Variables

BuildTime represents the time at which the service was built.
GitCommit represents the commit (SHA-1) hash of the service that is running.
Version represents the version of the service that is running.