Module: github.com/ONSdigital/dp-search-reindex-batch
Version: 1.2.0
Repository: https://github.com/onsdigital/dp-search-reindex-batch.git
Documentation: pkg.go.dev

# README

dp-search-reindex-batch

Batch nomad job for reindexing search

Getting started

Dependencies

  • No further dependencies other than those defined in go.mod

Configuration

| Environment variable        | Default                  | Description                                                                |
|-----------------------------|--------------------------|----------------------------------------------------------------------------|
| AWS_REGION                  | "eu-west-2"              | AWS region                                                                 |
| AWS_SEC_SKIP_VERIFY         | false                    | Whether to skip TLS verification for AWS requests                          |
| DATASET_API_URL             | "http://localhost:22000" | URL of the Dataset API                                                     |
| DATASET_PAGINATION_LIMIT    | 500                      | Number of datasets to fetch per page of requests to Dataset API            |
| ENABLE_TOPIC_TAGGING        | false                    | Whether to enable topic auto-tagging                                       |
| ELASTIC_SEARCH_URL          | "http://localhost:11200" | URL of elastic search server (or AWS Opensearch)                           |
| MAX_DOCUMENT_EXTRACTIONS    | 100                      | Max number of concurrent Document Extractions (ie. Zebedee connections)    |
| MAX_DOCUMENT_TRANSFORMS     | 20                       | Max number of concurrent Document Transformation workers                   |
| MAX_DATASET_EXTRACTIONS     | 20                       | Max number of concurrent Dataset Extractions (ie. Dataset API connections) |
| MAX_DATASET_TRANSFORMS      | 10                       | Max number of concurrent Dataset Transformation workers                    |
| SERVICE_AUTH_TOKEN          | ""                       | Zebedee Service Auth Token for API requests                                |
| SIGN_ELASTICSEARCH_REQUESTS | false                    | Whether to sign elasticsearch requests (true for AWS)                      |
| TOPIC_API_URL               | "http://localhost:25300" | URL of the Topic API                                                       |
| TRACKER_INTERVAL            | 5s                       | Interval for progress tracker summary logging                              |
| ZEBEDEE_URL                 | "http://localhost:8082"  | URL of publishing zebedee                                                  |

Local Prerequisites

  • Requires ElasticSearch 7.10 running on port 11200
  • Requires Zebedee running on port 8082 (and this has a dependency on vault)
  • Requires the Dataset API running on port 22000
  • Requires the Topic API running on port 25300 if tagging by topic is desired

NB. The Dataset API requires a mongo database named 'datasets', which must contain the following collections:

  • contacts
  • datasets
  • dimension.options
  • editions
  • instances
  • instances_locks

The Dataset API also requires this environment variable to be set to true: DISABLE_GRAPH_DB_DEPENDENCY

Make sure your Elasticsearch server is running locally on localhost:11200 and is version 7.10, the currently supported version. You can use the dp-compose/v2/stacks/search stack for this.

Set the 'ENABLE_TOPIC_TAGGING' flag to true if the topic tagging feature is required.

To run the reindex script locally while loading data from an environment (e.g. sandbox), you can run dp ssh with port forwarding for dataset-api and zebedee (check the service IPs and ports in https://consul.dp.aws.onsdigital.uk/ui/eu/services). For example:

dp ssh sandbox publishing 2 -p 22000:10.30.138.234:26020
dp ssh sandbox publishing 1 -p 8082:10.30.138.93:25108

If you do this, the service auth token in the configuration must be a valid token accepted by the environment you are using.

Contributing

See CONTRIBUTING for details.

License

Copyright © 2023, Office for National Statistics (https://www.ons.gov.uk)

Released under MIT license, see LICENSE for details.

# Variables

BuildTime represents the time at which the service was built.
GitCommit represents the commit (SHA-1) hash of the service that is running.
Version represents the version of the service that is running.
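
Variables like these are commonly populated at build time with the Go linker's `-X` flag, which can set the value of a package-level string variable. The sketch below assumes that approach. The source does not state how the values are injected, what their types are, or which package declares them, so the `main` package, the string types, and the `versionInfo` helper are assumptions made for illustration.

```go
package main

import "fmt"

// These variables are left empty in source and could be set by the linker, e.g.
//
//	go build -ldflags "-X main.Version=1.2.0 -X main.GitCommit=$(git rev-parse HEAD) -X main.BuildTime=$(date +%s)"
//
// The package path (main) and the string types are assumptions for this sketch.
var (
	BuildTime string
	GitCommit string
	Version   string
)

// versionInfo formats the build metadata for logging at startup.
func versionInfo() string {
	return fmt.Sprintf("version=%q commit=%q built=%q", Version, GitCommit, BuildTime)
}

func main() {
	fmt.Println(versionInfo())
}
```

Without the `-ldflags` argument the three values are simply empty strings, so a plain `go run` still works during local development.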