Module: github.com/ONSdigital/dp-files-api
Version: 1.4.0
Repository: https://github.com/onsdigital/dp-files-api.git
Documentation: pkg.go.dev

# README

DP Files API

Introduction

The Files API is part of the Static Files System. This API is responsible for storing the metadata and state of files.

It is used by the Upload Service to store the metadata of the file being uploaded and to keep track of the file's state during the upload. During upload the state will be CREATED. Once the full file has been uploaded, the Upload Service should inform this API that the upload is complete, and the file's state will be moved to UPLOADED.

Any service interested in the metadata or the state of a file can use the GET endpoints. A single file's metadata can be retrieved by its path, or all files in a collection can be retrieved by collection ID.

The Download Service uses this API to see whether a file exists and what state it is in before attempting to serve the file to consumers wishing to access the file.

The API has two endpoints to publish files. Files can be published individually by PATCHing the state to PUBLISHED. It is also possible to publish all files in a collection in one call by PATCHing /collection/{collection_id}; this can be used to reduce the number of API calls required to publish a large collection. Currently, most calls to publish a file will come from the Zebedee Publisher.

When a file is published, this API sends a message via Kafka to the Static File Publisher, which permanently moves the file and informs this API via an HTTP call that the file has been moved.

REST API

The API is fully documented in Swagger Docs.

Note: When using PATCH calls to modify the file metadata, you can either send a collection_id to set it on a file where it is not already set, or change the state of a file.
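As a rough illustration of the note above, the sketch below builds a PATCH body that carries either a collection_id or a state, never both. The `patchBody` type and `buildPatch` helper are hypothetical names, and rejecting a body that sets both fields is one reading of the note rather than confirmed API behaviour; the Swagger Docs remain the authority on the actual request shape.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// patchBody is a hypothetical payload type. Only the JSON keys
// (collection_id, state) are taken from the README.
type patchBody struct {
	CollectionID string `json:"collection_id,omitempty"`
	State        string `json:"state,omitempty"`
}

var errAmbiguousPatch = errors.New("set exactly one of collection_id or state")

// buildPatch returns a JSON body that sets either the collection ID
// or the state. Supplying both, or neither, is treated as an error
// (an assumption made for this sketch).
func buildPatch(collectionID, state string) ([]byte, error) {
	if (collectionID == "") == (state == "") {
		return nil, errAmbiguousPatch
	}
	return json.Marshal(patchBody{CollectionID: collectionID, State: state})
}

func main() {
	body, err := buildPatch("", "PUBLISHED")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The resulting bytes would be sent as the body of the PATCH request for the file in question.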

Metadata

| Field | Notes |
|-------|-------|
| path | The identifier of a file that is stored. Globally unique, and forms part of the bucket/object name when stored |
| is_publishable | This field is currently ignored and has no effect; the file will be published if a publish update is sent! |
| collection_id | Optional during upload, must be set for the file to be published |
| title | Optional |
| size_in_bytes | The size of the file |
| type | MIME type of the file, e.g. "text/csv", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" |
| licence | Free-text name of the licence under which the file is made available |
| licence_url | URL to the licence |
| state | State of the file - CREATED, UPLOADED, PUBLISHED, MOVED |
| etag | Cryptographic hash of the file content |
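The fields above could be modelled in Go roughly as follows. This is a sketch only: the JSON keys come from the table, while the struct name, Go field names, Go types and the sample values are assumptions and may differ from the types the service actually uses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FileMetadata is an illustrative model of the metadata fields.
// Names and types are assumptions; only the JSON keys are from the README.
type FileMetadata struct {
	Path          string  `json:"path"`
	IsPublishable bool    `json:"is_publishable"`          // currently ignored by the API
	CollectionID  *string `json:"collection_id,omitempty"` // optional during upload
	Title         string  `json:"title,omitempty"`         // optional
	SizeInBytes   uint64  `json:"size_in_bytes"`
	Type          string  `json:"type"` // MIME type
	Licence       string  `json:"licence"`
	LicenceURL    string  `json:"licence_url"`
	State         string  `json:"state"`
	Etag          string  `json:"etag,omitempty"`
}

// exampleMetadataJSON marshals a made-up file record in the CREATED state.
func exampleMetadataJSON() (string, error) {
	m := FileMetadata{
		Path:          "data/example.csv",
		IsPublishable: true,
		Title:         "Example",
		SizeInBytes:   1024,
		Type:          "text/csv",
		Licence:       "OGL v3",
		LicenceURL:    "https://example.com/licence",
		State:         "CREATED",
	}
	b, err := json.Marshal(m)
	return string(b), err
}

func main() {
	out, err := exampleMetadataJSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Using a pointer for `CollectionID` lets the sketch distinguish "not yet set" from an empty string, which matters because the collection ID must be set before publishing.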

Additional Metadata

Additional timestamp data about the file is stored in the database but not exposed via the API. Those fields are:

Field
created_at
last_modified
upload_completed_at
published_at
moved_at

File States

| State | Description |
|-------|-------------|
| CREATED | File upload has started and the metadata has been provided to this API |
| UPLOADED | File upload has been completed. The etag for the final file has been provided |
| PUBLISHED | The file has been published (it is available to the public, but is not yet permanently moved) |
| MOVED | The file has been permanently moved to the public bucket for storage. The public file's etag has been provided |


 Start     ┌──────────────┐ File      ┌──────────────┐ File       ┌───────────────┐ File       ┌───────────────┐
 Upload    │              │ Uploaded  │              │ Published  │               │ Moved      │               │
      ────►│   CREATED    ├──────────►│   UPLOADED   ├───────────►│   PUBLISHED   ├───────────►│     MOVED     │
           │              │           │              │            │               │            │               │
           └──────────────┘           └──────────────┘            └───────────────┘            └───────────────┘
             File is in a               File is ready               File is available           File is available
             unusable                   for review &                for public download         for public download
             state                      approval                    The stored version          directly from S3
             Can resume upload          Can be pre-viewed           is moved on-demand          where it is stored
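The lifecycle in the diagram can be sketched as a small transition table. The strictly linear ordering is read off the diagram; the type, constant and function names are hypothetical, and the real service may enforce additional conditions (for example, that a collection ID is set before publishing).

```go
package main

import "fmt"

// State is one of the four file states described above.
type State string

const (
	Created   State = "CREATED"
	Uploaded  State = "UPLOADED"
	Published State = "PUBLISHED"
	Moved     State = "MOVED"
)

// next maps each state to the single state that follows it in the diagram.
// MOVED has no successor.
var next = map[State]State{
	Created:   Uploaded,
	Uploaded:  Published,
	Published: Moved,
}

// CanTransition reports whether moving from one state to another
// follows the linear lifecycle shown in the diagram.
func CanTransition(from, to State) bool {
	n, ok := next[from]
	return ok && n == to
}

// PubliclyAvailable reports whether a file in the given state is
// described as available for public download.
func PubliclyAvailable(s State) bool {
	return s == Published || s == Moved
}

func main() {
	fmt.Println(CanTransition(Created, Uploaded))
	fmt.Println(CanTransition(Created, Published))
	fmt.Println(PubliclyAvailable(Uploaded))
}
```

A consumer such as the Download Service could use a check like `PubliclyAvailable` to decide whether to serve a file to the public.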

Getting started

  • Run make debug

Dependencies

  • No further dependencies other than those defined in go.mod

Configuration

| Environment variable | Default | Description |
|----------------------|---------|-------------|
| BIND_ADDR | :26900 | The host and port to bind to |
| GRACEFUL_SHUTDOWN_TIMEOUT | 5s | The graceful shutdown timeout in seconds (time.Duration format) |
| HEALTHCHECK_INTERVAL | 30s | Time between self-healthchecks (time.Duration format) |
| HEALTHCHECK_CRITICAL_TIMEOUT | 90s | Time to wait until an unhealthy dependent propagates its state to make this app unhealthy (time.Duration format) |
| IS_PUBLISHING | false | Whether the service is running in the Publishing domain |
| PERMISSIONS_API_URL | http://localhost:25400 | The hostname of the permissions API |
| IDENTITY_API_URL | http://localhost:25600 | The hostname of the identity API |
| ZEBEDEE_URL | http://localhost:8082 | The hostname of the zebedee API |
| KAFKA_ADDR | kafka:9092 | A (comma delimited) list of kafka brokers (TLS-ready) |
| KAFKA_VERSION | 2.6.1 | The version of (TLS-ready) Kafka being used |
| KAFKA_MAX_BYTES | 200000 | The max message size for kafka producer |
| KAFKA_SEC_PROTO | unset | If set to TLS, kafka connections will use TLS ([ref-1]) |
| KAFKA_SEC_CLIENT_KEY | unset | PEM for the client key ([ref-1]) |
| KAFKA_SEC_CLIENT_CERT | unset | PEM for the client certificate ([ref-1]) |
| KAFKA_SEC_CA_CERTS | unset | CA cert chain for the server cert ([ref-1]) |
| KAFKA_SEC_SKIP_VERIFY | false | Ignores server certificate issues if true ([ref-1]) |
| STATIC_FILE_PUBLISHED_TOPIC | static-file-published-v2 | |
| MONGODB_BIND_ADDR | localhost:27017 | Address of MongoDB |
| MONGODB_DATABASE | files | The mongodb database to store imports |
| MONGODB_COLLECTIONS | metadata | The (comma delimited) list of mongodb collections to store imports |
| MONGODB_USERNAME | unset | The mongodb username |
| MONGODB_PASSWORD | unset | The mongodb password |
| MONGODB_ENABLE_READ_CONCERN | false | Switch to use (or not) majority read concern |
| MONGODB_ENABLE_WRITE_CONCERN | true | Switch to use (or not) majority write concern |
| MONGODB_CONNECT_TIMEOUT | 5s | The default timeout when connecting to mongodb |
| MONGODB_QUERY_TIMEOUT | 15s | The default timeout for querying mongodb |
| MONGODB_IS_SSL | false | Switch to use (or not) TLS when connecting to mongodb |
| MONGODB_VERIFY_CERT | | |
| MONGODB_CERT_CHAIN | | |
| MONGODB_REAL_HOSTNAME | | |
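To show how a few of these variables and their defaults fit together, here is a minimal standard-library sketch. It is not the service's own configuration code: the `Config` struct, the `Load` function and the choice to pass in a lookup function are assumptions made for illustration, and only four of the variables are covered.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Config holds a small, illustrative subset of the settings listed above.
type Config struct {
	BindAddr                string
	GracefulShutdownTimeout time.Duration
	HealthCheckInterval     time.Duration
	IsPublishing            bool
}

// Load reads settings through lookup (for example os.LookupEnv) and
// falls back to the documented defaults when a variable is unset or empty.
func Load(lookup func(string) (string, bool)) (Config, error) {
	get := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}

	var cfg Config
	var err error

	cfg.BindAddr = get("BIND_ADDR", ":26900")

	// Durations use Go's time.Duration syntax, e.g. "5s" or "1m30s".
	if cfg.GracefulShutdownTimeout, err = time.ParseDuration(get("GRACEFUL_SHUTDOWN_TIMEOUT", "5s")); err != nil {
		return Config{}, fmt.Errorf("GRACEFUL_SHUTDOWN_TIMEOUT: %w", err)
	}
	if cfg.HealthCheckInterval, err = time.ParseDuration(get("HEALTHCHECK_INTERVAL", "30s")); err != nil {
		return Config{}, fmt.Errorf("HEALTHCHECK_INTERVAL: %w", err)
	}
	if cfg.IsPublishing, err = strconv.ParseBool(get("IS_PUBLISHING", "false")); err != nil {
		return Config{}, fmt.Errorf("IS_PUBLISHING: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := Load(os.LookupEnv)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the sketch easy to exercise with a plain map in tests.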

API Client

There is an API Client for the Files API, which is part of the dp-api-clients-go package.

The Files Client provides functions that enable:

  • Setting the Collection ID of an existing file
  • Publishing all files in a Collection
  • Getting the details of a single file

Contributing

See CONTRIBUTING for details.

License

Copyright © 2022, Office for National Statistics (https://www.ons.gov.uk)

Released under the MIT license, see LICENSE for details.

# Packages

Package docs Code generated by swaggo/swag.

# Variables

BuildTime represents the time at which the service was built.
GitCommit represents the commit (SHA-1) hash of the service that is running.
Version represents the version of the service that is running.