= Exporter Script Server
Elizabeth Paige Harper <[email protected]>
v1.0.0
// General Doc Settings
:toc: left
:source-highlighter: highlightjs
:icons: font
// Github specifics
ifdef::env-github[]
:toc: preamble
:tip-caption: :bulb:
:note-caption: :information_source:
:important-caption: :heavy_exclamation_mark:
:caution-caption: :fire:
:warning-caption: :warning:
endif::[]
// Custom Config
:repo-url: https://github.com/VEuPathDB/util-user-dataset-handler-server
:site-url: https://veupathdb.github.io/util-user-dataset-handler-server
:repo-file-base: {repo-url}/blob/master
image:https://www.travis-ci.org/VEuPathDB/util-user-dataset-handler-server.svg?branch=master["Build Status", link="https://www.travis-ci.org/VEuPathDB/util-user-dataset-handler-server"]
image:https://goreportcard.com/badge/github.com/VEuPathDB/util-user-dataset-handler-server["Go Report Card", link="https://goreportcard.com/report/github.com/VEuPathDB/util-user-dataset-handler-server"]
image:https://img.shields.io/github/v/release/VEuPathDB/util-user-dataset-handler-server["Latest Release", link="https://github.com/VEuPathDB/util-user-dataset-handler-server/releases/latest"]
Exposes configured exporter / validation scripts over HTTP.
ifdef::env-github[]
{site-url}[Rendered Readme] |
endif::[]
{site-url}/api.html[API Docs]
== Usage
=== In Script Container
==== Standard Use
https://github.com/VEuPathDB/dataset-handler-biom[Working example project]
==== Custom Use
To create a custom setup that is not based on the demo container, configure
the server by performing the following steps.
CAUTION: Currently, this server application only supports Linux & macOS environments.
. Download the latest release from the {repo-url}/releases/latest[releases page]
and unpack the archive in your project directory.
. Run the command `./service gen-config` to generate a server configuration
  template file. This file will be named `config.tpl.yml`.
. Edit the configuration file with your desired service name and command
configurations.
+
NOTE: Check the {file-config-readme}[config file readme] for more
information about the configuration file.
. Once you have edited the configuration file, rename it and place it in the
  desired location relative to the service binary.
. Validate the configuration by running the command
  `./service check-config --config=path/to/your/config.yml`. This command will
  check the configuration and report any errors.
+
.Bad Config Example
[source, bash-session]
----
$ ./service --config=config.yml check-config
----
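
Taken together, a setup session might look like the following sketch (download
and editing steps omitted; file names follow the defaults described above):

[source, bash-session]
----
$ ./service gen-config
$ mv config.tpl.yml config.yml
$ ./service check-config --config=config.yml
----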
== Script Expectations
This server is language agnostic when it comes to calling scripts; anything
may be used as long as it can be executed via a CLI call.
=== Output
The script expectations are based on the Galaxy tooling. This means scripts are
expected to output a `.tgz` file suitable for being directly loaded into iRODS.

The pattern for the tar file output name must match the iRODS trigger's expected
format of `dataset_u\{user-id}_t\{timestamp-int64}_p\{process-id}.tgz`. For
example: `dataset_u12345_t1647364816_p132.tgz`.
This means the tar file must contain a `meta.json`, a `dataset.json`, and a
directory named `datafiles` containing the actual dataset file(s).
This server will automatically pick up that file and return it to the UDIS
service to be loaded into iRODS. The reason for this is that the HTTP server,
which unpacks the upload and starts the command, does not know what does or
does not belong in the iRODS tar.
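
As a rough sketch, assuming a bash script (the placeholder metadata and the
use of the script's own PID are illustrative only), the final packaging step
might look like:

[source, bash]
----
#!/usr/bin/env bash
set -euo pipefail

# Illustrative values; a real script would derive these from injected
# variables such as <<ds-user-id>> and <<timestamp>>.
user_id=12345
timestamp=$(date +%s)
process_id=$$

# Stage the required tar contents (placeholder metadata for illustration).
echo '{}' > meta.json
echo '{}' > dataset.json
mkdir -p datafiles
# ... write the actual dataset file(s) into datafiles/ here ...

# Package everything under the name pattern the iRODS trigger expects.
tar -czf "dataset_u${user_id}_t${timestamp}_p${process_id}.tgz" \
    meta.json dataset.json datafiles
----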
=== Error Handling
A script can indicate a fatal error by exiting with a non-zero status code. An
error message will be generated using the text printed to `stderr` by the
script.
Errors may be plaintext, a JSON object, or plaintext followed by a JSON object.
To indicate that an error is a user error and not a script failure, a JSON
object error must be returned.
Valid `stderr` output examples:

.A plaintext error
----
Some error message that will be recorded as a script/server error.
----

.Error output followed by a final fatal error
----
Some error text followed by a
{"error": "json", "message": "object"}
----

.A structured error
----
{"error": "user", "message": "some user error"}
----
When returning both plaintext and a JSON object, the plaintext will be
prepended to the error message in the JSON output.
For example:

.Raw `stderr` output
----
Some error messages
Logged while executing the script
{"error": "fatal", "message": "failed to open input file"}
----

.Recorded error
[source, json]
----
{
  "error": "fatal",
  "message": "Some error messages\nLogged while executing the script\nfailed to open input file"
}
----
==== JSON Object Error Format
[source, json]
----
{
  "error": "user", <1>
  "message": "some error text" <2>
}
----
<1> The "error"
field is used to define the type of error. A value of
"user"
indicates that the error was a user error and the user import
service will record the error as such. Any other value will be recorded as
a script error.
<2> The "message"
field defines the error message to record.
CAUTION: Both fields are required.
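
A minimal sketch of a script signalling a user error, assuming bash and that
the input file path arrives as the first CLI argument (an assumption, not a
server requirement):

[source, bash]
----
#!/usr/bin/env bash
# Hypothetical validation step: treat an empty input file as a user error
# by printing a JSON error object to stderr and exiting non-zero.
if [ ! -s "$1" ]; then
  echo '{"error": "user", "message": "The uploaded file is empty."}' >&2
  exit 1
fi
----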
== Config File
=== Components
.config.yml
[source, yaml, linenums]
----
service-name: my service <1>
command:
  executable: /some/path/to/your/executable <2>
  args: <3>
  - <<cwd>> <4>
----
<1> The display name of the running service.
<2> Path to an executable script or binary to be called by the server to
    process uploaded datasets.
<3> An array of arguments that will be passed to the configured executable.
    Each array element is effectively a single, quote-wrapped CLI arg.
+
WARNING: Space separated entries in the args array will be treated as a single
argument.
<4> An <<Variables,Injected Variable>>.
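
Putting it together, a filled-out configuration might look like the following
sketch (the executable path and flag name are illustrative; the injected
variables are described under <<Variables>>):

[source, yaml]
----
service-name: example exporter
command:
  executable: /opt/handler/export.sh
  args:
  - --user-id
  - <<ds-user-id>>
  # One array element is one CLI argument, so the expanded, space separated
  # <<input-files>> list below reaches the script as a single argument.
  - <<input-files>>
----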
=== Command Context
The configured command will be executed in an isolated subshell, but will be
provided the same environment as the server itself. This means it is possible
to set environment variables for the script simply by setting them on the
Docker container.
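
For example (the variable name and image are hypothetical), a value set on the
container is visible to the configured script:

[source, bash]
----
# Set on the container:
#   docker run -e EXPORT_MODE=strict my-handler-image

# The configured script can then read it like any other environment variable:
echo "running in mode: ${EXPORT_MODE}"
----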
The output of the script's `stdout` will be piped through the server's logging
mechanism and will appear in the container logs.

The output of the script's `stderr` will be piped through the server's logging
mechanism (like `stdout`), but will also be captured for parsing and returning
to the caller.
=== Variables
==== +<<cwd>>+
The working directory for the job that is running in the current request.
.Example
----
/workspace/12345
----
==== +<<date>>+
The current date formatted as `YYYY-MM-DD`.

.Example
----
1994-02-13
----
==== +<<date-time>>+
The current datetime in RFC3339 format.
.Example
----
2018-10-31T23:37:18.013557+05:00
----
==== +<<ds-description>>+
The user provided description for the current dataset upload.
.Example
----
My dataset upload containing foo and bar
----
==== +<<ds-name>>+
The user provided name for the current dataset upload.
.Example
----
My Dataset 3
----
==== +<<ds-summary>>+
The user provided summary of the current dataset upload.
.Example
----
Some summary text for my dataset upload
----
==== +<<ds-origin>>+
The source/origin of the user dataset. This will be either `galaxy` or
`direct-upload`.

.Example
----
direct-upload
----
==== +<<ds-user-id>>+
WDK user ID of the user that uploaded the dataset.
.Example
----
123456
----
==== +<<input-files>>+
A space-separated list of the files that were unpacked from the uploaded zip
or tar file, sorted by name ascending.
===== Examples
[#upload-tgz]
.Upload Contents
----
dataset.tgz
├─ foo.txt
├─ bar.xml
├─ fizz.json
└─ buzz/
   └─ fazz.yml
----

.+<<input-files>>+
----
bar.xml buzz fizz.json foo.txt
----
===== +<<input-files[n]>>+
Array-style access of the input file list, allowing retrieval of a single file
name from the list.
====== Examples
These use <<upload-tgz, this example input>>.
.+<<input-files[0]>>+
----
bar.xml
----

.+<<input-files[2]>>+
----
fizz.json
----
==== +<<time>>+
The current time formatted as `HH:MM:SS`.

.Example
----
03:47:58
----
==== +<<timestamp>>+
The current Unix timestamp in seconds.

.Example
----
783647299
----
==== +<<projects>>+
A comma-separated list of project identifiers.
.Example
----
VectorBase,PlasmoDB
----
==== +<<handler-params.p1>>+
Provides access to handler-specific parameter values from the body of the
import request.
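
For example, assuming the import request body included a handler parameter
`p1` with the value `foo` (an illustrative value, not from the source), the
variable would resolve as follows.

.Example
----
foo
----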