github.com/groovenauts/blocks-gcs-proxy
Version: 0.14.0-alpha1
Repository: https://github.com/groovenauts/blocks-gcs-proxy.git
Documentation: pkg.go.dev

# README

blocks-gcs-proxy

blocks-gcs-proxy is a proxy for MAGELLAN BLOCKS concurrent batch board.

Features

Installation

Download the file from https://github.com/groovenauts/blocks-gcs-proxy/releases and place it in a directory on your PATH.

Usage

Create config.json like this:

{
  "job": {
    "subscription": "projects/proj-dummy-999/subscriptions/pipeline01-job-subscription"
  },
  "progress": {
    "topic": "projects/proj-dummy-999/topics/pipeline01-progress-topic"
  }
}

Run blocks-gcs-proxy with a command:

blocks-gcs-proxy COMMAND ARGS...

blocks-gcs-proxy calls COMMAND with ARGS for each message received from the Pub/Sub subscription specified by job.subscription in config.json.

config.json

| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| job | map | False | | |
| job.error_response | string | False | ack | Response type on error. It must be one of {ack, nack, none}. |
| job.interval_on_error | int | False | 0 | The interval in seconds before returning the response on error. |
| job.pull_interval | int | False | 10 | The interval in seconds between pulls when no job message is received. |
| job.subscription | string | False | projects/{{ .GCP_PROJECT }}/subscriptions/{{ .PIPELINE }}-job-subscription | The subscription name to pull job messages from. |
| job.sustainer | map | False | | |
| job.sustainer.delay | int | False | See Sustainer | The new ack deadline in seconds to set when extending the deadline. |
| job.sustainer.disabled | bool | False | See Sustainer | Disables the sustainer if true. |
| job.sustainer.interval | int | False | See Sustainer | The interval in seconds at which to send the request that extends the ack deadline. |
| job_check | map | False | | |
| job_check.method | string | True | "none" | Method used to check a job before running it. One of none, buntdb, or gcslock. |
| job_check.database | string | False | | The database name to store job execution data. Usage depends on method. |
| job_check.bucket | string | False | | The bucket name to store job execution data. Usage depends on method. |
| job_check.timeout | string | False | | A timeout expression such as '1h10m10s'. Usage depends on method. |
| progress | map | False | | |
| progress.attributes | map[string]string | False | {} | Static attributes of the progress notification message. |
| progress.level | string | False | info | Log level at which to publish job progress. One of debug, info, warn, error, fatal, or panic. |
| progress.topic | string | False | projects/{{ .GCP_PROJECT }}/topics/{{ .PIPELINE }}-progress-topic | The topic name to publish job progress messages to. |
| log | map | False | | |
| log.command_severity | string | False | info | The log severity of command output. One of debug, info, warn, error, fatal, or panic. |
| log.level | string | False | info | Log level for blocks-gcs-proxy's own processing. One of debug, info, warn, error, fatal, or panic. |
| log.stackdriver | map | False | | |
| log.stackdriver.error_reporting_service | string | False | | The service name of ServiceContext. |
| log.stackdriver.labels | map[string]string | True | | The labels of the Monitored resource. |
| log.stackdriver.log_name | string | True | | The resource name of the log that will receive the log entries. |
| log.stackdriver.project_id | string | True | | GCP Project ID. |
| log.stackdriver.type | string | True | | The type of the Monitored resource. |
| command | map | False | | |
| command.dryrun | bool | False | false | Don't run the command if this is true. |
| command.options | map[key][]string | False | | Define this if you need to run one of multiple commands. See Multiple command options for details. |
| download | map | False | | |
| download.allow_irregular_url | bool | False | False | Allow non-strict URLs for download. |
| download.worker | map | False | | |
| download.worker.max_tries | int | False | 0 | The number of tries to download. |
| download.worker.workers | int | False | 1 | The number of download threads. |
| upload | map | False | | |
| upload.content_type_by_ext | bool | False | | Set the content type by file extension when uploading to GCS. |
| upload.worker | map | False | | |
| upload.worker.max_tries | int | False | 0 | The number of tries to upload. |
| upload.worker.workers | int | False | 1 | The number of upload threads. |
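
The defaults for job.subscription and progress.topic are written in Go text/template syntax. The following is a minimal sketch of how such a default could be expanded, not the tool's actual implementation. The function name expandDefault and the way the GCP_PROJECT and PIPELINE values are supplied (a plain map here) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// expandDefault renders a default value such as
// "projects/{{ .GCP_PROJECT }}/subscriptions/{{ .PIPELINE }}-job-subscription"
// with the given variables. Where the real values come from is an
// assumption of this sketch; here they are passed in as a map.
func expandDefault(tmpl string, vars map[string]string) (string, error) {
	// missingkey=error makes a missing variable fail instead of rendering empty.
	t, err := template.New("default").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, vars); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	vars := map[string]string{
		"GCP_PROJECT": "proj-dummy-999",
		"PIPELINE":    "pipeline01",
	}
	sub, err := expandDefault(
		"projects/{{ .GCP_PROJECT }}/subscriptions/{{ .PIPELINE }}-job-subscription", vars)
	if err != nil {
		panic(err)
	}
	fmt.Println(sub)
}
```

With these example values the rendered name matches the job.subscription used in the config.json under Usage.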

Multiple command options

If your config.json defines multiple commands like the following:

{
  "command": {
    "options": {
      "key1": ["cmd1", "%{download_files}"],
      "key2": ["cmd2", "%{download_files.bar}", "%{uploads_dir}", "%{download_files.baz}"]
    }
  }
}

And you run blocks-gcs-proxy with:

$ blocks-gcs-proxy %{attrs.foo}

the command to execute is selected at runtime by the message attribute named foo.

| Message attributes | Command blocks-gcs-proxy calls |
|---|---|
| {"foo": "key1"} | cmd1 %{download_files} |
| {"foo": "key2"} | cmd2 %{download_files.bar} %{uploads_dir} %{download_files.baz} |

If the attribute value does not match any key under options, the message is ignored with an error message.
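
The lookup described in this section can be sketched as follows. This is an illustration of the selection rule only, not the tool's real code: selectCommand and attrRef are hypothetical names, and the sketch handles just a single `%{attrs.NAME}` argument.

```go
package main

import (
	"fmt"
	"regexp"
)

// attrRef matches an argument of the form %{attrs.NAME} (hypothetical helper).
var attrRef = regexp.MustCompile(`^%\{attrs\.([^}]+)\}$`)

// selectCommand resolves an argument such as "%{attrs.foo}" against the
// message attributes and returns the matching entry from options.
// An unknown key yields an error, mirroring "ignored with an error message".
func selectCommand(arg string, attrs map[string]string, options map[string][]string) ([]string, error) {
	m := attrRef.FindStringSubmatch(arg)
	if m == nil {
		return nil, fmt.Errorf("argument %q is not an attribute reference", arg)
	}
	key, ok := attrs[m[1]]
	if !ok {
		return nil, fmt.Errorf("message has no attribute %q", m[1])
	}
	cmd, ok := options[key]
	if !ok {
		return nil, fmt.Errorf("no command defined for key %q", key)
	}
	return cmd, nil
}

func main() {
	options := map[string][]string{
		"key1": {"cmd1", "%{download_files}"},
		"key2": {"cmd2", "%{download_files.bar}", "%{uploads_dir}", "%{download_files.baz}"},
	}
	cmd, err := selectCommand("%{attrs.foo}", map[string]string{"foo": "key2"}, options)
	if err != nil {
		panic(err)
	}
	fmt.Println(cmd)
}
```

The placeholders inside the selected command, such as `%{download_files}`, are left untouched here; expanding them is a separate step that this sketch does not attempt.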

Sustainer

When your command takes longer than the AckDeadline of the pipeline job subscription, Sustainer sends requests to the subscription to extend the deadline. If you don't set job.sustainer in your config.json, blocks-gcs-proxy derives the values from the subscription's AckDeadline:

| Key | Default |
|---|---|
| job.sustainer.delay | Subscription's AckDeadline |
| job.sustainer.interval | Subscription's AckDeadline * 0.8 |
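
As a worked example of these defaults, the small sketch below computes both values in seconds. The function name sustainerDefaults is hypothetical, and truncating the interval to a whole number of seconds is an assumption; the source only states the 0.8 factor.

```go
package main

import "fmt"

// sustainerDefaults returns the documented defaults, in seconds, for a
// subscription whose AckDeadline is ackDeadlineSec:
//
//	delay    = AckDeadline
//	interval = AckDeadline * 0.8
//
// Integer truncation of the interval is an assumption of this sketch.
func sustainerDefaults(ackDeadlineSec int) (delay, interval int) {
	return ackDeadlineSec, ackDeadlineSec * 8 / 10
}

func main() {
	for _, deadline := range []int{10, 60, 600} {
		delay, interval := sustainerDefaults(deadline)
		fmt.Printf("AckDeadline=%ds delay=%ds interval=%ds\n", deadline, delay, interval)
	}
}
```

For a 60-second AckDeadline this gives a delay of 60 seconds and an interval of 48 seconds, so each extension request is sent before the current deadline expires.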

blocks-gcs-proxy check

Checks whether config.json is valid. You can specify a different file with the --config or -c option.

$ ./blocks-gcs-proxy check -c config.json.bak
Error to load config.json.bak cause of invalid character '}' looking for beginning of object key string

blocks-gcs-proxy download

$ ./blocks-gcs-proxy download --help
NAME:
   blocks-gcs-proxy download - Download the files from GCS to downloads directory

USAGE:
   blocks-gcs-proxy download [command options] [arguments...]

OPTIONS:
   --downloads_dir value, -d value  Path to the directory which has bucket_name/path/to/file
   --downloaders value, -n value    Number of downloaders (default: 6)

Example

$ ./blocks-gcs-proxy download -d tmp/downloads -n 5 gs://bucket1/path/to/file1 gs://bucket1/path/to/file2 gs://bucket1/path/to/file3

blocks-gcs-proxy upload

$ ./blocks-gcs-proxy upload --help
NAME:
   blocks-gcs-proxy upload - Upload the files under uploads directory

USAGE:
   blocks-gcs-proxy upload [command options] [arguments...]

OPTIONS:
   --uploads_dir value, -d value  Path to the directory which has bucket_name/path/to/file
   --uploaders value, -n value    Number of uploaders (default: 6)

Example

$ ./blocks-gcs-proxy upload -d tmp/uploads -n 5
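
Going the other way, a file under the uploads directory implies its destination from its relative path. The following sketch derives a gs:// URL from bucket_name/path/to/file; the function name gcsObjectURL is hypothetical, and the layout is assumed from the --uploads_dir help text.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// gcsObjectURL maps uploadsDir/bucket/path/to/file to
// gs://bucket/path/to/file (layout assumed from the help text).
func gcsObjectURL(uploadsDir, file string) (string, error) {
	rel, err := filepath.Rel(uploadsDir, file)
	if err != nil {
		return "", err
	}
	rel = filepath.ToSlash(rel)
	if rel == ".." || strings.HasPrefix(rel, "../") {
		return "", fmt.Errorf("%q is outside of %q", file, uploadsDir)
	}
	// The first path element is the bucket, the rest is the object name.
	parts := strings.SplitN(rel, "/", 2)
	if len(parts) != 2 || parts[1] == "" {
		return "", fmt.Errorf("%q is not inside a bucket directory", file)
	}
	return "gs://" + parts[0] + "/" + parts[1], nil
}

func main() {
	u, err := gcsObjectURL("tmp/uploads", "tmp/uploads/bucket1/path/to/file1")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```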