Module: github.com/quortex/influxdb-athena-crawler
Version: 0.0.0-20241114104519-cb0516414524
Repository: https://github.com/quortex/influxdb-athena-crawler.git
Documentation: pkg.go.dev

# README

influxdb-athena-crawler

An AWS Athena crawler for InfluxDB.

Overview

This project is a utility that fetches AWS Athena results (CSV objects stored in AWS S3), parses them, and writes them to InfluxDB as points.
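The CSV-to-point step described above can be sketched roughly as follows. This is an illustration only, not the project's actual code: the `point` type and the `parseCSV` function are hypothetical stand-ins (the real tool presumably uses an InfluxDB client type), and field values are kept as raw strings here. The timestamp column name and layout are the defaults listed under Configuration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
	"time"
)

// point is a hypothetical stand-in for an InfluxDB point.
type point struct {
	Measurement string
	Tags        map[string]string
	Fields      map[string]string
	Time        time.Time
}

// parseCSV reads a CSV document whose first record is the header and turns
// every following record into a point. Columns listed in tagCols become tags,
// the tsCol column becomes the timestamp, and all other columns become fields.
func parseCSV(data, measurement, tsCol, tsLayout string, tagCols []string) ([]point, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, nil
	}
	header := records[0]
	isTag := make(map[string]bool, len(tagCols))
	for _, c := range tagCols {
		isTag[c] = true
	}
	var points []point
	for _, rec := range records[1:] {
		p := point{
			Measurement: measurement,
			Tags:        map[string]string{},
			Fields:      map[string]string{},
		}
		seenTS := false
		for i, col := range header {
			switch {
			case col == tsCol:
				ts, err := time.Parse(tsLayout, rec[i])
				if err != nil {
					return nil, fmt.Errorf("parsing timestamp %q: %w", rec[i], err)
				}
				p.Time = ts
				seenTS = true
			case isTag[col]:
				p.Tags[col] = rec[i]
			default:
				p.Fields[col] = rec[i]
			}
		}
		if !seenTS {
			return nil, fmt.Errorf("timestamp column %q not found", tsCol)
		}
		points = append(points, p)
	}
	return points, nil
}

func main() {
	// Made-up sample data in the shape of an Athena CSV result.
	data := "timestamp,host,cpu\n2024-11-14T10:45:19.000Z,web-1,0.42\n"
	points, err := parseCSV(data, "usage", "timestamp", "2006-01-02T15:04:05.000Z", []string{"host"})
	if err != nil {
		panic(err)
	}
	for _, p := range points {
		fmt.Println(p.Measurement, p.Tags, p.Fields, p.Time.Format(time.RFC3339))
	}
}
```

The sample data, measurement name and column names are invented for the example; a real run would read the CSV objects from the configured S3 bucket instead of an inline string.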

Prerequisites

AWS

To interact with the S3 bucket, an AWS account with the following S3 permissions is required (note that s3:DeleteObject is only needed if clean-objects is set):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "<BUCKET_NAME>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListObjects", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "<BUCKET_NAME>/*"
    }
  ]
}

Installation

Helm (Kubernetes install)

Follow influxdb-athena-crawler documentation for Helm deployment here.

Configuration

Optional args

influxdb-athena-crawler accepts the arguments below.

| Key | Description | Default |
| --- | --- | --- |
| region | The AWS region. | "" |
| bucket | The AWS bucket to watch. | "" |
| prefix | The bucket prefix. | "" |
| suffix | Filename suffix to restrict files processed on the bucket. | "" |
| clean-objects | Whether to delete S3 objects after processing them. | false |
| max-object-age | How long to wait since last modification before file cleaning. | 10m |
| timeout | The global timeout. | "30s" |
| influx-server | The InfluxDB server address. | "" |
| influx-token | The InfluxDB token. | "" |
| influx-org | The InfluxDB org to write to. | "" |
| influx-bucket | The InfluxDB bucket to write to. | "" |
| measurement | A measurement acts as a container for tags, fields, and timestamps. Use a measurement name that describes your data. | "" |
| timestamp-row | The timestamp row in CSV. | "timestamp" |
| timestamp-layout | The layout used to parse the timestamp. | "2006-01-02T15:04:05.000Z" |
| tag | Tags to add to the InfluxDB point. Can be of the form --tag=foo if the tag name matches the CSV row, or --tag='foo={row:bar}' to specify the row. | "" |
| field | Fields to add to the InfluxDB point. Can be of the form --field='foo={type:int,row:bar}'. If row is not specified, the CSV row matches the field name. Type can be float, int, string or bool. | "" |
| max-routines | The maximum number of concurrent object processing routines. | 100 |
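To make the field syntax more concrete, here is a minimal sketch of how a value such as foo={type:int,row:bar} could be split into its parts and used to convert a raw CSV value. It is an assumption-laden illustration, not the crawler's actual parser: `fieldSpec`, `parseFieldSpec` and `convertValue` are hypothetical names, and because the README does not state what type applies when none is given, the sketch simply leaves `Type` empty in that case.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fieldSpec is a hypothetical representation of one --field value.
type fieldSpec struct {
	Name string // InfluxDB field name
	Type string // float, int, string or bool; empty when not given
	Row  string // CSV row (column) to read; defaults to Name
}

// parseFieldSpec parses "foo" or "foo={type:int,row:bar}".
func parseFieldSpec(s string) (fieldSpec, error) {
	name, opts, hasOpts := strings.Cut(s, "=")
	name = strings.TrimSpace(name)
	if name == "" {
		return fieldSpec{}, fmt.Errorf("empty field name in %q", s)
	}
	spec := fieldSpec{Name: name, Row: name}
	if !hasOpts {
		return spec, nil
	}
	opts = strings.TrimSpace(opts)
	if len(opts) < 2 || !strings.HasPrefix(opts, "{") || !strings.HasSuffix(opts, "}") {
		return fieldSpec{}, fmt.Errorf("options must be wrapped in braces in %q", s)
	}
	for _, kv := range strings.Split(opts[1:len(opts)-1], ",") {
		k, v, ok := strings.Cut(kv, ":")
		if !ok {
			return fieldSpec{}, fmt.Errorf("malformed option %q in %q", kv, s)
		}
		switch strings.TrimSpace(k) {
		case "type":
			spec.Type = strings.TrimSpace(v)
		case "row":
			spec.Row = strings.TrimSpace(v)
		default:
			return fieldSpec{}, fmt.Errorf("unknown option %q in %q", k, s)
		}
	}
	return spec, nil
}

// convertValue converts a raw CSV value to the Go type named by typ.
func convertValue(raw, typ string) (any, error) {
	switch typ {
	case "float":
		return strconv.ParseFloat(raw, 64)
	case "int":
		return strconv.ParseInt(raw, 10, 64)
	case "bool":
		return strconv.ParseBool(raw)
	case "string":
		return raw, nil
	default:
		return nil, fmt.Errorf("unsupported type %q", typ)
	}
}

func main() {
	spec, err := parseFieldSpec("foo={type:int,row:bar}")
	if err != nil {
		panic(err)
	}
	v, err := convertValue("42", spec.Type)
	if err != nil {
		panic(err)
	}
	fmt.Printf("field %s read from row %s as %s: %v\n", spec.Name, spec.Row, spec.Type, v)
}
```

The same shape would apply to --tag values, which carry only the row option according to the table above.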

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

Versioning

We use SemVer for versioning.

Help

Got a question? File a GitHub issue.
