Categorygithub.com/mashiike/s3-select-sql-driver
repositorypackage
0.3.0
Repository: https://github.com/mashiike/s3-select-sql-driver.git
Documentation: pkg.go.dev

# Packages

No description provided by the author

# README

s3-select-sql-driver

Documentation Latest GitHub tag Github Actions test Go Report Card License

S3 Select sql driver for Go's database/sql package

Usage

for example:

package main

import (
	"context"
	"database/sql"
	"log"
	"time"

	_ "github.com/mashiike/s3-select-sql-driver"
)

func main() {
	db, err := sql.Open("s3-select", "s3://example-com/abc.csv?format=csv")
	if err != nil {
		log.Fatalln(err)
	}
	defer db.Close()
	rows, err := db.QueryContext(
		context.Background(),
		`SELECT timestamp, message FROM s3object s`,
	)
	if err != nil {
		log.Fatalln(err)
	}
	defer rows.Close()
	for rows.Next() {
		var timestamp time.Time
		var message string
		err := rows.Scan(&timestamp, &message)
		if err != nil {
			log.Println(err)
			break
		}
		log.Printf("%s\t%s", timestamp, message)
	}
}

Placeholders

S3 Select SQL driver supports placeholders.

for example, can use ordinal placeholders

rows, err := db.QueryContext(
    context.Background(),
    `SELECT timestamp, message FROM s3object s WHERE timestamp > ?`,
    time.Now().Add(-time.Hour),
)

and named placeholders

rows, err := db.QueryContext(
    context.Background(),
    `SELECT timestamp, message FROM s3object s WHERE timestamp > :timestamp`,
    sql.Named("timestamp", time.Now().Add(-time.Hour)),
)

DSN format

s3://<bucket>/<key>?<query>

query parameters

namedescriptiondefault
formatobject format (csv,tsv,json,json_lines,parquet)file ext auto detect
compression_typegzip or bzip, nonenone
parse_timeparse time columnfalse
input_serializationinput serialization base64 json
regionaws region

input serialization base64 json

if set complex format, you can set input serialization in DSN

see also S3 Select InputSerialization

{
  "CSV": {
    "FileHeaderInfo": "NONE",
    "RecordDelimiter": "\n",
    "FieldDelimiter": ",",
    "QuoteCharacter": "\"",
    "QuoteEscapeCharacter": "\"",
    "Comments": "#",
    "AllowQuotedRecordDelimiter": false
  },
  "CompressionType": "NONE"
}

and can set DSN query parameter(json base64 encoded)

s3://example-com/hoge.csv?input_serialization=ewogICJDU1YiOiB7CiAgICAiRmlsZUhlYWRlckluZm8iOiAiTk9ORSIsCiAgICAiUmVjb3JkRGVsaW1pdGVyIjogIlxuIiwKICAgICJGaWVsZERlbGltaXRlciI6ICIsIiwKICAgICJRdW90ZUNoYXJhY3RlciI6ICJcIiIsCiAgICAiUXVvdGVFc2NhcGVDaGFyYWN0ZXIiOiAiXCIiLAogICAgIkNvbW1lbnRzIjogIiMiLAogICAgIkFsbG93UXVvdGVkUmVjb3JkRGVsaW1pdGVyIjogZmFsc2UKICB9LAogICJDb21wcmVzc2lvblR5cGUiOiAiTk9ORSIKfQo

Prefix Search and LIMIT clause

S3 Select SQL driver supports prefix search and LIMIT clause. for example: DSN is s3://example-com/data/?format=csv&compression_type=gzip Query is SELECT * FROM s3object s LIMIT 10

S3 Select SQL driver search s3://example-com/data/ prefix and get 10 objects. case of following objects in bucket.

s3://example-com/data/2020-01-01.csv.gz
s3://example-com/data/2020-01-02.csv.gz
s3://example-com/data/2020-01-03.csv.gz
s3://example-com/data/2020-01-04.csv.gz
...

each object is 5 rows, execute S3 Select object is 2020-01-01.csv.gz and 2020-01-02.csv.gz and get 10 rows.

LICENSE

MIT