# README
runtime/blob
Package blob provides a way to download a batch of files ingested from remote sources like s3/gcs using google's go cdk (https://pkg.go.dev/gocloud.dev) as per user's glob pattern.
How many files are downloaded and how much data from a file is downloaded is controlled by runtimev1.Source_ExtractPolicy
It also has support for ingesting partial files for some formats like parquet, unzipped csv/txt/tsv files.
It uses a planner
to implement strategies for downloading.
A planner has a container
which keeps track of files to be downloaded and a rowplanner
which plans how much data per file needs to be downloaded.
For partial parquet file ingestion it uses apache arrow for go
: https://github.com/apache/arrow/tree/master/go
# Functions
NewBlobObjectReader returns new instance of ObjectReader.
NewBucket wraps a *blob.Bucket.
NewIterator returns an iterator for downloading objects matching a glob pattern and extract policy.
No description provided by the author
# Constants
No description provided by the author
No description provided by the author
No description provided by the author
# Structs
Bucket wraps a blob.Bucket with functionality for implementing the drivers.ObjectStore interface.
No description provided by the author
ObjectReader reads range of bytes from cloud objects implements io.ReaderAt and io.Seeker interfaces.
No description provided by the author
# Type aliases
No description provided by the author