package
Version: 0.14.0
Repository: https://github.com/gomlx/gomlx.git

# Packages

Package downloader implements parallel downloading of multiple URLs, with progress report callbacks.
Package hdf5 provides a trivial API to access HDF5 file contents.
Package huggingface 🤗 provides functionality to download HuggingFace (HF) models and extract tensors stored in the ".safetensors" format.

# Functions

Batch creates a dataset that batches `ds` into batches of size `batchSize`.
ByteCountIEC converts a byte count to a string using the appropriate unit (B, KiB, MiB, GiB, ...).
CopyWithProgressBar is similar to io.Copy, but updates the progress bar with the amount of data copied.
CustomParallel builds a ParallelDataset that can be used to parallelize any train.Dataset, as long as the underlying dataset ds is thread-safe.
Download a file from the given URL and save it at the given path.
DownloadAndUntarIfMissing downloads tarFile from the given url, if not yet downloaded, and then untars it if the target directory is missing.
DownloadAndUnzipIfMissing downloads `zipFile` from the given url, if not yet downloaded.
DownloadIfMissing checks whether the path already exists, and if not downloads the file from the given URL.
FileExists returns true if file or directory exists.
Freeing implements a sequential dataset (it should not be parallelized) that immediately releases the yielded inputs and labels in between each `Yield` call, not waiting for garbage collection.
GobDeserializeInMemory deserializes an InMemoryDataset from the decoder.
InMemory creates dataset that reads the whole contents of `ds` into memory.
InMemoryFromData creates an InMemoryDataset from the static data given -- it is immediately converted to a tensor, if not a tensor already.
Map maps a dataset through a transformation with a (normal Go) function that runs on the host CPU.
MapWithGraphFn returns a `train.Dataset` with the result of applying (mapping) the batches yielded by the provided `dataset` by the graph function `graphFn`.
NewConstantDataset returns a dataset that always yields the scalar 0.
Normalization calculates the normalization parameters `mean` and `stddev` for the `inputsIndex`-th input from the given dataset.
Parallel parallelizes yield calls of any thread-safe train.Dataset.
ParseGzipCSVFile opens a `CSV.gz` file and iterates over each of its rows, calling `perRowFn`, with a slice of strings for each cell value in the row.
ReadAhead returns a Dataset that reads bufferSize elements of the given `ds` so that when Yield is called, the results are immediate.
ReplaceTildeInDir replaces a leading `~` in the directory path with the user's home directory.
ReplaceZerosByOnes replaces any zero values in x by one.
Take returns a wrapper to `ds`, a `train.Dataset` that only yields `n` batches.
Untar file, using decompression flags according to suffix: .gz for gzip, .bz2 for bzip2.
Unzip file into the given zipBaseDir.
ValidateChecksum verifies that the checksum of the file in the given path matches the checksum given.
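To make the IEC units that ByteCountIEC refers to concrete, here is a minimal, self-contained sketch of that conversion; the library function's exact signature is assumed (an `int64` in, a `string` out) and may differ:

```go
package main

import "fmt"

// byteCountIEC formats a byte count with binary (IEC) units: B, KiB, MiB,
// GiB, ... where each unit is 1024 times the previous one.
func byteCountIEC(b int64) string {
	const unit = 1024
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	// Find the largest unit that keeps the mantissa below 1024.
	div, exp := int64(unit), 0
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %ciB", float64(b)/float64(div), "KMGTPE"[exp])
}

func main() {
	fmt.Println(byteCountIEC(512))     // bytes, below one KiB
	fmt.Println(byteCountIEC(1536))    // 1.5 KiB
	fmt.Println(byteCountIEC(10 << 20)) // 10.0 MiB
}
```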

# Structs

InMemoryDataset represents a Dataset that has been completely read into the memory of the device it was created with -- the platform of the associated `graph.Backend`.
ParallelDataset is a wrapper around a `train.Dataset` that parallelizes calls to Yield.
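The mechanism behind ParallelDataset, several goroutines draining a thread-safe dataset's Yield into a buffered channel, can be sketched with a simplified, hypothetical dataset interface. The real `train.Dataset` interface yields specs, inputs, and labels and handles errors; this sketch reduces an example to a single `int` to show only the concurrency pattern:

```go
package main

import (
	"fmt"
	"sync"
)

// dataset is a simplified, hypothetical stand-in for train.Dataset.
type dataset interface {
	// Yield returns the next example and whether one was available.
	Yield() (value int, ok bool)
}

// counter is a trivial thread-safe dataset yielding 0..n-1.
type counter struct {
	mu      sync.Mutex
	next, n int
}

func (c *counter) Yield() (int, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.next >= c.n {
		return 0, false
	}
	v := c.next
	c.next++
	return v, true
}

// parallelize starts `workers` goroutines calling ds.Yield concurrently and
// funnels the results into a buffered channel, so consumers rarely wait.
// This requires ds to be thread-safe, as the Parallel/CustomParallel docs note.
func parallelize(ds dataset, workers, buffer int) <-chan int {
	out := make(chan int, buffer)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				v, ok := ds.Yield()
				if !ok {
					return
				}
				out <- v
			}
		}()
	}
	go func() { wg.Wait(); close(out) }()
	return out
}

func main() {
	sum := 0
	for v := range parallelize(&counter{n: 100}, 4, 8) {
		sum += v
	}
	fmt.Println(sum) // 0+1+...+99 = 4950, regardless of arrival order
}
```

Note that parallel yielding reorders examples; that is acceptable for training batches but matters if the consumer relies on dataset order.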

# Type aliases

MapExampleFn is a normal Go function that applies a transformation to the inputs/labels of a dataset.
MapGraphFn is a graph-building function that transforms inputs and labels.
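A minimal sketch of the MapExampleFn idea follows: a plain Go function applied per example on the host CPU, as opposed to MapGraphFn, which builds the transformation into the computation graph. The concrete types here (`[]float64` for an example) are simplifications for illustration, not the library's actual signatures:

```go
package main

import "fmt"

// mapExampleFn plays the role of MapExampleFn in this sketch: a plain Go
// function transforming one example's inputs on the host CPU.
type mapExampleFn func(inputs []float64) []float64

// applyMap runs fn over every example, the way a Map-wrapped dataset would
// apply its function to each yielded batch.
func applyMap(examples [][]float64, fn mapExampleFn) [][]float64 {
	out := make([][]float64, len(examples))
	for i, ex := range examples {
		out[i] = fn(ex)
	}
	return out
}

func main() {
	// Example transformation: double every value.
	double := func(in []float64) []float64 {
		res := make([]float64, len(in))
		for i, v := range in {
			res[i] = 2 * v
		}
		return res
	}
	fmt.Println(applyMap([][]float64{{1, 2}, {3, 4}}, double)) // [[2 4] [6 8]]
}
```

The trade-off the two type aliases capture: a host-side function is flexible (any Go code) but runs outside the accelerator, while a graph function compiles into the model's graph and runs on the device.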