Module: github.com/segmentio/parquet-go
Version: v0.0.0-20230712180008-5d42db8f0d47
Repository: https://github.com/segmentio/parquet-go.git
Documentation: pkg.go.dev
# README
Project has been archived
Development has moved to https://github.com/parquet-go/parquet-go. No APIs have changed; we just decided to create a new organization for this library. Thank you to all of the contributors for your hard work.
segmentio/parquet-go
High-performance Go library to manipulate parquet files.
# Packages
Package bloom implements parquet bloom filters.
Package compress provides the generic APIs implemented by parquet compression codecs.
Package encoding provides the generic APIs implemented by parquet encodings in its sub-packages.
Package hashprobe provides implementations of probing tables for various data types.
Package sparse contains abstractions to help work on arrays of values in sparse memory locations.
# Functions
AppendRow appends to row the given list of column values.
Ascending constructs a SortingColumn value which sorts the column at the path given as argument in ascending order.
AsyncPages wraps the given Pages instance to perform page reads asynchronously in a separate goroutine.
BloomFilters creates a configuration option which defines the bloom filters that parquet writers should generate.
BooleanValue constructs a BOOLEAN parquet value from the bool passed as argument.
BSON constructs a leaf node of BSON logical type.
ByteArrayValue constructs a BYTE_ARRAY parquet value from the byte slice passed as argument.
ColumnBufferCapacity creates a configuration option which defines the size of row group column buffers.
ColumnIndexSizeLimit creates a configuration option to customize the size limit of page boundaries recorded in column indexes.
ColumnPageBuffers creates a configuration option to customize the buffer pool used when constructing row groups.
CompareDescending constructs a comparison function which inverses the order of values.
CompareNullsFirst constructs a comparison function which assumes that null values are smaller than all other values.
CompareNullsLast constructs a comparison function which assumes that null values are greater than all other values.
Compressed wraps the node passed as argument to use the given compression codec.
Compression creates a configuration option which sets the default compression codec used by a writer for columns where none were defined.
Convert constructs a conversion function from one parquet schema to another.
ConvertRowGroup constructs a wrapper of the given row group which applies the given schema conversion to its rows.
ConvertRowReader constructs a wrapper of the given row reader which applies the given schema conversion to the rows.
CopyPages copies pages from src to dst, returning the number of values that were copied.
CopyRows copies rows from src to dst.
CopyValues copies values from src to dst, returning the number of values that were written.
CreatedBy creates a configuration option which sets the name of the application that created a parquet file.
DataPageStatistics creates a configuration option which defines whether data page statistics are emitted.
DataPageVersion creates a configuration option which configures the version of data pages used when creating a parquet file.
Date constructs a leaf node of DATE logical type.
Decimal constructs a leaf node of decimal logical type with the given scale, precision, and underlying type.
DedupeRowReader constructs a row reader which drops duplicated consecutive rows, according to the comparator function passed as argument.
DedupeRowWriter constructs a row writer which drops duplicated consecutive rows, according to the comparator function passed as argument.
DeepEqual returns true if v1 and v2 are equal, including their repetition levels, definition levels, and column indexes.
DefaultFileConfig returns a new FileConfig value initialized with the default file configuration.
DefaultReaderConfig returns a new ReaderConfig value initialized with the default reader configuration.
DefaultRowGroupConfig returns a new RowGroupConfig value initialized with the default row group configuration.
DefaultSortingConfig returns a new SortingConfig value initialized with the default sorting configuration.
DefaultWriterConfig returns a new WriterConfig value initialized with the default writer configuration.
Descending constructs a SortingColumn value which sorts the column at the path given as argument in descending order.
DoubleValue constructs a DOUBLE parquet value from the float64 passed as argument.
DropDuplicatedRows configures whether a sorting writer will keep or remove duplicated rows.
Encoded wraps the node passed as argument to use the given encoding.
Enum constructs a leaf node with a logical type representing enumerations.
Equal returns true if v1 and v2 are equal.
FileReadMode is a file configuration option which controls the way pages are read.
FileSchema is used to pass a known schema in while opening a Parquet file.
FilterRowReader constructs a RowReader which exposes rows from reader for which the predicate has returned true.
FilterRowWriter constructs a RowWriter which writes rows to writer for which the predicate has returned true.
Find uses the ColumnIndex passed as argument to locate the page of a column chunk in which the given value is expected to be found.
FixedLenByteArrayType constructs a type for fixed-length values of the given size (in bytes).
FixedLenByteArrayValue constructs a FIXED_LEN_BYTE_ARRAY parquet value from the byte slice passed as argument.
FloatValue constructs a FLOAT parquet value from the float32 passed as argument.
Int constructs a leaf node of signed integer logical type of the given bit width.
Int32Value constructs an INT32 parquet value from the int32 passed as argument.
Int64Value constructs an INT64 parquet value from the int64 passed as argument.
Int96Value constructs an INT96 parquet value from the deprecated.Int96 passed as argument.
JSON constructs a leaf node of JSON logical type.
KeyValueMetadata creates a configuration option which adds key/value metadata to add to the metadata of parquet files.
Leaf returns a leaf node of the given type.
List constructs a node of LIST logical type.
LookupCompressionCodec returns the compression codec associated with the given code.
LookupEncoding returns the parquet encoding associated with the given code.
MakeRow constructs a Row from a list of column values.
Map constructs a node of MAP logical type.
MaxRowsPerRowGroup configures the maximum number of rows that a writer will produce in each row group.
MergeRowGroups constructs a row group which is a merged view of rowGroups.
MergeRowReader constructs a RowReader which creates an ordered sequence of all the readers using the given compare function as the ordering predicate.
MultiRowGroup wraps multiple row groups to appear as if they were a single RowGroup.
MultiRowWriter constructs a RowWriter which dispatches writes to all the writers passed as arguments.
NewBuffer constructs a new buffer, using the given list of buffer options to configure the buffer returned by the function.
NewBufferPool creates a new in-memory page buffer pool.
NewColumnIndex constructs a ColumnIndex instance from the given parquet format column index.
NewFileBufferPool creates a new on-disk page buffer pool.
NewFileConfig constructs a new file configuration applying the options passed as arguments.
NewGenericBuffer is like NewBuffer but returns a GenericBuffer[T] suited to write rows of Go type T.
NewGenericReader is like NewReader but returns a GenericReader[T] suited to read rows of Go type T.
NewGenericWriter is like NewWriter but returns a GenericWriter[T] suited to write rows of Go type T.
NewReader constructs a parquet reader reading rows from the given io.ReaderAt.
NewReaderConfig constructs a new reader configuration applying the options passed as arguments.
NewRowBuffer constructs a new row buffer.
NewRowBuilder constructs a RowBuilder which builds rows for the parquet schema passed as argument.
NewRowGroupConfig constructs a new row group configuration applying the options passed as arguments.
NewRowGroupReader constructs a new Reader which reads rows from the RowGroup passed as argument.
NewSchema constructs a new Schema object with the given name and root node.
NewSortingConfig constructs a new sorting configuration applying the options passed as arguments.
NewSortingWriter constructs a new sorting writer which writes a parquet file where rows of each row group are ordered according to the sorting columns configured on the writer.
NewWriter constructs a parquet writer writing a file to the given io.Writer.
NewWriterConfig constructs a new writer configuration applying the options passed as arguments.
NullsFirst wraps the SortingColumn passed as argument so that it instructs the row group to place null values first in the column.
NullValue constructs a null value, which is the zero-value of the Value type.
OpenFile opens a parquet file and reads the content between offset 0 and the given size in r.
Optional wraps the given node to make it optional.
PageBufferSize configures the size of column page buffers on parquet writers.
Read reads and returns rows from the parquet file in the given reader.
ReadBufferSize is a file configuration option which controls the default buffer sizes for reads made to the provided io.Reader.
ReadFile reads rows of the parquet file at the given path.
Release is a helper function to decrement the reference counter of pages backed by memory which can be granularly managed by the application.
Repeated wraps the given node to make it repeated.
Required wraps the given node to make it required.
Retain is a helper function to increment the reference counter of pages backed by memory which can be granularly managed by the application.
ScanRowReader constructs a RowReader which exposes rows from reader until the predicate returns false for one of the rows, or EOF is reached.
SchemaOf constructs a parquet schema from a Go value.
Search is like Find, but uses the default ordering of the given type.
SkipBloomFilters is a file configuration option which prevents automatically reading the bloom filters when opening a parquet file, when set to true.
SkipPageIndex is a file configuration option which prevents automatically reading the page index when opening a parquet file, when set to true.
SortingBuffers creates a configuration option which sets the pool of buffers used to hold intermediary state when sorting parquet rows.
SortingColumns creates a configuration option which defines the sorting order of columns in a row group.
SortingRowGroupConfig is a row group option which applies configuration specific to sorting row groups.
SortingWriterConfig is a writer option which applies configuration specific to sorting writers.
SplitBlockFilter constructs a split block bloom filter object for the column at the given path, with the given bitsPerValue.
String constructs a leaf node of UTF8 logical type.
Time constructs a leaf node of TIME logical type.
Timestamp constructs a leaf node of TIMESTAMP logical type.
TransformRowReader constructs a RowReader which applies the given transform to each row read from reader.
TransformRowWriter constructs a RowWriter which applies the given transform to each row written to writer.
Uint constructs a leaf node of unsigned integer logical type of the given bit width.
UUID constructs a leaf node of UUID logical type.
ValueOf constructs a parquet value from a Go value v.
Write writes the given list of rows to a parquet file written to w.
WriteBufferSize configures the size of the write buffer.
WriteFile writes the given list of rows to a parquet file created at the given path.
ZeroValue constructs a zero value of the given kind.
# Constants
MaxColumnDepth is the maximum column depth supported by this package.
MaxColumnIndex is the maximum column index supported by this package.
MaxDefinitionLevel is the maximum definition level supported by this package.
MaxRepetitionLevel is the maximum repetition level supported by this package.
MaxRowGroups is the maximum number of row groups which can be contained in a single parquet file.
ReadModeAsync reads pages asynchronously in the background.
ReadModeSync reads pages synchronously on demand (Default).
# Variables
BitPacked is the deprecated bit-packed encoding for repetition and definition levels.
Brotli is the BROTLI parquet compression codec.
ByteStreamSplit is an encoding for floating-point data.
DeltaBinaryPacked is the delta binary packed parquet encoding.
DeltaByteArray is the delta byte array parquet encoding.
DeltaLengthByteArray is the delta length byte array parquet encoding.
ErrCorrupted is an error returned by the Err method of ColumnPages instances when they encountered a mismatch between the CRC checksum recorded in a page header and the one computed while reading the page data.
ErrConversion is used to indicate that a conversion between two values cannot be done because there are no rules to translate between their physical types.
ErrMissingPageHeader is an error returned when a page reader encounters a malformed page header which is missing page-type-specific information.
ErrMissingRootColumn is an error returned when opening an invalid parquet file which does not have a root column.
ErrRowGroupSchemaMismatch is an error returned when attempting to write a row group but the source and destination schemas differ.
ErrRowGroupSchemaMissing is an error returned when attempting to write a row group but the source has no schema.
ErrRowGroupSortingColumnsMismatch is an error returned when attempting to write a row group but the sorting columns differ in the source and destination.
ErrSeekOutOfRange is an error returned when seeking to a row index which is less than the first row of a page.
ErrTooManyRowGroups is returned when attempting to generate a parquet file with more than MaxRowGroups row groups.
ErrUnexpectedDefinitionLevels is an error returned when attempting to decode definition levels into a page which is part of a required column.
ErrUnexpectedDictionaryPage is an error returned when a page reader encounters a dictionary page after the first page, or in a column which does not use a dictionary encoding.
ErrUnexpectedRepetitionLevels is an error returned when attempting to decode repetition levels into a page which is not part of a repeated column.
Gzip is the GZIP parquet compression codec.
Lz4Raw is the LZ4_RAW parquet compression codec.
Plain is the default parquet encoding.
PlainDictionary is the plain dictionary parquet encoding.
RLE is the hybrid bit-pack/run-length parquet encoding.
RLEDictionary is the RLE dictionary parquet encoding.
Snappy is the SNAPPY parquet compression codec.
Uncompressed is a parquet compression codec representing uncompressed pages.
Zstd is the ZSTD parquet compression codec.
# Structs
Buffer represents an in-memory group of parquet rows.
Column represents a column in a parquet file.
ConvertError is an error type returned by calls to Convert when the conversion of parquet schemas is impossible or the input row for the conversion is malformed.
DataPageHeaderV1 is an implementation of the DataPageHeader interface representing data pages version 1.
DataPageHeaderV2 is an implementation of the DataPageHeader interface representing data pages version 2.
DictionaryPageHeader is an implementation of the PageHeader interface representing dictionary pages.
File represents a parquet file.
The FileConfig type carries configuration options for parquet files.
GenericBuffer is similar to a Buffer but uses a type parameter to define the Go type representing the schema of rows in the buffer.
GenericReader is similar to a Reader but uses a type parameter to define the Go type representing the schema of rows being read.
GenericWriter is similar to a Writer but uses a type parameter to define the Go type representing the schema of rows being written.
LeafColumn is a struct type representing leaf columns of a parquet schema.
Deprecated: A Reader reads Go values from parquet files.
The ReaderConfig type carries configuration options for parquet readers.
RowBuffer is an implementation of the RowGroup interface which stores parquet rows in memory.
RowBuilder is a type which helps build parquet rows incrementally by adding values to columns.
The RowGroupConfig type carries configuration options for parquet row groups.
Schema represents a parquet schema created from a Go value.
The SortingConfig type carries configuration options for parquet row groups.
SortingWriter is a type similar to GenericWriter but it ensures that rows are sorted according to the sorting columns configured on the writer.
The Value type is similar to the reflect.Value abstraction of Go values, but for parquet values.
Deprecated: A Writer uses a parquet schema and sequence of Go values to produce a parquet file to an io.Writer.
The WriterConfig type carries configuration options for parquet writers.
# Interfaces
BloomFilter is an interface allowing applications to test whether a key exists in a bloom filter.
The BloomFilterColumn interface is a declarative representation of bloom filters used when configuring filters on a parquet writer.
BooleanReader is an interface implemented by ValueReader instances which expose the content of a column of boolean values.
BooleanWriter is an interface implemented by ValueWriter instances which support writing columns of boolean values.
BufferPool is an interface abstracting the underlying implementation of page buffer pools.
ByteArrayReader is an interface implemented by ValueReader instances which expose the content of a column of variable length byte array values.
ByteArrayWriter is an interface implemented by ValueWriter instances which support writing columns of variable length byte array values.
ColumnBuffer is an interface representing columns of a row group.
The ColumnChunk interface represents individual columns of a row group.
The ColumnIndexer interface is implemented by types that support generating parquet column indexes.
Conversion is an interface implemented by types that provide conversion of parquet rows from one schema to another.
DataPageHeader is a specialization of the PageHeader interface implemented by data pages.
The Dictionary interface represents type-specific implementations of parquet dictionaries.
DoubleReader is an interface implemented by ValueReader instances which expose the content of a column of double-precision float point values.
DoubleWriter is an interface implemented by ValueWriter instances which support writing columns of double-precision floating point values.
Field instances represent fields of a parquet node, associating each node with its name in the parent node.
FileOption is an interface implemented by types that carry configuration options for parquet files.
FixedLenByteArrayReader is an interface implemented by ValueReader instances which expose the content of a column of fixed length byte array values.
FixedLenByteArrayWriter is an interface implemented by ValueWriter instances which support writing columns of fixed length byte array values.
FloatReader is an interface implemented by ValueReader instances which expose the content of a column of single-precision floating point values.
FloatWriter is an interface implemented by ValueWriter instances which support writing columns of single-precision floating point values.
Int32Reader is an interface implemented by ValueReader instances which expose the content of a column of int32 values.
Int32Writer is an interface implemented by ValueWriter instances which support writing columns of 32 bits signed integer values.
Int64Reader is an interface implemented by ValueReader instances which expose the content of a column of int64 values.
Int64Writer is an interface implemented by ValueWriter instances which support writing columns of 64 bits signed integer values.
Int96Reader is an interface implemented by ValueReader instances which expose the content of a column of int96 values.
Int96Writer is an interface implemented by ValueWriter instances which support writing columns of 96 bits signed integer values.
Node values represent nodes of a parquet schema.
Page values represent sequences of parquet values.
PageHeader is an interface implemented by parquet page headers.
PageReader is an interface implemented by types that support producing a sequence of pages.
Pages is an interface implemented by page readers returned by calling the Pages method of ColumnChunk instances.
PageWriter is an interface implemented by types that support writing pages to an underlying storage medium.
ReaderOption is an interface implemented by types that carry configuration options for parquet readers.
RowGroup is an interface representing a parquet row group.
RowGroupOption is an interface implemented by types that carry configuration options for parquet row groups.
RowGroupReader is an interface implemented by types that expose sequences of row groups to the application.
RowGroupWriter is an interface implemented by types that allow the program to write row groups.
RowReader reads a sequence of parquet rows.
RowReaderFrom reads parquet rows from reader.
RowReaderWithSchema is an extension of the RowReader interface which advertises the schema of rows returned by ReadRow calls.
RowReadSeeker is an interface implemented by row readers which support seeking to arbitrary row positions.
Rows is an interface implemented by row readers returned by calling the Rows method of RowGroup instances.
RowSeeker is an interface implemented by readers of parquet rows which can be positioned at a specific row index.
RowWriter writes parquet rows to an underlying medium.
RowWriterTo writes parquet rows to a writer.
RowWriterWithSchema is an extension of the RowWriter interface which advertises the schema of rows expected to be passed to WriteRow calls.
SortingColumn represents a column by which a row group is sorted.
SortingOption is an interface implemented by types that carry configuration options for parquet sorting writers.
TimeUnit represents units of time in the parquet type system.
The Type interface represents logical types of the parquet type system.
ValueReader is an interface implemented by types that support reading batches of values.
ValueReaderAt is an interface implemented by types that support reading values at offsets specified by the application.
ValueReaderFrom is an interface implemented by value writers to read values from a reader.
ValueWriter is an interface implemented by types that support writing batches of values.
ValueWriterTo is an interface implemented by value readers to write values to a writer.
WriterOption is an interface implemented by types that carry configuration options for parquet writers.
# Type aliases
Kind is an enumeration type representing the physical types supported by the parquet type system.
ReadMode is an enum that is used to configure the way that a File reads pages.
Row represents a parquet row as a slice of values.
RowReaderFunc is a function type implementing the RowReader interface.
RowWriterFunc is a function type implementing the RowWriter interface.
ValueReaderFunc is a function type implementing the ValueReader interface.
ValueWriterFunc is a function type implementing the ValueWriter interface.