# Packages
Package compress contains the interfaces and implementations for handling compression/decompression of parquet data at the column level.
Package pqarrow connects Arrow directly with the Parquet implementation. It isolates all of the explicitly Arrow-related code in one package, which provides the interfaces for reading and writing directly to and from Arrow Arrays/Tables/Records.
Package schema provides types and functions for manipulating and building parquet file schemas.
# Functions
AlgorithmFromThrift converts the thrift object to the Algorithm struct for easier usage.
ColumnPathFromString constructs a ColumnPath from a dot separated string.
DefaultColumnProperties returns the default properties which get utilized for writing.
DisableAadPrefixStorage will set the properties to not store the AadPrefix in the file.
DisableFooterSignatureVerification skips integrity verification of plaintext footers.
NewColumnDecryptionProperties constructs a new ColumnDecryptionProperties for the given column path, modified by the provided options.
NewColumnEncryptionProperties constructs properties for the provided column path, modified by the options provided.
NewFileDecryptionProperties takes the options for constructing a new FileDecryptionProperties object. If no options are given, it uses the default configuration, which checks the footer integrity of a plaintext footer for an encrypted file. For unencrypted parquet files, the decryption properties should not be set.
NewFileEncryptionProperties returns a new File Encryption description object using the options provided.
NewInt96 creates a new Int96 from the given 3 uint32 values.
NewReaderProperties returns the default Reader Properties using the provided allocator.
NewWriterProperties takes a list of options for building the properties.
WithAadPrefix sets the AAD prefix to use for encryption and by default will store it in the file.
WithAlg sets the encryption algorithm to utilize.
WithAllocator sets the allocator for the writer to use.
WithBatchSize specifies the number of rows to use for batch writes to columns.
WithColumnKeys sets explicit column keys.
WithCompression specifies the default compression type to use for column writing.
WithCompressionFor specifies the compression type for the given column.
WithCompressionLevel specifies the default compression level for the compressor in every column.
WithCompressionLevelFor is like WithCompressionLevel but only for the given column path.
WithCompressionLevelPath is the same as WithCompressionLevelFor but takes a ColumnPath.
WithCompressionPath is the same as WithCompressionFor but takes a ColumnPath directly.
WithCreatedBy specifies the "created by" string to use for the writer.
WithDataPageSize specifies the size to use for splitting data pages for column writing.
WithDataPageVersion specifies whether to use Version 1 or Version 2 of the DataPage spec.
WithDecryptAadPrefix explicitly supplies the file aad prefix.
WithDecryptKey specifies the key to utilize for decryption.
WithDictionaryDefault sets the default value for whether to enable dictionary encoding.
WithDictionaryFor allows enabling or disabling dictionary encoding for a given column path string.
WithDictionaryPageSizeLimit sets the dictionary size limit at which the writer will fall back to plain encoding instead.
WithDictionaryPath is like WithDictionaryFor, but takes a ColumnPath type.
WithEncoding defines the encoding that is used when we aren't using dictionary encoding.
WithEncodingFor is for defining the encoding only for a specific column path.
WithEncodingPath is the same as WithEncodingFor but takes a ColumnPath directly.
WithEncryptedColumns sets the map of columns and their properties (keys, etc.). If not called, all columns will be encrypted with the footer key.
WithEncryptionProperties specifies the file level encryption handling for writing the file.
WithFooterKey sets an explicit footer key.
WithFooterKeyID sets the key retrieval metadata to use (converted from a string). This must be a UTF-8 string.
WithFooterKeyMetadata sets a key retrieval metadata to use for getting the key.
WithKey sets a column specific key.
WithKeyID is a convenience function to set the key metadata using a string id.
WithKeyMetadata sets the key retrieval metadata. Use either KeyMetadata or KeyID, but not both.
WithKeyRetriever sets a key retriever callback.
WithMaxRowGroupLength specifies the maximum number of rows for a given row group in the writer.
WithMaxStatsSize sets a maximum size for the statistics before we decide not to include them.
WithPlaintextAllowed permits reading plaintext files when file decryption properties are set.
WithPlaintextFooter sets the writer to write the footer in plain text, otherwise the footer will be encrypted too (which is the default behavior).
WithPrefixVerifier supplies a verifier object to use for verifying the AAD Prefixes stored in the file.
WithRootName enables customization of the name used for the root schema node.
WithRootRepetition enables customization of the repetition used for the root schema node.
WithStats specifies a default for whether or not to enable column statistics.
WithStatsFor enables or disables statistics for a given column in the resulting file.
WithStatsPath is the same as WithStatsFor but takes a ColumnPath.
WithStoreDecimalAsInteger specifies whether to try using an int32/int64 for storing decimal data, rather than fixed-length byte arrays, if the precision is low enough.
WithVersion specifies which Parquet Spec version to utilize for writing.
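
Most of the functions above follow the functional options pattern: a constructor takes a list of option values, and each option adjusts either a default or a per-column setting. The following is a minimal, standard-library-only sketch of that pattern, not the package's implementation. The names `columnPath`, `writerProps`, `writerOption`, and the lowercase `with...` helpers are hypothetical stand-ins, and the default batch size is a placeholder value.

```go
package main

import (
	"fmt"
	"strings"
)

// columnPath is a hypothetical stand-in for ColumnPath: one element per schema level.
type columnPath []string

// columnPathFromString splits a dot separated string into its path elements.
func columnPathFromString(s string) columnPath {
	return strings.Split(s, ".")
}

// String joins the elements back into dot separated form.
func (c columnPath) String() string { return strings.Join(c, ".") }

// writerProps is a hypothetical, much reduced stand-in for WriterProperties.
type writerProps struct {
	batchSize   int64
	dictDefault bool
	dictByPath  map[string]bool
}

// writerOption plays the role of WriterProperty: a function that mutates the config.
type writerOption func(*writerProps)

func withBatchSize(n int64) writerOption {
	return func(p *writerProps) { p.batchSize = n }
}

func withDictionaryDefault(on bool) writerOption {
	return func(p *writerProps) { p.dictDefault = on }
}

// withDictionaryPath overrides the default for one column, given a columnPath.
func withDictionaryPath(path columnPath, on bool) writerOption {
	return func(p *writerProps) { p.dictByPath[path.String()] = on }
}

// withDictionaryFor is the variant that takes the path as a dot separated string.
func withDictionaryFor(path string, on bool) writerOption {
	return withDictionaryPath(columnPathFromString(path), on)
}

// newWriterProps starts from placeholder defaults and applies each option in order.
func newWriterProps(opts ...writerOption) *writerProps {
	p := &writerProps{batchSize: 1024, dictDefault: true, dictByPath: map[string]bool{}}
	for _, o := range opts {
		o(p)
	}
	return p
}

// dictionaryEnabled returns the per-column setting, falling back to the default.
func (p *writerProps) dictionaryEnabled(path string) bool {
	if on, ok := p.dictByPath[path]; ok {
		return on
	}
	return p.dictDefault
}

func main() {
	props := newWriterProps(withBatchSize(512), withDictionaryFor("a.b.c", false))
	fmt.Println(props.batchSize, props.dictionaryEnabled("a.b.c"), props.dictionaryEnabled("x"))
}
```

In this sketch the `For` and `Path` variants differ only in how the column is named, which matches how the paired functions are described above.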
# Constants
Constants that will be used as the default values with encryption/decryption.
Constants for choosing the AES algorithm to use for encryption/decryption.
Constants for the parquet DataPage version to use.
By default, if you set the file decryption properties, we will error on any plaintext files unless otherwise specified.
Default Buffer size used for the Reader.
Constants for default property values used for the default reader, writer and column props.
Default data page size limit is 1K. It's not guaranteed, but we will try to cut data pages off at this size where possible.
Default is for dictionary encoding to be turned on, use WithDictionaryDefault writer property to change that.
If the dictionary reaches the size of this limitation, the writer will use the fallback encoding (usually plain) instead of continuing to build the dictionary index.
If encryption is turned on, we will default to also encrypting the footer.
By default we'll use AesGCM as our encryption algorithm.
Default maximum number of rows for a single row group.
If the stats are larger than 4K, the writer will skip writing them out anyway.
Default is to have stats enabled for all columns, use writer properties to change the default, or to enable/disable for specific columns.
In order to attempt to facilitate data page size limits for writing, data is written in batches.
Int96SizeBytes is the number of bytes that make up an Int96.
v1.0.
v2.4.
v2.6.
Enable the latest parquet format 2.x features.
# Variables
ByteArraySizeBytes is the number of bytes returned by reflect.TypeOf(ByteArray{}).Size().
ByteArrayTraits provides information about the ByteArray type, which is just an []byte.
ColumnOrders contains constants for the Column Ordering fields.
DefaultColumnOrder is to use TypeDefinedOrder.
Encodings contains constants for the encoding types of the column data. The values used all correspond to the values in parquet.thrift for the corresponding encoding type.
FixedLenByteArraySizeBytes is the number of bytes returned by reflect.TypeOf(FixedLenByteArray{}).Size().
FixedLenByteArrayTraits provides information about the FixedLenByteArray type which is just an []byte.
Int96Traits provides information about the Int96 type.
Repetitions contains the constants for Field Repetition Types.
Types contains constants for the Physical Types that are used in the Parquet Spec. They can be specified when needed as such: `parquet.Types.Int32` etc.
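
The `parquet.Types.Int32` style of access suggests a package-level variable whose fields act as namespaced enum constants. The sketch below illustrates that pattern with the standard library only. The names `thriftType`, `physicalType`, and `physicalTypes` are hypothetical, and `thriftType` merely stands in for a generated Thrift enum. The numeric values follow the physical type enum in parquet.thrift.

```go
package main

import "fmt"

// thriftType is a hypothetical stand-in for the enum type generated from parquet.thrift.
type thriftType int32

const (
	thriftBoolean           thriftType = 0
	thriftInt32             thriftType = 1
	thriftInt64             thriftType = 2
	thriftInt96             thriftType = 3
	thriftFloat             thriftType = 4
	thriftDouble            thriftType = 5
	thriftByteArray         thriftType = 6
	thriftFixedLenByteArray thriftType = 7
)

// physicalType is a locally defined enum sharing the generated type's underlying
// values, so converting between the two is a plain cast.
type physicalType thriftType

// physicalTypes groups the constants under one variable, giving call sites the
// namespaced form physicalTypes.Int32.
var physicalTypes = struct {
	Boolean           physicalType
	Int32             physicalType
	Int64             physicalType
	Int96             physicalType
	Float             physicalType
	Double            physicalType
	ByteArray         physicalType
	FixedLenByteArray physicalType
}{
	Boolean:           physicalType(thriftBoolean),
	Int32:             physicalType(thriftInt32),
	Int64:             physicalType(thriftInt64),
	Int96:             physicalType(thriftInt96),
	Float:             physicalType(thriftFloat),
	Double:            physicalType(thriftDouble),
	ByteArray:         physicalType(thriftByteArray),
	FixedLenByteArray: physicalType(thriftFixedLenByteArray),
}

func main() {
	t := physicalTypes.Int32
	// Converting back to the Thrift-side type is a simple cast.
	fmt.Println(t, thriftType(t) == thriftInt32)
}
```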
# Structs
Algorithm describes how something was encrypted, representing the EncryptionAlgorithm object from the parquet.thrift file.
ColumnDecryptionProperties are the specifications for how to decrypt a given column.
ColumnEncryptionProperties specifies how to encrypt a given column.
ColumnProperties defines the encoding, codec, and so on for a given column.
FileDecryptionProperties define the File Level configuration for decrypting a parquet file.
FileEncryptionProperties describe how to encrypt a parquet file when writing data.
ReaderProperties are used to define how the file reader will handle buffering and allocating buffers.
WriterProperties is the collection of properties to use for writing a parquet file.
# Interfaces
AADPrefixVerifier is an interface for any object that can be used to verify the identity of the file being decrypted.
DecryptionKeyRetriever is an interface for getting the desired key for decryption from metadata.
ReaderAtSeeker is a combination of the ReaderAt and ReadSeeker interfaces from the io package defining the only functionality that is required in order for a parquet file to be read by the file functions.
# Type aliases
ByteArray is a type to be utilized for representing the Parquet ByteArray physical type, represented as a byte slice.
Creating our own enums avoids a transitive dependency on the compiled thrift definitions in the public API. This means the entire set of Thrift definitions does not need to be exported, while conversion between the two remains a simple cast.
ColumnDecryptOption is the type of the options passed for constructing Decryption Properties.
ColumnEncryptOption is how to specify options to the NewColumnEncryptionProperties function.
ColumnPathToDecryptionPropsMap maps column paths to decryption properties.
ColumnPathToEncryptionPropsMap maps column paths to encryption properties.
EncryptOption is used for specifying values when building FileEncryptionProperties.
FileDecryptionOption is how to supply options when constructing a new FileDecryptionProperties instance.
FixedLenByteArray is a Go type to represent a FixedLengthByteArray as a byte slice.
Int96 is a 12-byte integer value utilized for representing timestamps as a 64-bit integer and a 32-bit integer.
WriterProperty is used as the options for building a writer properties instance.
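
To make the Int96 and NewInt96 descriptions more concrete, here is a minimal sketch of a 12-byte value built from three uint32 values and read back as a timestamp. It assumes the common Parquet convention that the first 8 bytes hold nanoseconds within the day and the last 4 bytes hold the Julian day number, both little-endian. The names `int96`, `newInt96`, and `toTime` are hypothetical and do not reflect the package's own methods.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

// int96 is a hypothetical stand-in for Int96: 12 raw bytes.
type int96 [12]byte

// julianUnixEpoch is the Julian day number of 1970-01-01.
const julianUnixEpoch = 2440588

// newInt96 packs three uint32 values into 12 bytes in little-endian order.
func newInt96(v [3]uint32) int96 {
	var out int96
	binary.LittleEndian.PutUint32(out[0:4], v[0])
	binary.LittleEndian.PutUint32(out[4:8], v[1])
	binary.LittleEndian.PutUint32(out[8:12], v[2])
	return out
}

// toTime interprets the value under the assumed layout: a 64-bit count of
// nanoseconds within the day followed by a 32-bit Julian day number.
func (i int96) toTime() time.Time {
	nanos := int64(binary.LittleEndian.Uint64(i[0:8]))
	days := int64(binary.LittleEndian.Uint32(i[8:12])) - julianUnixEpoch
	return time.Unix(days*86400, nanos).UTC()
}

func main() {
	// One hour past midnight on the day after the Unix epoch.
	nanos := uint64(time.Hour)
	ts := newInt96([3]uint32{uint32(nanos), uint32(nanos >> 32), julianUnixEpoch + 1})
	fmt.Println(ts.toTime().Format(time.RFC3339)) // prints 1970-01-02T01:00:00Z
}
```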