0.0.0-20210513144143-06dddf1ad665
Repository: https://github.com/scritchley/orc.git
Documentation: pkg.go.dev
# README
orc
Project Status
This project is still a work in progress.
Current Support
Column Encoding | Read | Write | Go Type |
---|---|---|---|
SmallInt, Int, BigInt | ✓ | | int64 |
Float, Double | ✓ | | float32, float64 |
String, Char, and VarChar | ✓ | | string |
Boolean | ✓ | | bool |
TinyInt | ✓ | | byte |
Binary | ✓ | | []byte |
Decimal | ✓ | | orc.Decimal |
Date | ✓ | | orc.Date (time.Time) |
Timestamp | ✓ | | time.Time |
Struct | ✓ | | orc.Struct (map[string]interface{}) |
List | ✓ | | []interface{} |
Map | ✓ | | []orc.MapEntry |
Union | ✓ | | interface{} |
- Writer support is in its late stages; however, I do not recommend using it yet.
Example
    r, err := Open("./examples/demo-12-zlib.orc")
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()
    // Create a new Cursor reading the provided columns.
    c := r.Select("_col0", "_col1", "_col2")
    // Iterate over each stripe in the file.
    for c.Stripes() {
        // Iterate over each row in the stripe.
        for c.Next() {
            // Retrieve a slice of interface values for the current row.
            log.Println(c.Row())
        }
    }
    if err := c.Err(); err != nil {
        log.Fatal(err)
    }
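
Because `c.Row()` yields a slice of interface values, callers generally need a type switch to work with individual columns. The following is a minimal sketch rather than part of the package: `describe` is a hypothetical helper, the row is built by hand instead of being read from a file, and treating a null column as `nil` is an assumption. The package's own types (`orc.Decimal`, `orc.Date`, `orc.Struct`, `orc.MapEntry`) are left out so the example depends only on the standard library.

```go
package main

import (
	"fmt"
	"time"
)

// describe names the Go type of a single row value. The cases follow the
// "Go Type" column of the support table; package-specific types are omitted
// (assumption: they would need extra cases in real code).
func describe(v interface{}) string {
	switch x := v.(type) {
	case nil:
		// Assumption: a null column value arrives as nil.
		return "null"
	case int64:
		return fmt.Sprintf("int64(%d)", x)
	case float32, float64:
		return fmt.Sprintf("float(%v)", x)
	case string:
		return fmt.Sprintf("string(%q)", x)
	case bool:
		return fmt.Sprintf("bool(%t)", x)
	case byte:
		return fmt.Sprintf("byte(%d)", x)
	case []byte:
		return fmt.Sprintf("binary(%d bytes)", len(x))
	case time.Time:
		return "timestamp(" + x.UTC().Format(time.RFC3339) + ")"
	case []interface{}:
		return fmt.Sprintf("list(%d items)", len(x))
	default:
		return fmt.Sprintf("other(%T)", x)
	}
}

func main() {
	// A hand-built row standing in for the result of c.Row().
	row := []interface{}{int64(42), "orc", true, nil}
	for i, v := range row {
		fmt.Printf("column %d: %s\n", i, describe(v))
	}
}
```

In real use the `default` branch is where the package-specific types would be handled.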
# Functions
NewBaseTreeReader returns a new BaseTreeReader from the provided io.Reader.
NewBaseTreeWriter is a TreeWriter that is embedded in all other TreeWriter implementations.
NewBufferedWriter returns a new BufferedWriter using the provided CompressionCodec.
NewDateTreeReader returns a new DateTreeReader along with any error that occurs.
NewDateTreeWriter returns a new DateTreeWriter.
NewDecimalTreeReader returns a new instance of a DecimalTreeReader or an error if one occurs.
NewDictionary returns a new Dictionary initialised with the provided initialCapacity.
NewDictionaryV2 returns a new DictionaryV2 initialised with the provided initialCapacity.
NewFloatTreeWriter returns a new FloatTreeWriter or an error if one occurs.
NewIntegerTreeReader returns a new IntegerReader or an error if one occurs.
NewIntegerTreeWriter returns a new IntegerTreeWriter.
NewMapTreeReader returns a new instance of a MapTreeReader.
NewStringTreeReader returns a StringTreeReader implementation along with any error that occurs.
NewStringTreeWriter returns a new StringTreeWriter or an error if one occurs.
NewStructTreeWriter returns a StructTreeWriter using the provided io.Writer and children TreeWriters.
NewTimestampTreeReader returns a new TimestampTreeReader along with any error that occurs.
NewTimestampTreeWriter returns a new TimestampTreeWriter.
NewUnionTreeReader returns a new instance of a UnionTreeReader or an error if one occurs.
NewUnionTreeWriter returns a UnionTreeWriter using the provided io.Writer and children TreeWriters.
NewWriter returns a new ORC file writer that writes to the provided io.Writer.
Open opens the file at the provided filepath.
# Constants
DictionaryEncodingThreshold is the threshold ratio of unique items to the total count of items.
InitialDictionarySize is the initial size used when creating the dictionary.
MaxScope is the maximum number of values that can be buffered before being flushed.
MaxShortRepeatLength is the maximum run length used for RLEV2IntShortRepeat sequences.
MinRepeatSize is the minimum number of repeated values required to use run length encoding.
TimestampBaseSeconds is 1 January 2015, the base value for all timestamp values.
# Variables
DefaultCompressionChunkSize is the default size of compression chunks within each stream.
DefaultRowIndexStride is the default number of rows between indexes.
DefaultStripeTargetRowCount is the number of rows over which a stripe should be written to the underlying file.
DefaultStripeTargetSize is the size in bytes over which a stripe should be written to the underlying file.
DefaultStripeWriterTimezone is the timezone that writer adds into the stripe footer.
Version0_11 is an ORC file version compatible with Hive 0.11.
Version0_12 is an ORC file version compatible with Hive 0.12.
WriterImplementation identifies the writer implementation.
WriterVersion identifies the writer version being used.
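
DefaultCompressionChunkSize above refers to compression chunks within a stream. In the Apache ORC specification, each chunk is preceded by a 3-byte little-endian header holding the chunk length shifted left by one bit, with the low bit set when the chunk is stored uncompressed. The following sketch illustrates that header only; `encodeChunkHeader` and `decodeChunkHeader` are hypothetical helpers and are not taken from this package.

```go
package main

import (
	"errors"
	"fmt"
)

// encodeChunkHeader builds the 3-byte header described in the ORC
// specification: (length << 1) | isOriginal, stored little-endian.
func encodeChunkHeader(length int, isOriginal bool) ([3]byte, error) {
	if length < 0 || length >= 1<<23 {
		return [3]byte{}, errors.New("chunk length does not fit in 23 bits")
	}
	v := uint32(length) << 1
	if isOriginal {
		v |= 1 // low bit marks a chunk stored without compression
	}
	return [3]byte{byte(v), byte(v >> 8), byte(v >> 16)}, nil
}

// decodeChunkHeader reverses encodeChunkHeader.
func decodeChunkHeader(h [3]byte) (length int, isOriginal bool) {
	v := uint32(h[0]) | uint32(h[1])<<8 | uint32(h[2])<<16
	return int(v >> 1), v&1 == 1
}

func main() {
	h, err := encodeChunkHeader(100000, false)
	if err != nil {
		panic(err)
	}
	fmt.Printf("header: % x\n", h[:])
	n, orig := decodeChunkHeader(h)
	fmt.Println("length:", n, "original:", orig)
}
```

The 23-bit length field is why a chunk size must stay below 8 MiB in this scheme.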
# Structs
BaseTreeReader wraps a *BooleanReader and is used for reading the Present stream in all TreeReader implementations.
BaseTreeWriter is a TreeWriter implementation that writes to the present stream.
BinaryTreeReader is a TreeReader that reads a Binary type column.
CompressionNone is a CompressionCodec that implements no compression.
CompressionSnappy implements the CompressionCodec for Snappy compression.
CompressionSnappyDecoder implements the decoder for CompressionSnappy.
CompressionZlibDecoder implements the decoder for Zlib compression.
CompressionZlibEncoder implements the encoder for Zlib compression.
Cursor is used for iterating through the stripes and rows within the ORC file.
Date is a date value represented by an underlying time.Time.
DateTreeReader is a TreeReader implementation that can read date column types.
DateTreeWriter is a TreeWriter implementation that writes a Date type column.
Decimal is a decimal type.
DecimalTreeReader is a TreeReader that reads a Decimal type column.
Dictionary is a data structure that holds a distinct set of string values.
DictionaryV2 is a data structure that holds a distinct set of string values.
FloatTreeWriter is a TreeWriter that writes to a Float or Double column type.
IntegerTreeReader is a TreeReader that can read Integer type streams.
IntegerTreeWriter is a TreeWriter implementation that writes an integer type column.
MapEntry is an individual entry in a Map.
MapTreeReader is a TreeReader that reads from map encoded columns.
RunLengthByteReader reads a byte run length encoded stream from ByteReader r.
Stream is an individual stream for the TreeWriter.
StringDirectTreeReader is a StringTreeReader implementation that can read direct encoded string type columns.
StringTreeWriter is a TreeWriter implementation that writes to a string type column.
StructTreeWriter is a TreeWriter implementation that can write a struct column type.
TimestampTreeReader is a TreeReader implementation that reads timestamp type columns.
TimestampTreeWriter is a TreeWriter implementation that writes a Timestamp type column.
UnionTreeReader is a TreeReader that reads a Union type column.
UnionTreeWriter is a TreeWriter implementation that can write a union column type.
Version is the version of the ORC file.
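
RunLengthByteReader, listed above, reads a byte run-length encoded stream. As an illustration of that encoding as defined in the Apache ORC specification (not of this package's implementation), the sketch below decodes a buffer in one pass: a header byte from 0 to 127 means the next byte repeats header+3 times, and a negative header means that many literal bytes follow. `decodeByteRLE` is a hypothetical function name.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeByteRLE expands a byte run-length encoded buffer following the rules
// in the ORC specification. It works on a whole slice for simplicity, whereas
// a streaming reader would decode incrementally.
func decodeByteRLE(src []byte) ([]byte, error) {
	var out []byte
	for i := 0; i < len(src); {
		header := int8(src[i])
		i++
		if header >= 0 {
			// Run: the next byte is repeated header+3 times (3 to 130).
			if i >= len(src) {
				return nil, errors.New("truncated run")
			}
			for n := 0; n < int(header)+3; n++ {
				out = append(out, src[i])
			}
			i++
			continue
		}
		// Literal: the next -header bytes (1 to 128) are copied verbatim.
		n := -int(header)
		if i+n > len(src) {
			return nil, errors.New("truncated literal")
		}
		out = append(out, src[i:i+n]...)
		i += n
	}
	return out, nil
}

func main() {
	// A run of 100 zero bytes followed by the two literal bytes 0x44, 0x45.
	decoded, err := decodeByteRLE([]byte{0x61, 0x00, 0xfe, 0x44, 0x45})
	if err != nil {
		panic(err)
	}
	fmt.Println("decoded length:", len(decoded))
}
```

The offset of 3 in the run header reflects that shorter repeats are cheaper to store as literals.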
# Interfaces
CompressionCodec is an interface that provides methods for creating an Encoder or Decoder of the CompressionCodec implementation.
IntegerReader is an interface that provides methods for reading an integer stream that uses V1 or V2 encoding methods.
IntegerWriter is an interface implemented by all integer type writers.
StringTreeReader is an interface that provides methods for reading a string stream.
TimestampWriter is an interface implemented by all Timestamp type writers.
TreeReader is an interface that provides methods for reading an individual stream.
TreeWriter is an interface for writing to a stream.
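
The CompressionCodec entry above describes an interface that hands out an Encoder or a Decoder. One way such a shape could look is sketched below using only the standard library. The `codec` interface, its method signatures, and `deflateCodec` are assumptions made for illustration and may differ from the package's actual definitions; the sketch also omits the chunk framing that ORC applies around compressed data.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// codec is a hypothetical stand-in for an interface that creates an encoder
// and a decoder for one compression scheme.
type codec interface {
	Encoder(w io.Writer) (io.WriteCloser, error)
	Decoder(r io.Reader) io.Reader
}

// deflateCodec implements codec with raw DEFLATE from compress/flate.
type deflateCodec struct{ level int }

func (c deflateCodec) Encoder(w io.Writer) (io.WriteCloser, error) {
	fw, err := flate.NewWriter(w, c.level)
	if err != nil {
		return nil, err
	}
	return fw, nil
}

func (c deflateCodec) Decoder(r io.Reader) io.Reader {
	return flate.NewReader(r)
}

// defaultCodec uses the library's default compression level.
var defaultCodec = deflateCodec{level: flate.DefaultCompression}

// roundTrip compresses data with c and decompresses it again.
func roundTrip(c codec, data []byte) ([]byte, error) {
	var buf bytes.Buffer
	enc, err := c.Encoder(&buf)
	if err != nil {
		return nil, err
	}
	if _, err := enc.Write(data); err != nil {
		return nil, err
	}
	// Close flushes any buffered compressed data into buf.
	if err := enc.Close(); err != nil {
		return nil, err
	}
	return io.ReadAll(c.Decoder(&buf))
}

func main() {
	out, err := roundTrip(defaultCodec, []byte("hello orc"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", out)
}
```

Keeping encoder and decoder creation behind one interface lets readers and writers stay agnostic of the compression kind recorded in the file.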
# Type aliases
Double is ORC double type i.e.
RLEEncodingType is a run length encoding type specified within the Apache ORC file documentation: https://orc.apache.org/docs/run-length.html.
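
RLEEncodingType above refers to the run-length encodings in the Apache ORC documentation. For integer RLE version 2, that documentation stores the sub-encoding in the top two bits of the first header byte, and Short Repeat covers runs of 3 to 10 identical values. The sketch below shows both ideas for unsigned values; `rleV2Kind`, `kindOf`, and `encodeShortRepeat` are hypothetical names and the code is not drawn from this package. Signed columns would apply zigzag encoding to the value first, which is omitted here.

```go
package main

import (
	"errors"
	"fmt"
)

// rleV2Kind is a hypothetical local enum for the four RLE v2 sub-encodings.
type rleV2Kind byte

const (
	shortRepeat rleV2Kind = iota // 0
	direct                       // 1
	patchedBase                  // 2
	delta                        // 3
)

// kindOf extracts the sub-encoding from the top two bits of a header byte.
func kindOf(header byte) rleV2Kind {
	return rleV2Kind(header >> 6)
}

// encodeShortRepeat encodes count copies of an unsigned value. The header
// packs the kind (2 bits), the value width in bytes minus one (3 bits), and
// the run length minus three (3 bits); the value follows in big-endian order.
func encodeShortRepeat(value uint64, count int) ([]byte, error) {
	if count < 3 || count > 10 {
		return nil, errors.New("short repeat covers runs of 3 to 10 values")
	}
	// Number of bytes needed to hold the value, at least one.
	width := 1
	for v := value >> 8; v != 0; v >>= 8 {
		width++
	}
	out := []byte{byte(shortRepeat)<<6 | byte(width-1)<<3 | byte(count-3)}
	for i := width - 1; i >= 0; i-- {
		out = append(out, byte(value>>(8*uint(i))))
	}
	return out, nil
}

func main() {
	encoded, err := encodeShortRepeat(10000, 5)
	if err != nil {
		panic(err)
	}
	fmt.Printf("encoded: % x, kind: %d\n", encoded, kindOf(encoded[0]))
}
```

Runs longer than 10 values would be handled by one of the other sub-encodings, typically Delta with a zero step.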