github.com/aliyun/aliyun-odps-go-sdk/arrow

package

0.3.7

Repository: https://github.com/aliyun/aliyun-odps-go-sdk.git

Documentation: pkg.go.dev

# README

Apache Arrow for Go

Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and inter-process communication.

Reference Counting

The library makes use of reference counting so that it can track when memory buffers are no longer used. This allows Arrow to update resource accounting, pool memory, and track overall memory usage as objects are created and released. Types expose two methods to deal with this pattern. The Retain method increases the reference count by 1 and the Release method reduces it by 1. Once the reference count of an object reaches zero, any associated object will be freed. Retain and Release are safe to call from multiple goroutines.

When to call `Retain` / `Release`?

If you are passed an object and wish to take ownership of it, you must call Retain. You must later pair this with a call to Release when you no longer need the object. "Taking ownership" typically means you wish to access the object outside the scope of the current function call.
You own any object you create via functions whose name begins with New or Copy, or that you receive over a channel. Therefore you must call Release once you no longer need the object.
If you send an object over a channel, you must call Retain before sending it as the receiver is assumed to own the object and will later call Release when it no longer needs the object.
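The ownership rules above can be sketched with a small stand-in type. The `refCounted` type, `newRefCounted`, and `consume` below are hypothetical and exist only for illustration; they are not part of this package and only model the Retain/Release contract described here.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// refCounted is a hypothetical stand-in for an Arrow object; it is not a
// type from this package. It only models the Retain/Release contract.
type refCounted struct {
	refs  int64
	freed bool
}

// newRefCounted follows the "New" rule: the caller owns the result and
// must eventually call Release.
func newRefCounted() *refCounted { return &refCounted{refs: 1} }

// Retain increases the reference count by 1.
func (r *refCounted) Retain() { atomic.AddInt64(&r.refs, 1) }

// Release decreases the reference count by 1 and marks the object freed
// once the count reaches zero.
func (r *refCounted) Release() {
	if atomic.AddInt64(&r.refs, -1) == 0 {
		r.freed = true // a real implementation would free its buffers here
	}
}

// consume receives objects over a channel, so it owns them and releases them.
func consume(ch <-chan *refCounted, done chan<- struct{}) {
	for obj := range ch {
		obj.Release()
	}
	close(done)
}

func main() {
	obj := newRefCounted() // we own obj: refs == 1
	ch := make(chan *refCounted, 1)
	done := make(chan struct{})
	go consume(ch, done)

	obj.Retain() // the receiver gets its own reference: refs == 2
	ch <- obj
	close(ch)
	<-done // the receiver has released its reference: refs == 1

	fmt.Println("freed after receiver release:", obj.freed) // false
	obj.Release()                                           // our own reference: refs == 0
	fmt.Println("freed after final release:", obj.freed)    // true
}
```

The sender retains before sending precisely because the receiver will release; without that Retain, the receiver's Release would drop the count to zero while the sender still holds the object.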

Performance

The arrow package makes extensive use of c2goasm to leverage LLVM's advanced optimizer and generate Plan 9 assembly functions from C/C++ code. The arrow package can be compiled without these optimizations using the noasm build tag. Alternatively, by configuring an environment variable, it is possible to dynamically configure which architecture optimizations are used at runtime. See the cpu package README for a description of this environment variable.

Example Usage

The following benchmarks demonstrate summing an array of 8192 values using various optimizations.

Disable no architecture optimizations (thus using AVX2):

$ INTEL_DISABLE_EXT=NONE go test -bench=8192 -run=. ./math
goos: darwin
goarch: amd64
pkg: github.com/apache/arrow/go/arrow/math
BenchmarkFloat64Funcs_Sum_8192-8   	 2000000	       687 ns/op	95375.41 MB/s
BenchmarkInt64Funcs_Sum_8192-8     	 2000000	       719 ns/op	91061.06 MB/s
BenchmarkUint64Funcs_Sum_8192-8    	 2000000	       691 ns/op	94797.29 MB/s
PASS
ok  	github.com/apache/arrow/go/arrow/math	6.444s

NOTE: NONE is simply ignored, thus enabling optimizations for both AVX2 and SSE4.

Disable AVX2 architecture optimizations:

$ INTEL_DISABLE_EXT=AVX2 go test -bench=8192 -run=. ./math
goos: darwin
goarch: amd64
pkg: github.com/apache/arrow/go/arrow/math
BenchmarkFloat64Funcs_Sum_8192-8   	 1000000	      1912 ns/op	34263.63 MB/s
BenchmarkInt64Funcs_Sum_8192-8     	 1000000	      1392 ns/op	47065.57 MB/s
BenchmarkUint64Funcs_Sum_8192-8    	 1000000	      1405 ns/op	46636.41 MB/s
PASS
ok  	github.com/apache/arrow/go/arrow/math	4.786s

Disable ALL architecture optimizations, thus using pure Go implementation:

$ INTEL_DISABLE_EXT=ALL go test -bench=8192 -run=. ./math
goos: darwin
goarch: amd64
pkg: github.com/apache/arrow/go/arrow/math
BenchmarkFloat64Funcs_Sum_8192-8   	  200000	     10285 ns/op	6371.41 MB/s
BenchmarkInt64Funcs_Sum_8192-8     	  500000	      3892 ns/op	16837.37 MB/s
BenchmarkUint64Funcs_Sum_8192-8    	  500000	      3929 ns/op	16680.00 MB/s
PASS
ok  	github.com/apache/arrow/go/arrow/math	6.179s
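To relate the ns/op and MB/s columns in the runs above, the arithmetic can be sketched as follows. This assumes each operation sums 8192 values of 8 bytes each and that 1 MB is 10^6 bytes; `throughputMBps` and `speedup` are hypothetical helpers written for this illustration.

```go
package main

import "fmt"

// throughputMBps converts a per-operation time into MB/s,
// assuming 1 MB = 1e6 bytes.
func throughputMBps(bytesPerOp int, nsPerOp float64) float64 {
	// Bytes per nanosecond equals GB/s; multiply by 1e3 for MB/s.
	return float64(bytesPerOp) / nsPerOp * 1e3
}

// speedup reports how many times faster the optimized run is.
func speedup(baselineNs, optimizedNs float64) float64 {
	return baselineNs / optimizedNs
}

func main() {
	const bytesPerOp = 8192 * 8 // assumption: 8192 values of 8 bytes each

	// ns/op figures are taken from the Float64 rows of the three runs above.
	fmt.Printf("AVX2 float64 sum:         %.0f MB/s\n", throughputMBps(bytesPerOp, 687))
	fmt.Printf("AVX2 vs AVX2 disabled:    %.1fx\n", speedup(1912, 687))
	fmt.Printf("AVX2 vs all disabled:     %.1fx\n", speedup(10285, 687))
}
```

The computed throughput lands close to the reported 95375.41 MB/s; the small gap is presumably because the ns/op column is rounded.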

# Packages

array

Package array provides implementations of various Arrow array types.

arrio

Package arrio exposes functions to manipulate records, exposing and using interfaces not unlike the ones defined in the stdlib io package.

bitutil

No description provided by the author

csv

Package csv reads CSV files and presents the extracted data as records; it also writes records to CSV files.

decimal128

No description provided by the author

endian

No description provided by the author

flight

No description provided by the author

float16

No description provided by the author

ipc

No description provided by the author

math

Package math provides optimized mathematical functions for processing Arrow arrays.

memory

Package memory provides support for allocating and manipulating memory at a low level.

scalar

No description provided by the author

tensor

Package tensor provides types that implement n-dimensional arrays.

# Functions

CheckMetadata

CheckMetadata is an option for TypeEqual that allows checking for metadata equality besides type equality.

FixedSizeListOf

FixedSizeListOf returns the fixed-size list type with element type t.

FixedSizeListOfField

No description provided by the author

FixedSizeListOfNonNullable

FixedSizeListOfNonNullable is like FixedSizeListOf but NullableElem defaults to false, indicating that the child type should be marked as non-nullable.

GetExtensionType

GetExtensionType retrieves and returns the extension type of the given name from the global extension type registry.

HashType

No description provided by the author

ListOf

ListOf returns the list type with element type t.

ListOfField

No description provided by the author

ListOfNonNullable

ListOfNonNullable is like ListOf but NullableElem defaults to false, indicating that the child type should be marked as non-nullable.

MapOf

No description provided by the author

MetadataFrom

No description provided by the author

NewMetadata

No description provided by the author

NewSchema

NewSchema returns a new Schema value from the slice of fields and metadata.

RegisterExtensionType

RegisterExtensionType registers the provided ExtensionType, calling ExtensionName to obtain the key under which the type is registered.

StructOf

StructOf returns the struct type with fields fs.

TypeEqual

TypeEqual checks if two DataType values are the same, optionally checking metadata equality for STRUCT types.

UnregisterExtensionType

UnregisterExtensionType removes the type with the given name from the registry. Any incoming messages with that type are then expressed with their metadata and underlying type instead of the extension type, which is no longer known.
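The register, look up, and unregister cycle described for the extension type registry can be sketched as a name-keyed map guarded by a mutex. Everything below is a hypothetical model written for illustration: the `registry` type, the reduced `extensionType` interface, and the error behavior on duplicates and missing names are assumptions, not this package's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// extensionType is a hypothetical, reduced interface: only the
// ExtensionName method mentioned above is modeled.
type extensionType interface {
	ExtensionName() string
}

// registry is a hypothetical name-keyed registry, safe for concurrent use.
type registry struct {
	mu    sync.RWMutex
	types map[string]extensionType
}

func newRegistry() *registry {
	return &registry{types: make(map[string]extensionType)}
}

// register stores t under the key returned by ExtensionName.
// Rejecting duplicates is an assumption of this sketch.
func (r *registry) register(t extensionType) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	name := t.ExtensionName()
	if _, dup := r.types[name]; dup {
		return fmt.Errorf("extension type %q already registered", name)
	}
	r.types[name] = t
	return nil
}

// get returns the type registered under name, or nil if there is none.
func (r *registry) get(name string) extensionType {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.types[name]
}

// unregister removes the type registered under name.
func (r *registry) unregister(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.types[name]; !ok {
		return fmt.Errorf("extension type %q not registered", name)
	}
	delete(r.types, name)
	return nil
}

// uuidType is a hypothetical user-defined extension type.
type uuidType struct{}

func (uuidType) ExtensionName() string { return "uuid" }

func main() {
	r := newRegistry()
	fmt.Println(r.register(uuidType{}))  // <nil>
	fmt.Println(r.get("uuid") != nil)    // true
	fmt.Println(r.unregister("uuid"))    // <nil>
	fmt.Println(r.get("uuid") != nil)    // false
}
```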

# Constants

BINARY

BINARY is a variable-length byte type (no guarantee of UTF8-ness).

BOOL

BOOL is a 1-bit value, bit-packed in LSB order.
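LSB bit-packing means that element i is stored in byte i/8 at bit position i%8, counting from the least-significant bit. A minimal sketch, using hypothetical helpers `packBools` and `bitIsSet` that are not part of this package:

```go
package main

import "fmt"

// packBools packs values into bytes, 8 per byte, least-significant bit first:
// element i lives in byte i/8 at bit position i%8.
func packBools(values []bool) []byte {
	out := make([]byte, (len(values)+7)/8)
	for i, v := range values {
		if v {
			out[i/8] |= 1 << uint(i%8)
		}
	}
	return out
}

// bitIsSet reads element i back out of the packed buffer.
func bitIsSet(buf []byte, i int) bool {
	return buf[i/8]&(1<<uint(i%8)) != 0
}

func main() {
	buf := packBools([]bool{true, false, true, true, false, false, false, false, true})
	fmt.Printf("%08b %08b\n", buf[0], buf[1])       // prints "00001101 00000001"
	fmt.Println(bitIsSet(buf, 2), bitIsSet(buf, 4)) // prints "true false"
}
```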

DATE32

DATE32 is int32 days since the UNIX epoch.

Date32SizeBytes

Date32SizeBytes specifies the number of bytes required to store a single Date32 in memory.

DATE64

DATE64 is int64 milliseconds since the UNIX epoch.

Date64SizeBytes

Date64SizeBytes specifies the number of bytes required to store a single Date64 in memory.

DayTimeIntervalSizeBytes

DayTimeIntervalSizeBytes specifies the number of bytes required to store a single DayTimeInterval in memory.

DECIMAL

DECIMAL is an alias for DECIMAL128, kept to ensure we do not break any consumers.

DECIMAL128

DECIMAL128 is a precision- and scale-based decimal type.

Decimal128SizeBytes

Decimal128SizeBytes specifies the number of bytes required to store a single decimal128 in memory.

DECIMAL256

DECIMAL256 is a precision- and scale-based decimal type with a maximum width of 256 bits.

DENSE_UNION

DENSE_UNION of logical types.

DICTIONARY

DICTIONARY aka Category type.

DURATION

Measure of elapsed time in either seconds, milliseconds, microseconds or nanoseconds.

DurationSizeBytes

DurationSizeBytes specifies the number of bytes required to store a single Duration in memory.

EXTENSION

Custom data type, implemented by user.

FIXED_SIZE_BINARY

FIXED_SIZE_BINARY is a binary where each value occupies the same number of bytes.

FIXED_SIZE_LIST

Fixed size list of some logical type.

FLOAT16

FLOAT16 is a 2-byte floating point value.

Float16SizeBytes

Float16SizeBytes specifies the number of bytes required to store a single float16 in memory.

FLOAT32

FLOAT32 is a 4-byte floating point value.

Float32SizeBytes

Float32SizeBytes specifies the number of bytes required to store a single float32 in memory.

FLOAT64

FLOAT64 is an 8-byte floating point value.

Float64SizeBytes

Float64SizeBytes specifies the number of bytes required to store a single float64 in memory.

INT16

INT16 is a signed 16-bit little-endian integer.

Int16SizeBytes

Int16SizeBytes specifies the number of bytes required to store a single int16 in memory.

INT32

INT32 is a signed 32-bit little-endian integer.

Int32SizeBytes

Int32SizeBytes specifies the number of bytes required to store a single int32 in memory.

INT64

INT64 is a signed 64-bit little-endian integer.

Int64SizeBytes

Int64SizeBytes specifies the number of bytes required to store a single int64 in memory.

INT8

INT8 is a signed 8-bit little-endian integer.

Int8SizeBytes

Int8SizeBytes specifies the number of bytes required to store a single int8 in memory.

INTERVAL

INTERVAL could be any of the interval types. It is kept to avoid breaking anyone who was using it when calling MakeFromData or NewBuilder, after the switch to individual type ids for the interval types. Deprecated: it will be removed in the next major version release.

INTERVAL_DAY_TIME

INTERVAL_DAY_TIME is DAY_TIME in SQL Style.

INTERVAL_MONTH_DAY_NANO

INTERVAL_MONTH_DAY_NANO is a calendar interval with three fields.

INTERVAL_MONTHS

INTERVAL_MONTHS is YEAR_MONTH interval in SQL style.

LARGE_BINARY

Like BINARY but with 64-bit offsets; not yet implemented.

LARGE_LIST

Like LIST but with 64-bit offsets.

LARGE_STRING

Like STRING but with 64-bit offsets.

LIST

LIST is a list of some logical data type.

MAP

MAP is a repeated struct logical type.

Microsecond

No description provided by the author

Millisecond

No description provided by the author

MonthDayNanoIntervalSizeBytes

MonthDayNanoIntervalSizeBytes specifies the number of bytes required to store a single MonthDayNanoInterval in memory.

MonthIntervalSizeBytes

MonthIntervalSizeBytes specifies the number of bytes required to store a single MonthInterval in memory.

Nanosecond

No description provided by the author

NULL

NULL type having no physical storage.

Second

No description provided by the author

SPARSE_UNION

SPARSE_UNION of logical types.

STRING

STRING is a UTF8 variable-length string.

STRUCT

STRUCT of logical types.

TIME32

TIME32 is a signed 32-bit integer, representing either seconds or milliseconds since midnight.

Time32SizeBytes

Time32SizeBytes specifies the number of bytes required to store a single Time32 in memory.

TIME64

TIME64 is a signed 64-bit integer, representing either microseconds or nanoseconds since midnight.

Time64SizeBytes

Time64SizeBytes specifies the number of bytes required to store a single Time64 in memory.

TIMESTAMP

TIMESTAMP is an exact timestamp encoded as an int64 since the UNIX epoch. The default unit is milliseconds.

TimestampSizeBytes

TimestampSizeBytes specifies the number of bytes required to store a single Timestamp in memory.

UINT16

UINT16 is an unsigned 16-bit little-endian integer.

Uint16SizeBytes

Uint16SizeBytes specifies the number of bytes required to store a single uint16 in memory.

UINT32

UINT32 is an unsigned 32-bit little-endian integer.

Uint32SizeBytes

Uint32SizeBytes specifies the number of bytes required to store a single uint32 in memory.

UINT64

UINT64 is an unsigned 64-bit little-endian integer.

Uint64SizeBytes

Uint64SizeBytes specifies the number of bytes required to store a single uint64 in memory.

UINT8

UINT8 is an unsigned 8-bit little-endian integer.

Uint8SizeBytes

Uint8SizeBytes specifies the number of bytes required to store a single uint8 in memory.

# Variables

BinaryTypes

No description provided by the author

BooleanTraits

No description provided by the author

Date32Traits

No description provided by the author

Date64Traits

No description provided by the author

DayTimeIntervalTraits

No description provided by the author

Decimal128Traits

Decimal128 traits.

DurationTraits

No description provided by the author

FixedWidthTypes

No description provided by the author

Float16Traits

Float16 traits.

Float32Traits

No description provided by the author

Float64Traits

No description provided by the author

Int16Traits

No description provided by the author

Int32Traits

No description provided by the author

Int64Traits

No description provided by the author

Int8Traits

No description provided by the author

MonthDayNanoIntervalTraits

No description provided by the author

MonthIntervalTraits

No description provided by the author

Null

No description provided by the author

PrimitiveTypes

No description provided by the author

Time32Traits

No description provided by the author

Time64Traits

No description provided by the author

TimestampTraits

No description provided by the author

Uint16Traits

No description provided by the author

Uint32Traits

No description provided by the author

Uint64Traits

No description provided by the author

Uint8Traits

No description provided by the author

# Structs

BinaryType

No description provided by the author

BooleanType

No description provided by the author

Date32Type

No description provided by the author

Date64Type

No description provided by the author

DayTimeInterval

DayTimeInterval represents a number of days and milliseconds (fraction of day).

DayTimeIntervalType

DayTimeIntervalType is encoded as a pair of 32-bit signed integers, representing a number of days and milliseconds (fraction of day).

Decimal128Type

Decimal128Type represents a fixed-size 128-bit decimal type.

DurationType

DurationType is encoded as a 64-bit signed integer, representing an amount of elapsed time without any relation to a calendar artifact.

ExtensionBase

ExtensionBase is the base struct for user-defined Extension Types, which must be embedded in any user-defined types like so:

    type UserDefinedType struct {
        arrow.ExtensionBase
        // any other data
    }

Field

No description provided by the author

FixedSizeBinaryType

No description provided by the author

FixedSizeListType

FixedSizeListType describes a nested type in which each array slot contains a fixed-size sequence of values, all having the same relative type.

Float16Type

Float16Type represents a floating point value encoded with a 16-bit precision.

Float32Type

No description provided by the author

Float64Type

No description provided by the author

Int16Type

No description provided by the author

Int32Type

No description provided by the author

Int64Type

No description provided by the author

Int8Type

No description provided by the author

ListType

ListType describes a nested type in which each array slot contains a variable-size sequence of values, all having the same relative type.

MapType

No description provided by the author

Metadata

No description provided by the author

MonthDayNanoInterval

MonthDayNanoInterval represents a number of months, days and nanoseconds (fraction of day).

MonthDayNanoIntervalType

MonthDayNanoIntervalType is encoded as two signed 32-bit integers representing a number of months and a number of days, followed by a 64-bit integer representing the number of nanoseconds since midnight for fractions of a day.
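The layout described above (two 32-bit integers followed by one 64-bit integer) occupies 16 bytes. A minimal sketch of writing and reading that layout; the local `monthDayNano` type, the `encode` and `decode` helpers, and the choice of little-endian byte order are assumptions made for illustration, not this package's API:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// monthDayNano is a hypothetical local type mirroring the field order
// described above: months, days, then nanoseconds.
type monthDayNano struct {
	Months      int32
	Days        int32
	Nanoseconds int64
}

// encode writes v as 16 bytes: 4 for months, 4 for days, 8 for nanoseconds.
// Little-endian byte order is an assumption of this sketch.
func encode(v monthDayNano) [16]byte {
	var b [16]byte
	binary.LittleEndian.PutUint32(b[0:4], uint32(v.Months))
	binary.LittleEndian.PutUint32(b[4:8], uint32(v.Days))
	binary.LittleEndian.PutUint64(b[8:16], uint64(v.Nanoseconds))
	return b
}

// decode reverses encode.
func decode(b [16]byte) monthDayNano {
	return monthDayNano{
		Months:      int32(binary.LittleEndian.Uint32(b[0:4])),
		Days:        int32(binary.LittleEndian.Uint32(b[4:8])),
		Nanoseconds: int64(binary.LittleEndian.Uint64(b[8:16])),
	}
}

func main() {
	in := monthDayNano{Months: 14, Days: -3, Nanoseconds: 1500000000}
	buf := encode(in)
	out := decode(buf)
	fmt.Println(len(buf), out == in) // prints "16 true"
}
```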

MonthIntervalType

MonthIntervalType is encoded as a 32-bit signed integer, representing a number of months.

NullType

NullType describes a degenerate array, with zero physical storage.

Schema

Schema is a sequence of Field values, describing the columns of a table or a record batch.

StringType

No description provided by the author

StructType

StructType describes a nested type parameterized by an ordered sequence of relative types, called its fields.

Time32Type

Time32Type is encoded as a 32-bit signed integer, representing either seconds or milliseconds since midnight.