package
0.2.0
Repository: https://github.com/breeze0806/go-etl.git

# README

go-etl Data Type Descriptions

This package primarily defines the data types used in go-etl.

Record

// Record represents a data record.
type Record interface {
 fmt.Stringer

 Add(Column) error                      // Adds a new column.
 GetByIndex(i int) (Column, error)      // Retrieves the column at the specified index.
 GetByName(name string) (Column, error) // Retrieves the column with the specified name.
 Set(i int, c Column) error             // Sets the column at the specified index.
 ColumnNumber() int                     // Returns the number of columns.
 ByteSize() int64                       // Returns the size of the record in bytes.
 MemorySize() int64                     // Returns the memory usage of the record.
}

Data Type Conversions

go-etl supports six internal data types:

  • bigInt: Fixed-point numbers (int64, int32, int16, int8, BigInt, etc.).
  • decimal: Floating-point numbers (float32, float64, BigDecimal (unlimited precision), etc.).
  • string: String type, with unlimited length and using a universal character set (Unicode).
  • time: Date and time type.
  • bool: Boolean value.
  • bytes: Binary data, which can store unstructured data such as MP3 files.

Correspondingly, there are six implementations of ColumnValue: TimeColumnValue, BigIntColumnValue, DecimalColumnValue, BytesColumnValue, StringColumnValue, and BoolColumnValue.

The ColumnValue interface provides a series of data type conversion methods whose names start with As.

// ColumnValue represents a value in a column.
type ColumnValue interface {
 fmt.Stringer

 Type() ColumnType                    // Returns the column type.
 IsNil() bool                         // Checks if the value is nil.
 AsBool() (bool, error)               // Converts the value to a boolean.
 AsBigInt() (*big.Int, error)         // Converts the value to a big integer.
 AsDecimal() (decimal.Decimal, error) // Converts the value to a decimal with unlimited precision.
 AsString() (string, error)           // Converts the value to a string.
 AsBytes() ([]byte, error)            // Converts the value to a byte array.
 AsTime() (time.Time, error)          // Converts the value to a time.
}
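
To make the As-method pattern concrete, here is a minimal sketch for a bool value. The type boolValue is hypothetical and is not go-etl's BoolColumnValue; it omits AsDecimal to stay within the standard library, and it assumes the bool conversion rules listed in the conversion table of this README (1/0 for bigInt, "true"/"false" for string and bytes, no conversion to time).

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
	"strconv"
	"time"
)

// boolValue is a hypothetical, simplified stand-in for a bool column value.
// It only mirrors the As* naming pattern of ColumnValue.
type boolValue struct {
	v bool
}

// AsBool returns the stored value unchanged.
func (b boolValue) AsBool() (bool, error) { return b.v, nil }

// AsBigInt maps true to 1 and false to 0.
func (b boolValue) AsBigInt() (*big.Int, error) {
	if b.v {
		return big.NewInt(1), nil
	}
	return big.NewInt(0), nil
}

// AsString maps true to "true" and false to "false".
func (b boolValue) AsString() (string, error) {
	return strconv.FormatBool(b.v), nil
}

// AsBytes returns the string form as a byte slice.
func (b boolValue) AsBytes() ([]byte, error) {
	return []byte(strconv.FormatBool(b.v)), nil
}

// AsTime always fails, because bool to time is not a supported conversion.
func (b boolValue) AsTime() (time.Time, error) {
	return time.Time{}, errors.New("bool cannot be converted to time")
}

func main() {
	v := boolValue{v: true}
	n, _ := v.AsBigInt()
	s, _ := v.AsString()
	_, err := v.AsTime()
	fmt.Println(n, s, err)
}
```

Every conversion returns an error alongside the value, so a caller can treat an unsupported conversion the same way as a failed parse.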

The Column interface embeds ColumnValue and adds the following methods:

// Column represents a data column.
type Column interface {
 ColumnValue
 AsInt64() (int64, error)     // Converts the value to a 64-bit integer.
 AsFloat64() (float64, error) // Converts the value to a 64-bit floating point number.
 Clone() (Column, error)      // Clones the column.
 Cmp(Column) (int, error)     // Compares the column with another column. Returns 1 if greater, 0 if equal, -1 if less.
 Name() string                // Returns the name of the column.
 ByteSize() int64             // Returns the size of the column in bytes.
 MemorySize() int64           // Returns the memory usage of the column.
}

The internal types of DataX are implemented using different Go types. There are currently two implementation approaches: the older one has performance issues when dealing with large datasets, while the newer one is still in beta and has not been thoroughly validated through practical use.

  • Older Implementation Approach

| Internal Type | Implementation Type | Notes |
|---------------|---------------------|-------|
| time | time.Time | |
| bigInt | big.Int | Uses arbitrary-precision integers to ensure no loss of precision. |
| decimal | decimal.Decimal | Represented using decimal.Decimal to ensure no loss of precision. |
| bytes | []byte | |
| string | string | |
| bool | bool | |

  • Current Implementation Approach

| Internal Type | Implementation Type | Notes |
|---------------|---------------------|-------|
| time | time.Time | |
| bigInt | BigIntNumber | Uses a hybrid approach of storing values as Int64 and BigIntStr to ensure no loss of precision. |
| decimal | DecimalNumber | Uses a hybrid approach of storing values as Float64, Int64, BigIntStr, DecimalStr, and Decimal to ensure no loss of precision. |
| bytes | []byte | |
| string | string | |
| bool | bool | |
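
The hybrid idea for bigInt can be illustrated with a small sketch: keep a value as an int64 when it fits, and fall back to its decimal string otherwise. The type hybridBigInt and the function parseHybridBigInt are hypothetical names chosen for this illustration; the actual BigIntNumber implementations in go-etl may be organized differently.

```go
package main

import (
	"fmt"
	"math/big"
	"strconv"
)

// hybridBigInt is a hypothetical illustration of hybrid storage.
// Values that fit in an int64 are stored as int64; larger values are
// stored as a validated decimal string.
type hybridBigInt struct {
	small   int64  // meaningful only when isSmall is true
	text    string // canonical decimal digits, used when isSmall is false
	isSmall bool
}

// parseHybridBigInt picks the cheapest representation that loses no precision.
func parseHybridBigInt(s string) (hybridBigInt, error) {
	if i, err := strconv.ParseInt(s, 10, 64); err == nil {
		return hybridBigInt{small: i, isSmall: true}, nil
	}
	// Either out of int64 range or not a number: let math/big decide.
	b, ok := new(big.Int).SetString(s, 10)
	if !ok {
		return hybridBigInt{}, fmt.Errorf("invalid integer %q", s)
	}
	return hybridBigInt{text: b.String()}, nil
}

// String returns the decimal text; the string-backed form needs no conversion.
func (h hybridBigInt) String() string {
	if h.isSmall {
		return strconv.FormatInt(h.small, 10)
	}
	return h.text
}

// AsBigInt materializes a *big.Int only when the caller asks for one.
func (h hybridBigInt) AsBigInt() *big.Int {
	if h.isSmall {
		return big.NewInt(h.small)
	}
	b, _ := new(big.Int).SetString(h.text, 10) // text was validated when parsed
	return b
}

func main() {
	for _, s := range []string{"42", "123456789012345678901234567890"} {
		h, err := parseHybridBigInt(s)
		fmt.Println(h.String(), h.isSmall, err)
	}
}
```

In this sketch the common case (values within int64 range) never allocates a big.Int, which is one plausible reason a hybrid representation can be cheaper than always using arbitrary-precision types.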

The two implementations differ mainly in how numeric values are handled. That difference is abstracted behind the following interfaces:

// NumberConverter: Converts raw values (int64, float64, string) to numbers
type NumberConverter interface {
 ConvertBigIntFromInt(i int64) (num BigIntNumber)
 ConvertDecimalFromFloat(f float64) (num DecimalNumber)
 ConvertBigInt(s string) (num BigIntNumber, err error)
 ConvertDecimal(s string) (num DecimalNumber, err error)
}

// Number: Represents a numeric value
type Number interface {
 Bool() (bool, error)
 String() string
}

// BigIntNumber: Represents a high-precision integer
type BigIntNumber interface {
 Number

 Int64() (int64, error)
 Decimal() DecimalNumber
 CloneBigInt() BigIntNumber
 AsBigInt() *big.Int
}

// DecimalNumber: Represents a high-precision decimal number
type DecimalNumber interface {
 Number

 Float64() (float64, error)
 BigInt() BigIntNumber
 CloneDecimal() DecimalNumber
 AsDecimal() decimal.Decimal
}

The main implementations are Converter (the current implementation) and OldConverter (the previous implementation). Converter is faster than OldConverter in every benchmark pair below except ConvertDecimal_Exp, where OldConverter is faster. The results from number_bench_test.go are as follows:

BenchmarkConverter_ConvertFromBigInt-4                	34292768	        40.13 ns/op	       8 B/op	       0 allocs/op
BenchmarkOldConverter_ConvertFromBigInt-4             	19314712	        58.69 ns/op	      16 B/op	       1 allocs/op
BenchmarkConverter_ConvertDecimalFromloat-4           	100000000	        15.74 ns/op	       8 B/op	       0 allocs/op
BenchmarkOldConverter_ConvertDecimalFromFloat-4       	 1654504	       725.8 ns/op	      48 B/op	       2 allocs/op
BenchmarkConverter_ConvertBigInt_Int64-4              	 5020077	       230.0 ns/op	      39 B/op	       2 allocs/op
BenchmarkOldConverter_ConvertBigInt_Int64-4           	 2232102	       627.3 ns/op	     111 B/op	       5 allocs/op
BenchmarkCoventor_ConvertBigInt_large_number-4        	   50010	     21211 ns/op	    8064 B/op	     216 allocs/op
BenchmarkOldCoventor_ConvertBigInt_large_number-4     	   23709	     51818 ns/op	    9216 B/op	     360 allocs/op
BenchmarkConverter_ConvertDecimal_Int64-4             	 3830624	       312.6 ns/op	      39 B/op	       2 allocs/op
BenchmarkOldConverter_ConvertDecimal_Int64-4          	 1995441	       611.4 ns/op	     116 B/op	       4 allocs/op
BenchmarkConverter_ConvertDecimal_Float64-4           	 1707649	       671.4 ns/op	     178 B/op	       5 allocs/op
BenchmarkOldConverter_ConvertDecimal_Float64-4        	 1229505	       991.1 ns/op	     191 B/op	       6 allocs/op
BenchmarkConverter_ConvertDecimal-4                   	   80113	     15009 ns/op	    2280 B/op	     144 allocs/op
BenchmarkOldConverter_ConvertDecimal-4                	   56880	     26496 ns/op	    4608 B/op	     288 allocs/op
BenchmarkConverter_ConvertDecimal_large_number-4      	   45754	     22387 ns/op	    5184 B/op	     144 allocs/op
BenchmarkOldConverter_ConvertDecimal_large_number-4   	   16726	     69543 ns/op	   13248 B/op	     432 allocs/op
BenchmarkConverter_ConvertDecimal_Exp-4               	   15516	     86355 ns/op	   18432 B/op	     648 allocs/op
BenchmarkOldConverter_ConvertDecimal_Exp-4            	   17992	     56777 ns/op	   11520 B/op	     432 allocs/op
BenchmarkDecimal_Decmial_String-4                     	 3443062	       361.0 ns/op	      88 B/op	       5 allocs/op
BenchmarkDecimal_DecmialStr_String-4                  	1000000000	         0.6694 ns/op	       0 B/op	       0 allocs/op
BenchmarkDecimal_Float64_String-4                     	 5254669	       260.7 ns/op	      48 B/op	       2 allocs/op
BenchmarkDecimal_Int64_String-4                       	13537401	        89.62 ns/op	      24 B/op	       1 allocs/op
BenchmarkDecimal_BigInt_String-4                      	 4664106	       247.4 ns/op	      56 B/op	       3 allocs/op
BenchmarkDecimal_BigIntStr_String-4                   	1000000000	         0.6873 ns/op	       0 B/op	       0 allocs/op

Additionally, if any issues arise, you can revert to the old implementation by modifying the _DefaultNumberConverter value in number.go.
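
The switch-over pattern described here can be sketched as a package-level variable of interface type. Everything in this sketch is hypothetical: intConverter is a cut-down analogue of NumberConverter, and defaultConverter only plays the role that this README assigns to _DefaultNumberConverter. It does not reproduce the contents of number.go.

```go
package main

import (
	"fmt"
	"math/big"
	"strconv"
)

// intConverter is a hypothetical, cut-down analogue of NumberConverter.
type intConverter interface {
	ConvertFromInt(i int64) string
}

// fastConverter stands in for the current implementation.
type fastConverter struct{}

func (fastConverter) ConvertFromInt(i int64) string {
	return strconv.FormatInt(i, 10)
}

// oldConverter stands in for the previous, big.Int-based implementation.
type oldConverter struct{}

func (oldConverter) ConvertFromInt(i int64) string {
	return big.NewInt(i).String()
}

// defaultConverter is the single place where the implementation is chosen.
// Changing this one assignment reverts every caller to the old behavior.
var defaultConverter intConverter = fastConverter{}

func main() {
	fmt.Println(defaultConverter.ConvertFromInt(-7))
	defaultConverter = oldConverter{}
	fmt.Println(defaultConverter.ConvertFromInt(-7))
}
```

Because callers depend only on the interface, both implementations must produce the same results for the swap to be safe.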

The relationship between the types and their conversions is as follows:

| from\to | time | bigInt | decimal | bytes | string | bool |
|---------|------|--------|---------|-------|--------|------|
| time | - | Not supported | Not supported | Supports conversion of specified time formats (generally supports default time format) | Supports conversion of specified time formats (generally supports default time format) | Not supported |
| bigInt | Not supported | - | Supported | Supported | Supported | Converts non-zero values to true, and zero to false |
| decimal | Not supported | Truncates to the integer part, discarding the fractional part | - | Supported | Supported | Converts non-zero values to true, and zero to false |
| bytes | Only supports conversion of specified time formats (generally supports default time format) | Real numbers and scientific notation strings are converted to integers | Real numbers and scientific notation strings | - | Supported | Supports conversion of "1", "t", "T", "TRUE", "true", "True" to true, and "0", "f", "F", "FALSE", "false", "False" to false |
| string | Only supports conversion of specified time formats (generally supports default time format) | Real numbers and scientific notation strings are converted to integers | Real numbers and scientific notation strings | Supported | - | Supports conversion of "1", "t", "T", "TRUE", "true", "True" to true, and "0", "f", "F", "FALSE", "false", "False" to false |
| bool | Not supported | true converts to 1, false converts to 0 | true converts to 1.0, false converts to 0.0 | true converts to "true", false converts to "false" | true converts to "true", false converts to "false" | - |

Note: The default time format is 2006-01-02 15:04:05.999999999Z07:00
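
The default format is a Go reference-time layout, so it can be passed directly to the standard time package. The sketch below parses a string with that layout and formats it back; the constant and function names are chosen for this example only.

```go
package main

import (
	"fmt"
	"time"
)

// defaultTimeLayout is the default time format from the note above,
// written as a Go reference-time layout.
const defaultTimeLayout = "2006-01-02 15:04:05.999999999Z07:00"

// roundTrip parses s with the default layout and formats the result back.
func roundTrip(s string) (string, error) {
	t, err := time.Parse(defaultTimeLayout, s)
	if err != nil {
		return "", err
	}
	return t.Format(defaultTimeLayout), nil
}

func main() {
	for _, s := range []string{
		"2023-05-01 12:30:45.123+08:00", // fractional seconds and an offset
		"2023-05-01 12:30:45Z",          // no fraction, UTC
		"not a time",                    // rejected by time.Parse
	} {
		out, err := roundTrip(s)
		fmt.Println(out, err)
	}
}
```

The run of 9s in the layout makes the fractional seconds optional: trailing zeros are dropped when formatting, and a missing fraction is accepted when parsing.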

This table provides an overview of data type conversions between different formats, including time, bigInt, decimal, bytes, string, and bool. It specifies which conversions are supported and which are not, as well as any specific behavior or limitations associated with each conversion.