github.com/tradeoforigin/dataframe-go
Category: module, package
Version: 0.1.0-alpha
Repository: https://github.com/tradeoforigin/dataframe-go.git
Documentation: pkg.go.dev

# README


dataframe-go

Dataframes are used for statistics, machine learning, and data manipulation/exploration. This package is based on rocketlaunchr/dataframe-go and rewritten with Go 1.18 generics. It is still a work in progress: the remaining rocketlaunchr/dataframe-go features are planned for future releases. If you are interested in contributing, your help is welcome.

1. Installation and usage

go get -u github.com/tradeoforigin/dataframe-go
import "github.com/tradeoforigin/dataframe-go"

2. Series

Series is a generic struct that can store data of any type. Series also implements the SeriesAny interface, which lets a DataFrame hold series of different types.

s := dataframe.NewSeries("weight", nil, 115.5, 93.1)
fmt.Println(s.Table())

Output:

+-----+---------+
|     | WEIGHT  |
+-----+---------+
| 0:  |  115.5  |
| 1:  |  93.1   |
+-----+---------+
| 2X1 | FLOAT64 |
+-----+---------+

Series with an explicit type parameter, so that 115 is stored as a float64:

s := dataframe.NewSeries[float64]("weight", nil, 115, 93.1)
fmt.Println(s.Table())

Output:

+-----+---------+
|     | WEIGHT  |
+-----+---------+
| 0:  |   115   |
| 1:  |  93.1   |
+-----+---------+
| 2X1 | FLOAT64 |
+-----+---------+

You can also define a series of your own type:

type Dog struct {
    name string
}
s := dataframe.NewSeries("dogs", nil, 
    Dog { "Abby" }, 
    Dog { "Agas" },
)
fmt.Println(s.Table())

Output:

+-----+----------+
|     |   DOGS   |
+-----+----------+
| 0:  |  {Abby}  |
| 1:  |  {Agas}  |
+-----+----------+
| 2X1 | MAIN DOG |
+-----+----------+

Or a series of type any:

s := dataframe.NewSeries[any]("numbers", nil, 10, "ten", 10.0)
fmt.Println(s.Table())

Output:

+-----+---------+
|     | NUMBERS |
+-----+---------+
| 0:  |   10    |
| 1:  |   ten   |
| 2:  |   10    |
+-----+---------+
| 3X1 |   ANY   |
+-----+---------+
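To illustrate how a generic Series can sit behind a non-generic interface such as SeriesAny, here is a minimal, stdlib-only sketch. The names seriesAny, series, newSeries and the Type method are hypothetical and are not part of dataframe-go's API; the sketch only shows the general Go generics pattern.

```go
package main

import (
	"fmt"
	"reflect"
)

// seriesAny is a hypothetical non-generic interface in the spirit of SeriesAny:
// it exposes only type-independent behaviour, so series with different
// element types can be stored in one container.
type seriesAny interface {
	Name() string
	NRows() int
	Type() string
}

// series is a hypothetical generic series holding values of type T.
type series[T any] struct {
	name   string
	values []T
}

func newSeries[T any](name string, vals ...T) *series[T] {
	return &series[T]{name: name, values: vals}
}

func (s *series[T]) Name() string { return s.name }
func (s *series[T]) NRows() int   { return len(s.values) }

// Type reports the element type, e.g. "float64", "main.Dog" or "interface {}".
func (s *series[T]) Type() string {
	return reflect.TypeOf((*T)(nil)).Elem().String()
}

func main() {
	// Each column keeps its own element type behind the common interface.
	cols := []seriesAny{
		newSeries("weight", 115.5, 93.1),
		newSeries("name", "Abby", "Agas"),
	}
	for _, c := range cols {
		fmt.Println(c.Name(), c.NRows(), c.Type())
	}
}
```

The design choice here is that type-specific operations stay on the generic type, while the container only relies on what the interface exposes.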

2.1. Series manipulation

Series provides a few functions for data manipulation:

  1. s.Value(row int, options ...Options) T returns the value of a particular row.
  2. s.Prepend(val []T, options ...Options) adds values to the beginning of the series.
  3. s.Append(val []T, options ...Options) int adds values to the end of the series.
  4. s.Insert(row int, val []T, options ...Options) inserts values at an arbitrary row of the series. All existing values from that row onwards are shifted down to make room.
  5. s.Remove(row int, options ...Options) deletes the value of a particular row.
  6. s.Reset(options ...Options) clears all data contained in the series.
  7. s.Update(row int, val T, options ...Options) updates the value of a particular row. A negative row counts from the end, so s.Update(-1, -1) in the example below updates the last row.

Example:

s := dataframe.NewSeries[float64]("numbers", nil, 1, 2, 3) // [1, 2, 3]
s.Append([]float64 { 0, 0 }) // [1, 2, 3, 0, 0]
s.Prepend([] float64 { 0, 0 }) // [0, 0, 1, 2, 3, 0, 0]
s.Insert(2, []float64 { -1 }) // [0, 0, -1, 1, 2, 3, 0, 0]
s.Update(-1, -1) // [0, 0, -1, 1, 2, 3, 0, -1]
s.Remove(0) // [0, -1, 1, 2, 3, 0, -1]
fmt.Println(s.Table())

Output:

+-----+---------+
|     | NUMBERS |
+-----+---------+
| 0:  |    0    |
| 1:  |   -1    |
| 2:  |    1    |
| 3:  |    2    |
| 4:  |    3    |
| 5:  |    0    |
| 6:  |   -1    |
+-----+---------+
| 7X1 | FLOAT64 |
+-----+---------+

2.2. Fill values randomly

You can fill a series with random values:

s := dataframe.NewSeries("rand", nil, math.NaN(), math.NaN(), math.NaN())
s.FillRand(dataframe.RandFillerFloat64())

You can also define your own RandFiller as a function of type dataframe.RandFn[T any].

2.3. Sorting

To sort a series, you need to provide a less-than function of type CompareFn[T any]:

s := dataframe.NewSeries("sorted", nil, 0, 2, 1, 4, 3, 6, 5, 10, 9, 8, 7)
s.SetIsLessThanFunc(dataframe.IsLessThanFunc[int])
s.Sort(ctx) // for descending order: s.Sort(ctx, dataframe.SortOptions { Desc: true })
fmt.Println(s.Table())

Output:

+------+--------+
|      | SORTED |
+------+--------+
|  0:  |   0    |
|  1:  |   1    |
|  2:  |   2    |
|  3:  |   3    |
|  4:  |   4    |
|  5:  |   5    |
|  6:  |   6    |
|  7:  |   7    |
|  8:  |   8    |
|  9:  |   9    |
| 10:  |   10   |
+------+--------+
| 11X1 |  INT   |
+------+--------+

2.4. Values iterator

The values iterator is used to iterate over series data. The iterator provides the following options:

  1. InitialRow - the row the iterator starts at. It can be negative to index from the end of the series.
  2. Step - the iteration step. It can be negative to iterate backwards.
  3. DontLock - if true, the iterator does not lock the series.

Example:

s := dataframe.NewSeries("iterate", nil, 1, 2, 3)
iterator := s.Iterator()
for iterator.Next() {
    fmt.Println(iterator.Index, "->", iterator.Value)
}

Output:

0 -> 1
1 -> 2
2 -> 3

2.5. Apply and Filter

You can apply a function to modify the values of a series. You can also filter series data by returning DROP or KEEP for each value.

Apply:

s := dataframe.NewSeries("apply", nil, 1., 2., 3.) // *dataframe.Series[float64]
    
applyFn := func (val float64, row, nRows int) float64 {
    return val / 2
}
_, err := s.Apply(ctx, applyFn, dataframe.ApplyOptions { InPlace: true })
if err != nil {
    panic(err)
}
fmt.Println(s.Table())

Output:

+-----+---------+
|     |  APPLY  |
+-----+---------+
| 0:  |   0.5   |
| 1:  |    1    |
| 2:  |   1.5   |
+-----+---------+
| 3X1 | FLOAT64 |
+-----+---------+

Filter:

s := dataframe.NewSeries("filter", nil, 1., math.NaN(), 3.)
    
filterFn := func (val float64, row, nRows int) (dataframe.FilterAction, error) {
    if math.IsNaN(val) {
        return dataframe.DROP, nil
    }
    return dataframe.KEEP, nil
}
_, err := s.Filter(ctx, filterFn, dataframe.FilterOptions { InPlace: true })
if err != nil {
    panic(err)
}
fmt.Println(s.Table())

Output:

+-----+---------+
|     | FILTER  |
+-----+---------+
| 0:  |    1    |
| 1:  |    3    |
+-----+---------+
| 2X1 | FLOAT64 |
+-----+---------+

2.6. Copy and Equality

You can create a copy of a series and compare two series for equality.

s1 := dataframe.NewSeries[float64]("s1", nil, 1, 2, 3, 4)
s2 := s1.Copy() // copy series s1
eq, err := s1.IsEqual(ctx, s2) // returns true, nil
// The lines below return false, nil because the names now differ:
// s2.Rename("s2")
// eq, err = s1.IsEqual(ctx, s2, dataframe.IsEqualOptions { CheckName: true })
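Copying and comparing can be sketched as a deep copy of the value slice plus an element-wise comparison, with an optional name check similar to IsEqualOptions { CheckName: true }. namedSeries, copySeries and isEqual are hypothetical. Note that with Go's ==, NaN is never equal to NaN; how IsEqual treats NaN is not documented here.

```go
package main

import "fmt"

// namedSeries is a hypothetical minimal series: a name plus values.
type namedSeries[T comparable] struct {
	name   string
	values []T
}

// copySeries returns a deep copy: the new series gets its own backing array.
func copySeries[T comparable](s namedSeries[T]) namedSeries[T] {
	vals := make([]T, len(s.values))
	copy(vals, s.values)
	return namedSeries[T]{name: s.name, values: vals}
}

// isEqual compares values element by element; checkName also compares names.
// With ==, a NaN value never equals another NaN.
func isEqual[T comparable](a, b namedSeries[T], checkName bool) bool {
	if checkName && a.name != b.name {
		return false
	}
	if len(a.values) != len(b.values) {
		return false
	}
	for i := range a.values {
		if a.values[i] != b.values[i] {
			return false
		}
	}
	return true
}

func main() {
	s1 := namedSeries[float64]{name: "s1", values: []float64{1, 2, 3, 4}}
	s2 := copySeries(s1)
	fmt.Println(isEqual(s1, s2, false)) // true
	s2.name = "s2"
	fmt.Println(isEqual(s1, s2, true)) // false
	s2.values[0] = 100
	fmt.Println(s1.values[0]) // 1: the copy does not share storage
}
```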

3. DataFrame

DataFrame is a container for series of any type. You can think of a DataFrame as an Excel spreadsheet.

x := dataframe.NewSeries("x", nil, 1., 2., 3.)
y := dataframe.NewSeries("y", nil, 1., 2., 3.)
df := dataframe.NewDataFrame(x, y)
fmt.Println(df.Table())

Output:

+-----+---------+---------+
|     |    X    |    Y    |
+-----+---------+---------+
| 0:  |    1    |    1    |
| 1:  |    2    |    2    |
| 2:  |    3    |    3    |
+-----+---------+---------+
| 3X2 | FLOAT64 | FLOAT64 |
+-----+---------+---------+

3.1. DataFrame manipulation

Like Series, DataFrame provides functions for data manipulation:

  1. df.Row(row int, options ...Options) map[string]any returns the values of all series for a particular row.
  2. df.Prepend(vals any, options ...Options) inserts one or more rows at the beginning.
  3. df.Append(vals any, options ...Options) inserts one or more rows at the end.
  4. df.Insert(row int, vals any, options ...Options) inserts a row at a particular position.
  5. df.Remove(row int, options ...Options) deletes a row.
  6. df.UpdateRow(row int, vals any, options ...Options) updates an entire row.
  7. df.Update(row int, col any, val any, options ...Options) updates a specific entry. col can be the name of the series or its column number.
  8. df.ReorderColumns(newOrder []string, options ...Options) error reorders the columns based on an ordered list of column names. The length of newOrder must match the number of columns in the DataFrame, and the column names in newOrder must be unique.
  9. df.RemoveSeries(seriesName string, options ...Options) error removes a series from the DataFrame.
  10. df.AddSeries(s SeriesAny, colN *int, options ...Options) error adds a series to the end of the DataFrame, unless a column position is given by colN.
  11. df.Swap(row1, row2 int, options ...Options) swaps two rows based on their row positions.

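Row values are usually passed as map[string]any (keyed by series name), map[int]any (keyed by column number), or []any (in column order). Passing a slice per column adds several rows at once, as in the Append and Prepend calls below.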

s1 := dataframe.NewSeries[float64]("a", nil, 1, 2, 3, 4)
s2 := dataframe.NewSeries[float64]("b", nil, 1, 2, 3, 4)
df := dataframe.NewDataFrame(s1, s2)
df.Append(map[string]any {
    "a": [] float64 { 0, 0 },
    "b": [] float64 { 0, 0 },
})
df.Prepend(map[string]any {
    "a": [] float64 { 0, 0 },
    "b": [] float64 { 0, 0 },
})
df.Insert(2, map[string]any {
    "a": -1.0,
    "b": -1.0,
})
df.Update(-1, "a", -1.0)
fmt.Println(df.Table())

Output:

+-----+---------+---------+
|     |    A    |    B    |
+-----+---------+---------+
| 0:  |    0    |    0    |
| 1:  |    0    |    0    |
| 2:  |   -1    |   -1    |
| 3:  |    1    |    1    |
| 4:  |    2    |    2    |
| 5:  |    3    |    3    |
| 6:  |    4    |    4    |
| 7:  |    0    |    0    |
| 8:  |   -1    |    0    |
+-----+---------+---------+
| 9X2 | FLOAT64 | FLOAT64 |
+-----+---------+---------+

3.2. Fill values randomly

You can fill all series of a DataFrame with random values at once:

s1 := dataframe.NewSeries("a", nil, math.NaN(), math.NaN(), math.NaN())
s2 := dataframe.NewSeries("b", nil, math.NaN(), math.NaN(), math.NaN())
df := dataframe.NewDataFrame(s1, s2)
df.FillRand(func() any {
    return rand.Float64()
})

3.3. Sorting

To sort a DataFrame, set a less-than function of type CompareFn[T any] on each series with SetIsLessThanFunc(), then pass the sort keys to df.Sort():

s1 := dataframe.NewSeries("a", nil, 0, 2, 1, 4, 3, 6, 5, 10, 9, 8, 7)
s2 := dataframe.NewSeries("b", nil, 0, 2, 1, 4, 3, 6, 5, 10, 9, 8, 7)
s1.SetIsLessThanFunc(dataframe.IsLessThanFunc[int])
s2.SetIsLessThanFunc(dataframe.IsLessThanFunc[int])
df := dataframe.NewDataFrame(s1, s2)
    
df.Sort(ctx, []dataframe.SortKey {
    { Key: "a" }, // add Desc: true to sort by this key in descending order
    { Key: "b" }, // add Desc: true to sort by this key in descending order
})
fmt.Println(df.Table())

Output:

+------+-----+-----+
|      |  A  |  B  |
+------+-----+-----+
|  0:  |  0  |  0  |
|  1:  |  1  |  1  |
|  2:  |  2  |  2  |
|  3:  |  3  |  3  |
|  4:  |  4  |  4  |
|  5:  |  5  |  5  |
|  6:  |  6  |  6  |
|  7:  |  7  |  7  |
|  8:  |  8  |  8  |
|  9:  |  9  |  9  |
| 10:  | 10  | 10  |
+------+-----+-----+
| 11X2 | INT | INT |
+------+-----+-----+

3.4. Values iterator

The values iterator is used to iterate over dataframe rows. The iterator provides the following options:

  1. InitialRow - the row the iterator starts at. It can be negative to index from the end of the dataframe.
  2. Step - the iteration step. It can be negative to iterate backwards.
  3. DontLock - if true, the iterator does not lock the dataframe.

Example:

s1 := dataframe.NewSeries("a", nil, 1, 2, 3)
s2 := dataframe.NewSeries("b", nil, 1, 2, 3)
df := dataframe.NewDataFrame(s1, s2)
var iterator = df.Iterator()
for iterator.Next() {
    fmt.Println(iterator.Index, iterator.Value)
}

Output:

0 map[a:1 b:1]
1 map[a:2 b:2]
2 map[a:3 b:3]

3.5. Apply and Filter

You can apply a function to modify the rows of a dataframe. You can also filter dataframe rows by returning DROP or KEEP for each row.

Apply:

y1 := dataframe.NewSeries[float64]("y1", &dataframe.SeriesInit{Size: 24})
y2 := dataframe.NewSeries[float64]("y2", &dataframe.SeriesInit{Size: 24})
    
df := dataframe.NewDataFrame(y1, y2)

fn := func (vals map[string]any, row, nRows int) map[string]any {
    x := float64(row + 1)
    y := math.Sin(2 * math.Pi * x / 24)
    if y == 1 || y == -1 {
        return map[string]any{
            "y1": y,
            "y2": y,
        }
    }
    // We can also update just one column
    return map[string]any{
        "y1": y,
    }
}
_, err := df.Apply(ctx, fn, dataframe.ApplyOptions { InPlace: true })
if err != nil {
    panic(err)
}
fmt.Println(df.Table())

Output:

+------+------------------------+---------+
|      |           Y1           |   Y2    |
+------+------------------------+---------+
|  0:  |  0.25881904510252074   |   NaN   |
|  1:  |  0.49999999999999994   |   NaN   |
|  2:  |   0.7071067811865475   |   NaN   |
|  3:  |   0.8660254037844386   |   NaN   |
|  4:  |   0.9659258262890683   |   NaN   |
|  5:  |           1            |    1    |
|  6:  |   0.9659258262890683   |   NaN   |
|  7:  |   0.8660254037844388   |   NaN   |
|  8:  |   0.7071067811865476   |   NaN   |
|  9:  |  0.49999999999999994   |   NaN   |
| 10:  |   0.258819045102521    |   NaN   |
| 11:  | 1.2246467991473515e-16 |   NaN   |
| 12:  |  -0.2588190451025208   |   NaN   |
| 13:  |  -0.4999999999999998   |   NaN   |
| 14:  |  -0.7071067811865471   |   NaN   |
| 15:  |  -0.8660254037844384   |   NaN   |
| 16:  |  -0.9659258262890683   |   NaN   |
| 17:  |           -1           |   -1    |
| 18:  |  -0.9659258262890684   |   NaN   |
| 19:  |  -0.8660254037844386   |   NaN   |
| 20:  |  -0.7071067811865477   |   NaN   |
| 21:  |  -0.5000000000000004   |   NaN   |
| 22:  |  -0.2588190451025215   |   NaN   |
| 23:  | -2.449293598294703e-16 |   NaN   |
+------+------------------------+---------+
| 24X2 |        FLOAT64         | FLOAT64 |
+------+------------------------+---------+

Filter:

s := dataframe.NewSeries("s", nil, 1, 2, 3, 4, 5)
df := dataframe.NewDataFrame(s)
    
fn := func (vals map[string]any, row, nRows int) (dataframe.FilterAction, error) {
    if row % 2 != 0 {
        return dataframe.DROP, nil
    }
    return dataframe.KEEP, nil
}
_, err := df.Filter(ctx, fn, dataframe.FilterOptions { InPlace: true })
if err != nil {
    panic(err)
}
fmt.Println(df.Table())

Output:

+-----+-----+
|     |  S  |
+-----+-----+
| 0:  |  1  |
| 1:  |  3  |
| 2:  |  5  |
+-----+-----+
| 3X1 | INT |
+-----+-----+

3.6. Copy and Equality

You can create a copy of the dataframe and compare two different dataframes.

s := dataframe.NewSeries[float64]("s", nil, 1, 2, 3, 4)
df1 := dataframe.NewDataFrame(s)
df2 := df1.Copy() // copy dataframe df1
eq, err := df1.IsEqual(ctx, df2) // returns true, nil 

3.7. Import dataframe from CSV

You can import a DataFrame directly from CSV:

csvString := `
A,B,C,D
0.0,0.0,0.02,0
0.0,1.6739,0.04,0
0.0,1.6739,0.06,0
0.0,1.673738,0.06,0
0.0,1.6736,0.06,0
0.0,1.673456,0.08,0
0.0,1.67302752,0.08,0
0.0,1.6726333184,0.08,0
1.6681,0.0,0.02,1`
reader := strings.NewReader(csvString)
df, err := csv.Load(ctx, reader, map[string]csv.ConverterAny {
    "A": csv.Float64,
    "B": csv.Float64,
    "C": csv.Float64,
    "D": csv.Float64,
})
if err != nil {
    panic(err)
}
fmt.Println(df.Table())

Output:

+-----+---------+---------+--------------+---------+
|     |    D    |    A    |      B       |    C    |
+-----+---------+---------+--------------+---------+
| 0:  |    0    |    0    |      0       |  0.02   |
| 1:  |    0    |    0    |    1.6739    |  0.04   |
| 2:  |    0    |    0    |    1.6739    |  0.06   |
| 3:  |    0    |    0    |   1.673738   |  0.06   |
| 4:  |    0    |    0    |    1.6736    |  0.06   |
| 5:  |    0    |    0    |   1.673456   |  0.08   |
| 6:  |    0    |    0    |  1.67302752  |  0.08   |
| 7:  |    0    |    0    | 1.6726333184 |  0.08   |
| 8:  |    1    | 1.6681  |      0       |  0.02   |
+-----+---------+---------+--------------+---------+
| 9X4 | FLOAT64 | FLOAT64 |   FLOAT64    | FLOAT64 |
+-----+---------+---------+--------------+---------+

You can also define a custom converter to fit your needs.

To export a DataFrame to CSV, use:

s1 := dataframe.NewSeries("str", nil, "one", "one,two", "one,two,three")
s2 := dataframe.NewSeries("num", nil, 1, 12, 123)
df := dataframe.NewDataFrame(s1, s2)
// O_TRUNC discards old content if the file already exists
f, err := os.OpenFile("data/export.csv", os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0600)
if err != nil {
    panic(err)
}
defer f.Close()
err = csv.Export(ctx, f, df)
if err != nil {
    panic(err)
}
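The exported strings contain commas, so they have to be quoted in the CSV output. Whether csv.Export quotes them is not stated here; for comparison, the standard library's encoding/csv writer quotes such fields automatically, as this hypothetical exportCSV helper shows.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strconv"
)

// exportCSV writes a header and the rows with encoding/csv, which quotes
// fields that contain the separator (e.g. "one,two").
func exportCSV(strs []string, nums []int) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write([]string{"str", "num"}); err != nil {
		return "", err
	}
	for i := range strs {
		if err := w.Write([]string{strs[i], strconv.Itoa(nums[i])}); err != nil {
			return "", err
		}
	}
	w.Flush() // Flush must be called before reading the buffer
	return buf.String(), w.Error()
}

func main() {
	out, err := exportCSV([]string{"one", "one,two", "one,two,three"}, []int{1, 12, 123})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Running it prints the header line str,num followed by one,1 then "one,two",12 and "one,two,three",123, each on its own line.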

3.8. Math functions and fakers

Creating series from string expressions is not supported, and it is not needed: math functions can be expressed with s.Apply or df.Apply, and fake data can be generated with custom RandFillers. Dedicated math functions and fakers may be added in the future.

# Functions

ApplyDataFrame applies a function to a DataFrame.
ApplySeries applies a function to a series.
DefaultOptions is a helper function to resolve variadic options.
DefaultValueFormatter returns a string representation of the data in a particular row.
FilterDataFrame applies a filter function to a DataFrame.
FilterSeries applies a filter function to a series.
GetSeries returns a series of a `DataFrame` as a series of a concrete type.
IsEqualDefaultFunc provides comparison for any type.
IsEqualFunc provides basic comparison for comparable types.
IsEqualPtrFunc provides comparison for pointers to comparable types.
IsLessThanFunc provides less-than comparison for Ordered types.
IsLessThanPtrFunc provides less-than comparison for pointers to Ordered types.
NewDataFrame creates a dataframe from the passed series.
NewIterator creates an Iterator instance from an iterator function of type `IteratorFn[T any]`.
NewSeries creates a series of type T with the given name.
RandFillerFloat64 is a helper function to fill a *Series[float64] with random data.
Range is a helper function for creating RangeOptions.

# Constants

CHOOSE is used to signify that a row must be kept.
DROP is used to signify that a row must be dropped.
KEEP is used to signify that a row must be kept.

# Variables

Shortcut for Options { DontLock: true }.

# Structs

DataFrame allows you to handle numerous series of data conveniently.
FilterOptions defines optional parameters for Filter(...) on Series or DataFrame.
IsEqualOptions defines optional parameters for IsEqual(...) on Series or DataFrame.
Iterator is a structure for iterating over a Series or DataFrame.
IteratorOptions defines optional parameters for Iterator(...) on Series or DataFrame.
Options is used to perform operations with DontLock.
RangeOptions defines optional parameters for functions that accept a range, such as Copy(...), Apply(...) and Filter(...).
No description provided by the author
SeriesInit is used to configure the series when it is initialized.
SortKey is the key to sort a DataFrame by.
SortOptions defines optional parameters for Sort(...) on Series or DataFrame.
TableOptions defines optional parameters for Table(...) on Series or DataFrame.

# Interfaces

No description provided by the author

# Type aliases

ApplyDataFrameFn is used by the Apply function when used with DataFrames.
ApplyOptions defines optional parameters for Apply(...) on Series or DataFrame.
ApplySeriesFn is used by the Apply function when used with Series.
CompareFn is the type of a function that compares two values of the same type.
FilterAction is the return value of FilterSeriesFn and FilterDataFrameFn.
FilterDataFrameFn is used by the Filter function to determine which rows are selected.
FilterSeriesFn is used by the Filter function to determine which rows are selected.
The iterator function returns the current row, the value for that row, the total number of elements, and a "not done" flag.
No description provided by the author
ValueToStringFormatter is used to convert a value into a string.