Categorygithub.com/datasweet/datatable
modulepackage
0.3.2
Repository: https://github.com/datasweet/datatable.git
Documentation: pkg.go.dev

# README

datatable

Circle CI Go Report Card GoDoc GitHub stars GitHub license

datasweet-logo

datatable is a Go package to manipulate tabular data, like an excel spreadsheet. datatable is inspired by the pandas python package and the data.frame R structure. Although it's production ready, be aware that we're still working on API improvements

Installation

go get github.com/datasweet/datatable

Features

  • Create custom Series (ie custom columns). Currently available, serie.Int, serie.String, serie.Time, serie.Float64.
  • Apply expressions
  • Selects (head, tail, subset)
  • Sorting
  • InnerJoin, LeftJoin, RightJoin, OuterJoin, Concats
  • Aggregate
  • Import from CSV
  • Export to map, slice

Creating a DataTable

package main

import (
	"fmt"

	"github.com/datasweet/datatable"
)

func main() {
	dt := datatable.New("test")
	dt.AddColumn("champ", datatable.String, datatable.Values("Malzahar", "Xerath", "Teemo"))
	dt.AddColumn("champion", datatable.String, datatable.Expr("upper(`champ`)"))
	dt.AddColumn("win", datatable.Int, datatable.Values(10, 20, 666))
	dt.AddColumn("loose", datatable.Int, datatable.Values(6, 5, 666))
	dt.AddColumn("winRate", datatable.Float64, datatable.Expr("`win` * 100 / (`win` + `loose`)"))
	dt.AddColumn("winRate %", datatable.String, datatable.Expr(" `winRate` ~ \" %\""))
	dt.AddColumn("sum", datatable.Float64, datatable.Expr("sum(`win`)"))

	fmt.Println(dt)
}

/*
CHAMP <NULLSTRING>      CHAMPION <NULLSTRING>   WIN <NULLINT>   LOOSE <NULLINT> WINRATE <NULLFLOAT64>   WINRATE % <NULLSTRING>  SUM <NULLFLOAT64> 
Malzahar                MALZAHAR                10              6               62.5                    62.5 %                  696              
Xerath                  XERATH                  20              5               80                      80 %                    696              
Teemo                   TEEMO                   666             666             50                      50 %                    696    
*/

Reading a CSV and aggregate

package main

import (
	"fmt"
	"log"
	"os"
	"time"

	"github.com/datasweet/datatable"
	"github.com/datasweet/datatable/import/csv"
)

func main() {
	dt, err := csv.Import("csv", "phone_data.csv",
		csv.HasHeader(true),
		csv.AcceptDate("02/01/06 15:04"),
		csv.AcceptDate("2006-01"),
	)
	if err != nil {
		log.Fatalf("reading csv: %v", err)
	}

	dt.Print(os.Stdout, datatable.PrintMaxRows(24))

	dt2, err := dt.Aggregate(datatable.AggregateBy{Type: datatable.Count, Field: "index"})
	if err != nil {
		log.Fatalf("aggregate COUNT('index'): %v", err)
	}
	fmt.Println(dt2)

	groups, err := dt.GroupBy(datatable.GroupBy{
		Name: "year",
		Type: datatable.Int,
		Keyer: func(row datatable.Row) (interface{}, bool) {
			if d, ok := row["date"]; ok {
				if tm, ok := d.(time.Time); ok {
					return tm.Year(), true
				}
			}
			return nil, false
		},
	})
	if err != nil {
		log.Fatalf("GROUP BY 'year': %v", err)
	}
	dt3, err := groups.Aggregate(
		datatable.AggregateBy{Type: datatable.Sum, Field: "duration"},
		datatable.AggregateBy{Type: datatable.CountDistinct, Field: "network"},
	)
	if err != nil {
		log.Fatalf("Aggregate SUM('duration'), COUNT_DISTINCT('network') GROUP BY 'year': %v", err)
	}
	fmt.Println(dt3)
}

Creating a custom serie

To create a custom serie you must provide:

  • a caster function, to cast a generic value to your serie value. The signature must be func(i interface{}) T
  • a comparator, to compare your serie value. The signature must be func(a, b T) int

Example with a NullInt

// IntN is an alis to create the custom Serie to manage IntN
func IntN(v ...interface{}) Serie {
	s, _ := New(NullInt{}, asNullInt, compareNullInt)
	if len(v) > 0 {
		s.Append(v...)
	}
	return s
}

type NullInt struct {
	Int   int
	Valid bool
}

// Interface() to render the current struct as a value.
// If not provided, the serie.All() or serie.Get() wills returns the embedded value
// IE: NullInt{}
func (i NullInt) Interface() interface{} {
	if i.Valid {
		return i.Int
	}
	return nil
}

// asNullInt is our caster function
func asNullInt(i interface{}) NullInt {
	var ni NullInt
	if i == nil {
		return ni
	}

	if v, ok := i.(NullInt); ok {
		return v
	}

	if v, err := cast.ToIntE(i); err == nil {
		ni.Int = v
		ni.Valid = true
	}
	return ni
}

// compareNullInt is our comparator function
// used to sort
func compareNullInt(a, b NullInt) int {
	if !b.Valid {
		if !a.Valid {
			return Eq
		}
		return Gt
	}
	if !a.Valid {
		return Lt
  }
  if a.Int == b.Int {
		return Eq
	}
	if a.Int < b.Int {
		return Lt
	}
	return Gt
}

Who are we ?

We are Datasweet, a french startup providing full service (big) data solutions.

Questions ? problems ? suggestions ?

If you find a bug or want to request a feature, please create a GitHub Issue.

Contributors


Cléo Rebert

License

This software is licensed under the Apache License, version 2 ("ALv2"), quoted below.

Copyright 2017-2020 Datasweet <http://www.datasweet.fr>

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

# Packages

No description provided by the author
No description provided by the author
No description provided by the author

# Functions

ColumnHidden sets the visibility.
ColumnTypes to list all column type.
Concat datatables.
ExportHidden to show a column when exporting (default false).
Expr sets the expr for the column <!> Incompatible with ColumnValues.
InnerJoin selects records that have matching values in both tables.
LeftJoin the tables.
New creates a new datatable.
On creates a "join on" expression ie, as SQL, SELECT * FROM A INNER JOIN B ON B.id = A.user_id Syntax: "[table].[field]", "field".
OuterJoin the tables.
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
RegisterColumnType to extends the known type.
RightJoin the tables.
TimeFormats sets the valid time formats.
Using creates a "join using" expression ie, as SQL, SELECT * FROM A INNER JOIN B USING 'field'.
Values fills the column with the values <!> Incompatible with ColumnExpr.

# Constants

No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
Uint ColumnType = "uint" Uint8 ColumnType = "uint8" Uint16 ColumnType = "uint16" Uint32 ColumnType = "uint32" Uint64 ColumnType = "uint64".
No description provided by the author
No description provided by the author
Int8 ColumnType = "int8" Int16 ColumnType = "int16".
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author

# Variables

Errors in import/csv.
Errors in aggregate.go.
Errors in import/csv.
Errors in mutate_column.go.
Errors in aggregate.go.
Errors in mutate_column.go.
Errors in column.go.
Errors in eval_expr.
Errors in mutate_column.go.
Errors in mutate_rows.go.
Errors in mutate_column.go.
Errors in mutate_column.go.
Errors in mutate_column.go.
Errors in import/csv.
Errors in aggregate.go.
Errors in column.go.
Errors in join.go.
Errors in mutate_column.go.
Errors in join.go.
Errors in aggregate.go.
Errors in aggregate.go.
Errors in join.go.
Errors in join.go.
Errors in concat.go.
Errors in join.go.
Errors in join.go.
Errors in import/csv.
Errors in import/csv.
Errors in column.go.
Errors in aggregate.go.
Errors in column.go.
Errors in join.go.
Errors in mutate_rows.go.
Errors in import/csv.

# Structs

AggregateBy defines the aggregation.
ColumnOptions describes options to be apply on a column.
DataTable is our main struct.
ExportOptions to add options for exporting (like showing hidden columns).
GroupBy defines the group by configuration Name is the name of the output column Type is the type of the output column Keyer is our main function to aggregate.
Groups.
No description provided by the author
PrintOptions to control the printer.
Schema describes a datatable.
No description provided by the author
SortBy defines a sort to be applied.

# Interfaces

Column describes a column in our datatable.

# Type aliases

AggregationType defines the avalaible aggregation.
ColumnOption sets column options.
ColumnSerier to create a serie from column options.
ColumnType defines the valid column type in datatable.
No description provided by the author
No description provided by the author
Row contains a row relative to columns.