# README
datatable
datatable is a Go package to manipulate tabular data, like an excel spreadsheet. datatable is inspired by the pandas python package and the data.frame R structure. Although it's production ready, be aware that we're still working on API improvements
Installation
go get github.com/datasweet/datatable
Features
- Create custom Series (ie custom columns). Currently available, serie.Int, serie.String, serie.Time, serie.Float64.
- Apply expressions
- Selects (head, tail, subset)
- Sorting
- InnerJoin, LeftJoin, RightJoin, OuterJoin, Concats
- Aggregate
- Import from CSV
- Export to map, slice
Creating a DataTable
package main
import (
"fmt"
"github.com/datasweet/datatable"
)
func main() {
dt := datatable.New("test")
dt.AddColumn("champ", datatable.String, datatable.Values("Malzahar", "Xerath", "Teemo"))
dt.AddColumn("champion", datatable.String, datatable.Expr("upper(`champ`)"))
dt.AddColumn("win", datatable.Int, datatable.Values(10, 20, 666))
dt.AddColumn("loose", datatable.Int, datatable.Values(6, 5, 666))
dt.AddColumn("winRate", datatable.Float64, datatable.Expr("`win` * 100 / (`win` + `loose`)"))
dt.AddColumn("winRate %", datatable.String, datatable.Expr(" `winRate` ~ \" %\""))
dt.AddColumn("sum", datatable.Float64, datatable.Expr("sum(`win`)"))
fmt.Println(dt)
}
/*
CHAMP <NULLSTRING> CHAMPION <NULLSTRING> WIN <NULLINT> LOOSE <NULLINT> WINRATE <NULLFLOAT64> WINRATE % <NULLSTRING> SUM <NULLFLOAT64>
Malzahar MALZAHAR 10 6 62.5 62.5 % 696
Xerath XERATH 20 5 80 80 % 696
Teemo TEEMO 666 666 50 50 % 696
*/
Reading a CSV and aggregate
package main
import (
"fmt"
"log"
"os"
"time"
"github.com/datasweet/datatable"
"github.com/datasweet/datatable/import/csv"
)
func main() {
dt, err := csv.Import("csv", "phone_data.csv",
csv.HasHeader(true),
csv.AcceptDate("02/01/06 15:04"),
csv.AcceptDate("2006-01"),
)
if err != nil {
log.Fatalf("reading csv: %v", err)
}
dt.Print(os.Stdout, datatable.PrintMaxRows(24))
dt2, err := dt.Aggregate(datatable.AggregateBy{Type: datatable.Count, Field: "index"})
if err != nil {
log.Fatalf("aggregate COUNT('index'): %v", err)
}
fmt.Println(dt2)
groups, err := dt.GroupBy(datatable.GroupBy{
Name: "year",
Type: datatable.Int,
Keyer: func(row datatable.Row) (interface{}, bool) {
if d, ok := row["date"]; ok {
if tm, ok := d.(time.Time); ok {
return tm.Year(), true
}
}
return nil, false
},
})
if err != nil {
log.Fatalf("GROUP BY 'year': %v", err)
}
dt3, err := groups.Aggregate(
datatable.AggregateBy{Type: datatable.Sum, Field: "duration"},
datatable.AggregateBy{Type: datatable.CountDistinct, Field: "network"},
)
if err != nil {
log.Fatalf("Aggregate SUM('duration'), COUNT_DISTINCT('network') GROUP BY 'year': %v", err)
}
fmt.Println(dt3)
}
Creating a custom serie
To create a custom serie you must provide:
- a caster function, to cast a generic value to your serie value. The signature must be func(i interface{}) T
- a comparator, to compare your serie value. The signature must be func(a, b T) int
Example with a NullInt
// IntN is an alis to create the custom Serie to manage IntN
func IntN(v ...interface{}) Serie {
s, _ := New(NullInt{}, asNullInt, compareNullInt)
if len(v) > 0 {
s.Append(v...)
}
return s
}
type NullInt struct {
Int int
Valid bool
}
// Interface() to render the current struct as a value.
// If not provided, the serie.All() or serie.Get() wills returns the embedded value
// IE: NullInt{}
func (i NullInt) Interface() interface{} {
if i.Valid {
return i.Int
}
return nil
}
// asNullInt is our caster function
func asNullInt(i interface{}) NullInt {
var ni NullInt
if i == nil {
return ni
}
if v, ok := i.(NullInt); ok {
return v
}
if v, err := cast.ToIntE(i); err == nil {
ni.Int = v
ni.Valid = true
}
return ni
}
// compareNullInt is our comparator function
// used to sort
func compareNullInt(a, b NullInt) int {
if !b.Valid {
if !a.Valid {
return Eq
}
return Gt
}
if !a.Valid {
return Lt
}
if a.Int == b.Int {
return Eq
}
if a.Int < b.Int {
return Lt
}
return Gt
}
Who are we ?
We are Datasweet, a french startup providing full service (big) data solutions.
Questions ? problems ? suggestions ?
If you find a bug or want to request a feature, please create a GitHub Issue.
Contributors
Cléo Rebert |
License
This software is licensed under the Apache License, version 2 ("ALv2"), quoted below.
Copyright 2017-2020 Datasweet <http://www.datasweet.fr>
Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.
# Functions
ColumnHidden sets the visibility.
ColumnTypes to list all column type.
Concat datatables.
ExportHidden to show a column when exporting (default false).
Expr sets the expr for the column <!> Incompatible with ColumnValues.
InnerJoin selects records that have matching values in both tables.
LeftJoin the tables.
New creates a new datatable.
On creates a "join on" expression ie, as SQL, SELECT * FROM A INNER JOIN B ON B.id = A.user_id Syntax: "[table].[field]", "field".
OuterJoin the tables.
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
RegisterColumnType to extends the known type.
RightJoin the tables.
TimeFormats sets the valid time formats.
Using creates a "join using" expression ie, as SQL, SELECT * FROM A INNER JOIN B USING 'field'.
Values fills the column with the values <!> Incompatible with ColumnExpr.
# Constants
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
Uint ColumnType = "uint" Uint8 ColumnType = "uint8" Uint16 ColumnType = "uint16" Uint32 ColumnType = "uint32" Uint64 ColumnType = "uint64".
No description provided by the author
No description provided by the author
Int8 ColumnType = "int8" Int16 ColumnType = "int16".
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
# Variables
Errors in import/csv.
Errors in aggregate.go.
Errors in import/csv.
Errors in mutate_column.go.
Errors in aggregate.go.
Errors in mutate_column.go.
Errors in column.go.
Errors in eval_expr.
Errors in mutate_column.go.
Errors in mutate_rows.go.
Errors in mutate_column.go.
Errors in mutate_column.go.
Errors in mutate_column.go.
Errors in import/csv.
Errors in aggregate.go.
Errors in column.go.
Errors in join.go.
Errors in mutate_column.go.
Errors in join.go.
Errors in aggregate.go.
Errors in aggregate.go.
Errors in join.go.
Errors in join.go.
Errors in concat.go.
Errors in join.go.
Errors in join.go.
Errors in import/csv.
Errors in import/csv.
Errors in column.go.
Errors in aggregate.go.
Errors in column.go.
Errors in join.go.
Errors in mutate_rows.go.
Errors in import/csv.
# Structs
AggregateBy defines the aggregation.
ColumnOptions describes options to be apply on a column.
DataTable is our main struct.
ExportOptions to add options for exporting (like showing hidden columns).
GroupBy defines the group by configuration Name is the name of the output column Type is the type of the output column Keyer is our main function to aggregate.
Groups.
No description provided by the author
PrintOptions to control the printer.
Schema describes a datatable.
No description provided by the author
SortBy defines a sort to be applied.
# Interfaces
Column describes a column in our datatable.
# Type aliases
AggregationType defines the avalaible aggregation.
ColumnOption sets column options.
ColumnSerier to create a serie from column options.
ColumnType defines the valid column type in datatable.
No description provided by the author
No description provided by the author
Row contains a row relative to columns.