Categorygithub.com/lanrat/extsort
modulepackage
1.0.2
Repository: https://github.com/lanrat/extsort.git
Documentation: pkg.go.dev

# README

extsort

PkgGoDev

Go Report Card

An external sorting library for golang (i.e. on disk sorting) on an arbitrarily channel, even if the generated content doesn't all fit in memory at once. Once sorted, it returns a new channel returning data in sorted order.

In order to remain efficient for all implementations, extsort doesn't handle serialization, but leaves that to the user by operating on types that implement the SortType.ToBytes and FromBytes interfaces.

extsort also has a Strings() interface which does not require the overhead of converting everything to/from bytes and is faster for string types.

extsort is not a stable sort.

Example

package main

import (
    "encoding/binary"
    "fmt"
    "math/rand"

    "github.com/lanrat/extsort"
)

var count = int(1e7) // 10M

type sortInt struct {
    i int64
}

func (s sortInt) ToBytes() []byte {
    buf := make([]byte, binary.MaxVarintLen64)
    binary.PutVarint(buf, s.i)
    return buf
}

func sortIntFromBytes(b []byte) extsort.SortType {
    i, _ := binary.Varint(b)
    return sortInt{i: i}
}

func compareSortIntLess(a, b extsort.SortType) bool {
    return a.(sortInt).i < b.(sortInt).i
}

func main() {
    // create an input channel with unsorted data
    inputChan := make(chan extsort.SortType)
    go func() {
        for i := 0; i < count; i++ {
            inputChan <- sortInt{i: rand.Int63()}
        }
        close(inputChan)
    }()

    // create the sorter and start sorting
    sorter, outputChan, errChan := extsort.New(inputChan, sortIntFromBytes, compareSortIntLess, nil)
    sorter.Sort(context.Background())

    // print output sorted data
    for data := range outputChan {
        fmt.Printf("%d\n", data.(sortInt).i)
    }
    if err := <-errChan; err != nil {
        fmt.Printf("err: %s", err.Error())
    }
}

extsort/diff

The diff sub-package is a self-contained package that assists with diffing of channels of sorted data. It can be used with the extsort methods or on its own

TODO

  • parallelize merging after sorting

# Packages

Package diff performs diffs on sorted channels of data.
Package queue provided a generic priority queue implantation based on the internal heap.
Package tempfile implements a virtual temp files that can be written to (in series) and then read back (series/parallel) and then removed from the filesystem when done if multiple "tempfiles" are needed on the application layer, they are mapped to sections of the same real file on the filesystem.

# Functions

DefaultConfig returns the default configuration options sued if none provided.
New returns a new Sorter instance that can be used to sort the input chan fromBytes is needed to unmarshal SortTypes from []byte on disk lessfunc is the comparator used for SortType config ca be nil to use the defaults, or only set the non-default values desired if errors or interrupted, may leave temp files behind in config.TempFilesDir the returned channels contain the data returned from calling Sort().
NewMock is the same as New() but is backed by memory instead of a temporary file on disk n is the size to initialize the backing bytes buffer too.
Strings returns a new Sorter instance that can be used to sort the input chan.
StringsMock is the same as Strings() but is backed by memory instead of a temporary file on disk n is the size to initialize the backing bytes buffer too.
UniqStringChan returns a channel identical to the input but only with uniq elements only works on sorted inputs.

# Structs

Config holds configuration settings for extsort.
SortTypeSorter stores an input chan and feeds Sort to return a sorted chan.
StringSorter stores an input chan and feeds Sort to return a sorted chan.

# Interfaces

Sorter is the interface that all extsort sorters must satisfy.
SortType defines the interface required by the extsort library to be able to sort the items.

# Type aliases

CompareLessFunc compares two SortType items and returns true if a is less than b.
FromBytes unmarshal bytes from gob to create a SortType when reading back the sorted items.