Categorygithub.com/lezhnev74/SortedArray
modulepackage
0.0.0-20230701125740-e5f8025ee39a
Repository: https://github.com/lezhnev74/sortedarray.git
Documentation: pkg.go.dev

# README

Sorted Array Lib

This package allows to work with huge sorted arrays efficiently by splitting them in chunks under the hood. The basic idea here is simple - avoid loading all the numbers into memory and only work with smaller parts of the original array (chunking).

Motivation for this package is to be used for building inverted indexes. The main requirement for such use-case is being memory and space-efficient. I chose uint32 as the element type to support indexing of unix timestamps.

Usage Sample

storage := NewInMemoryChunkStorage()
maxChunkSize := 1000
arr := NewSortedArray(maxChunkSize, storage)

arr.Add([]uint32{10, 20, 30, 40, 50})
arr.Delete([]uint32{10, 30, 50})
arr.Flush() // write to the storage

require.EqualValues(t, []uint32{20, 40}, arr.ToSlice())

Storage

To store chunks one needs to implement this interface:

type ChunkStorage interface {
Read(chunkIds []uint32) (map[uint32]*Chunk, error)
Save(chunks map[uint32]*Chunk) error
Remove(chunkIds []uint32) error

ReadMeta() ([]*ChunkMeta, error)
SaveMeta([]*ChunkMeta) error
}

Two implementations are present in this package:

  • in-memory (used for testing purposes)
  • sqlite

Transactions are not assumed by this package. I kept in mind one particular use-case: sqlite as a storage. One would start an SQLite transaction and within this transaction load this index and work with numbers. SQLite would handle concurrent access. Within Sqlite I would use blobs to keep this array's chunks.

Compression

A sorted array is perfect for compression. It looks like the best algorithms are designed by Lemire: https://lemire.me/blog/2012/09/12/fast-integer-compression-decoding-billions-of-integers-per-second/

Some GO implementations are:

Concurrent Access And Garbage Collector

This lib loads chunks into memory. Chunks are determined based on meta description and input given to Add/Delete/GetInRange. Loaded chunks are freed (for further GC) upon these events:

  • Flush() called
  • GetInRage() loads and frees one chunk at a time while iterating

# Functions

No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author

# Structs

Chunk represents an asc sorted array of numbers, grouped together for faster processing the type chosen to be uint32 to contain unix timestamps in seconds enough for general index purposes when there are not many events per second.
ChunkMeta is a light description of a chunk used to select relevant chunks before fetching them.
No description provided by the author
Meta contains a list of SORTED chunks descriptions No overlapping allowed.
SortedArray manages ASC sorted array in chunks for better performance Chunks contain up to maxInsertSize items and may not intersect with each other.
SortedArraySqlTxStorage implemented sorted array storage for sqlite it uses blobs to store chunks and meta key is used to produce unique ids for the blobs in a shared table.

# Interfaces

ChunkStorage does simple CRUD operations on persistent storage Serialization(+compression) must be implemented at this level.