Package: github.com/kklash/simpledb
Version: 0.0.0-20221031164116-e440536827b2
Repository: https://github.com/kklash/simpledb.git
Documentation: pkg.go.dev

# README

SimpleDB

SimpleDB is a very basic NoSQL database format for long-term data storage in Go. It is a work in progress, has a LOT of drawbacks, and is definitely not production-grade:

  • not very performant
  • not scalable, and not safe for multiple connections at once
  • limited to one table per DB
  • simplistic query support

Depending on your use case, the benefits may be worthwhile:

  • very easy to reimplement and maintain
  • zero dependencies
  • helps keep large amounts of data out of memory when not needed
  • very minimal, no daemons to spin up or configuration to learn

Example usage:

type Car struct {
  Year  uint16
  Color string
  Make  string
  Model string
}

tempFile, err := os.CreateTemp(os.TempDir(), "simpledb-")
if err != nil {
  // ...
}

db, err := simpledb.NewDB(tempFile, Car{})
if err != nil {
  // ...
}

carID, err := db.Insert(&Car{
  Year:  2008,
  Color: "brown",
  Make:  "Mazda",
  Model: "Miata",
})
if err != nil {
  // ...
}

var car Car
err = db.Find(carID, &car)
if err != nil {
  // ...
}

fmt.Printf("car.Year: %d\n", car.Year)
fmt.Printf("car.Color: %s\n", car.Color)
fmt.Printf("car.Make: %s\n", car.Make)
fmt.Printf("car.Model: %s\n", car.Model)

To create a SimpleDB, you must provide a data source which satisfies the simpledb.Source interface:

type Source interface {
  io.Reader
  io.Writer
  io.Seeker
  io.Closer
  Truncate(size int64) error
}

simpledb.Source is an interface for the long-term storage used by DB. Usually, this is an *os.File, but you could also design a source which reads and writes through some other means. Read and Write calls must advance the same cursor that Seek positions, and Seek must support all three whence values (io.SeekStart, io.SeekCurrent and io.SeekEnd).
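As an illustration of a non-file source, here is a minimal sketch of an in-memory type that satisfies the interface above. The memSource type and its fields are hypothetical and not part of simpledb; the Source interface is copied locally so the example stands alone.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Source mirrors the interface quoted in the README.
type Source interface {
	io.Reader
	io.Writer
	io.Seeker
	io.Closer
	Truncate(size int64) error
}

// memSource is a hypothetical in-memory Source, not part of simpledb.
type memSource struct {
	buf []byte
	pos int64 // single cursor shared by Read, Write and Seek
}

func (m *memSource) Read(p []byte) (int, error) {
	if m.pos >= int64(len(m.buf)) {
		return 0, io.EOF
	}
	n := copy(p, m.buf[m.pos:])
	m.pos += int64(n)
	return n, nil
}

func (m *memSource) Write(p []byte) (int, error) {
	end := m.pos + int64(len(p))
	if end > int64(len(m.buf)) {
		// Grow, zero-filling any gap left by seeking past the end.
		m.buf = append(m.buf, make([]byte, end-int64(len(m.buf)))...)
	}
	copy(m.buf[m.pos:end], p)
	m.pos = end
	return len(p), nil
}

func (m *memSource) Seek(offset int64, whence int) (int64, error) {
	var base int64
	switch whence {
	case io.SeekStart:
		base = 0
	case io.SeekCurrent:
		base = m.pos
	case io.SeekEnd:
		base = int64(len(m.buf))
	default:
		return 0, errors.New("memSource: invalid whence")
	}
	if base+offset < 0 {
		return 0, errors.New("memSource: negative position")
	}
	m.pos = base + offset
	return m.pos, nil
}

// Truncate resizes the buffer without moving the cursor, like *os.File.
func (m *memSource) Truncate(size int64) error {
	if size < 0 {
		return errors.New("memSource: negative size")
	}
	if size < int64(len(m.buf)) {
		m.buf = m.buf[:size]
	} else {
		m.buf = append(m.buf, make([]byte, size-int64(len(m.buf)))...)
	}
	return nil
}

func (m *memSource) Close() error { return nil }

// Compile-time check that memSource satisfies Source.
var _ Source = (*memSource)(nil)

func main() {
	src := &memSource{}
	src.Write([]byte("hello"))
	src.Seek(0, io.SeekStart)
	out := make([]byte, 5)
	n, _ := src.Read(out)
	fmt.Println(string(out[:n])) // hello
}
```

A type like this could be handy in tests, where touching the filesystem is undesirable. Whether simpledb relies on any *os.File behaviour beyond this interface is not covered here.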

You must also pass a zero-value struct instance, whose exported fields will define the table schema. SimpleDB is, for the moment, a single-table database.

Guidelines for struct types which can define valid SimpleDB tables:

  • Every exported field of the struct type must have a fixed-size type, or be a slice whose elements ultimately reduce to fixed-size types.
  • simpledb.PrimitiveFixedSizeKinds defines the set of usable fixed-size types.
  • string is also allowed.
  • Arrays of fixed-size types are considered to also be fixed-size types and can be used.
  • The order in which struct fields are declared does not matter: fields are sorted alphabetically to decide encoding order.

Additional type support (e.g. for maps and structs) is forthcoming.
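To make the alphabetical-ordering guideline concrete, here is a small sketch that lists the exported fields of the Car struct from the usage example in sorted order. sortedFieldNames is a hypothetical helper built on the standard reflect package; it only illustrates the ordering rule and is not how simpledb is confirmed to derive its schema.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

type Car struct {
	Year  uint16
	Color string
	Make  string
	Model string
}

// sortedFieldNames returns the exported field names of a struct value in
// alphabetical order. Hypothetical helper, not a simpledb function.
// It panics if v is not a struct value.
func sortedFieldNames(v interface{}) []string {
	t := reflect.TypeOf(v)
	names := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.IsExported() {
			names = append(names, f.Name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println(sortedFieldNames(Car{})) // [Color Make Model Year]
}
```

Under this rule, moving Year to the bottom of the struct declaration would leave the encoding order unchanged.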

How does it work?

When first opened on a new file, the database will not write any data, because an empty SimpleDB has zero size. As values are inserted into the table, SimpleDB encodes and writes the values directly to the Source file. First it writes the 'row header', consisting of a random uint64 ID and the size of the row, encoded as an unsigned varint. The index of a row is its offset from the start of the file: for the first row this is zero, for the second row it is the size of the first row, and so on.
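The row header and the offset arithmetic described above can be sketched with the standard encoding/binary package. Both helpers are hypothetical, and the exact byte layout (an 8-byte big-endian ID immediately followed by the varint) is an assumption rather than a confirmed detail of simpledb.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeRowHeader sketches a row header: an 8-byte big-endian ID followed by
// the row size as an unsigned varint. Hypothetical helper; the real layout
// used by simpledb is assumed, not confirmed.
func encodeRowHeader(id, size uint64) []byte {
	buf := make([]byte, 8+binary.MaxVarintLen64)
	binary.BigEndian.PutUint64(buf[:8], id)
	n := binary.PutUvarint(buf[8:], size)
	return buf[:8+n]
}

// rowOffsets returns the offset of each row, given the total on-disk length
// of every row in insertion order.
func rowOffsets(rowLengths []int64) []int64 {
	offsets := make([]int64, len(rowLengths))
	var next int64
	for i, l := range rowLengths {
		offsets[i] = next
		next += l
	}
	return offsets
}

func main() {
	h := encodeRowHeader(0x0102030405060708, 300)
	fmt.Printf("% x\n", h)                       // 01 02 03 04 05 06 07 08 ac 02
	fmt.Println(rowOffsets([]int64{24, 31, 18})) // [0 24 55]
}
```

The varint keeps the header short for small rows: a size under 128 takes one byte, while 300 takes two.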

Slices are encoded by first writing their length as an unsigned varint, followed by each element. All values are encoded with binary.BigEndian.
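As a rough illustration of the slice encoding, the following sketch round-trips a []uint16. The function names are hypothetical, and the sketch assumes the length prefix counts elements rather than bytes.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeUint16Slice writes the element count as an unsigned varint, then each
// element in big-endian order. Hypothetical helper, not a simpledb function.
func encodeUint16Slice(vals []uint16) []byte {
	var buf bytes.Buffer
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(vals)))
	buf.Write(lenBuf[:n])
	for _, v := range vals {
		var e [2]byte
		binary.BigEndian.PutUint16(e[:], v)
		buf.Write(e[:])
	}
	return buf.Bytes()
}

// decodeUint16Slice reverses encodeUint16Slice.
func decodeUint16Slice(data []byte) ([]uint16, error) {
	count, n := binary.Uvarint(data)
	if n <= 0 {
		return nil, errors.New("bad length prefix")
	}
	data = data[n:]
	if count > uint64(len(data))/2 {
		return nil, errors.New("truncated slice data")
	}
	vals := make([]uint16, count)
	for i := range vals {
		vals[i] = binary.BigEndian.Uint16(data[2*i:])
	}
	return vals, nil
}

func main() {
	enc := encodeUint16Slice([]uint16{2008, 1999})
	fmt.Printf("% x\n", enc) // 02 07 d8 07 cf
	dec, err := decodeUint16Slice(enc)
	fmt.Println(dec, err) // [2008 1999] <nil>
}
```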

As each row is inserted, its index is cached in memory, keyed by its ID number. A caller who retains the ID number can thus quickly look up and decode the stored value. However, perhaps you don't have the ID number, or you want to find multiple rows...
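One way to picture the ID-to-index cache is a map rebuilt by walking the encoded rows. The sketch below is hypothetical: it assumes the size in the header counts only the payload bytes that follow it, and it ignores dropped rows entirely.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendRow appends one row: 8-byte big-endian ID, uvarint payload size,
// then the payload. Hypothetical layout, assumed for this sketch.
func appendRow(data []byte, id uint64, payload []byte) []byte {
	var hdr [8 + binary.MaxVarintLen64]byte
	binary.BigEndian.PutUint64(hdr[:8], id)
	n := binary.PutUvarint(hdr[8:], uint64(len(payload)))
	data = append(data, hdr[:8+n]...)
	return append(data, payload...)
}

// buildIndex scans the rows and maps each ID to the offset of its row.
func buildIndex(data []byte) (map[uint64]int64, error) {
	index := make(map[uint64]int64)
	var off int64
	for off < int64(len(data)) {
		rest := data[off:]
		if len(rest) < 8 {
			return nil, errors.New("truncated row ID")
		}
		id := binary.BigEndian.Uint64(rest[:8])
		size, n := binary.Uvarint(rest[8:])
		if n <= 0 {
			return nil, errors.New("bad row size")
		}
		if size > uint64(len(rest)-8-n) {
			return nil, errors.New("truncated row payload")
		}
		index[id] = off
		off += int64(8+n) + int64(size)
	}
	return index, nil
}

func main() {
	var data []byte
	data = appendRow(data, 7, []byte("abc"))
	data = appendRow(data, 9, []byte("hello"))
	index, err := buildIndex(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(index[7], index[9]) // 0 12
}
```

With such a map in memory, fetching a row by ID is one seek to the stored offset followed by a decode, with no scan of the file.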

Filtering

You can use the db.Filter method to return all rows which match a certain query. Currently this is limited to deep-equality-based checks, but in the future I plan to extend the query functionality quite a bit.

rows, err := usersDB.Filter(map[string]interface{}{
  "UserName": "josh89",
})
if err != nil {
  // ...
} else if len(rows) == 0 {
  // username not found
}

id := rows[0].ID
user := rows[0].Value.(*User)
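To show what a deep-equality check of this kind involves, here is a minimal sketch that tests one decoded row against a query map using the standard reflect package. The matches function and the Age field are hypothetical and only illustrate the comparison; they say nothing about how db.Filter is actually implemented.

```go
package main

import (
	"fmt"
	"reflect"
)

type User struct {
	UserName string
	Age      uint8 // hypothetical extra field for the example
}

// matches reports whether every field named in query is deeply equal to the
// corresponding field of row. row must be a struct or a pointer to a struct.
// Hypothetical helper, not part of simpledb.
func matches(row interface{}, query map[string]interface{}) bool {
	v := reflect.Indirect(reflect.ValueOf(row))
	for name, want := range query {
		f := v.FieldByName(name)
		if !f.IsValid() || !f.CanInterface() {
			return false // unknown or unexported field
		}
		if !reflect.DeepEqual(f.Interface(), want) {
			return false
		}
	}
	return true
}

func main() {
	u := &User{UserName: "josh89", Age: 30}
	fmt.Println(matches(u, map[string]interface{}{"UserName": "josh89"})) // true
	fmt.Println(matches(u, map[string]interface{}{"Age": 30}))            // false
	fmt.Println(matches(u, map[string]interface{}{"Age": uint8(30)}))     // true
}
```

Note that reflect.DeepEqual compares types as well as values, so the untyped constant 30 (an int) does not equal a uint8 field holding 30. If Filter behaves the same way, query values would need to be given with the field's exact type.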

Indexing

If you need to look up rows by certain fields frequently, you can add an index to those fields.

type User struct {
  UserName string `simpledb:"indexed"`
}

Adding the tag simpledb:"indexed" to a struct field used to define a SimpleDB schema adds an in-memory cache for that field to the database. The cache maps each row's ID number to the value of that field, and is populated when a row is inserted or read from disk.

When calling db.Filter, SimpleDB will compare the cached value with the queried value using reflect.DeepEqual.
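The cache described above can be pictured as a map from row ID to field value, scanned with reflect.DeepEqual. The fieldIndex type below is a hypothetical sketch of that idea, not simpledb's internal data structure.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// fieldIndex caches the value of one indexed field per row ID.
// Hypothetical type, not part of simpledb.
type fieldIndex map[uint64]interface{}

// lookup returns the IDs of all rows whose cached value deeply equals want,
// in ascending order.
func (idx fieldIndex) lookup(want interface{}) []uint64 {
	var ids []uint64
	for id, got := range idx {
		if reflect.DeepEqual(got, want) {
			ids = append(ids, id)
		}
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	idx := fieldIndex{1: "josh89", 2: "amy", 3: "josh89"}
	fmt.Println(idx.lookup("josh89")) // [1 3]
}
```

In this sketch the lookup is still a linear scan, but it runs over values already in memory, so matching rows can be identified without decoding every row from the source.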

Dropping

You can drop rows using db.Drop(id), but this alone does not reduce the on-disk size of the database. It only zeroes the given row on disk. Dropped rows look like large runs of zeros, which are skipped when the database is read from disk.
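Zeroing a row in place can be sketched with nothing more than Seek and Write. The zeroRange helper and the demo function below are hypothetical; they assume the caller already knows the row's offset and total length, and they use a temporary file in place of a real database.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// zeroRange overwrites length bytes starting at offset with zeros.
// Hypothetical helper, not part of simpledb.
func zeroRange(ws io.WriteSeeker, offset, length int64) error {
	if _, err := ws.Seek(offset, io.SeekStart); err != nil {
		return err
	}
	_, err := ws.Write(make([]byte, length))
	return err
}

// demo writes three 4-byte "rows" to a temp file, zeros the middle one and
// returns the resulting file contents.
func demo() ([]byte, error) {
	f, err := os.CreateTemp("", "simpledb-drop-")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if _, err := f.Write([]byte("AAAABBBBCCCC")); err != nil {
		return nil, err
	}
	if err := zeroRange(f, 4, 4); err != nil {
		return nil, err
	}
	return os.ReadFile(f.Name())
}

func main() {
	out, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out) // "AAAA\x00\x00\x00\x00CCCC"
}
```

The file length is unchanged after the overwrite, which is why dropping alone does not shrink the database.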

Defragging

To compact the database on disk back down to its optimal size, call db.Defrag(). This operation removes all zeroed rows from the database file and thus reduces its size. Best practice is to call db.Defrag() before closing an application which uses a SimpleDB.
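A compaction pass of this kind can be sketched on an in-memory buffer: copy each live row down over the gaps, then cut off the tail. The span type and compact function are hypothetical, and the sketch assumes the live rows' offsets and lengths are already known (for example from the in-memory index) rather than detecting zeroed regions itself.

```go
package main

import (
	"fmt"
	"sort"
)

// span describes one live row by its offset and total length in bytes.
// Hypothetical type for this sketch.
type span struct{ off, length int }

// compact moves every live row toward the front of data, in offset order,
// and returns the shortened buffer. It sorts live in place. Because the write
// position never passes the read position, copying in place is safe.
func compact(data []byte, live []span) []byte {
	sort.Slice(live, func(i, j int) bool { return live[i].off < live[j].off })
	w := 0
	for _, s := range live {
		copy(data[w:], data[s.off:s.off+s.length])
		w += s.length
	}
	return data[:w] // the on-disk equivalent would be Truncate(w)
}

func main() {
	data := []byte("AAAA\x00\x00\x00\x00CCCC")
	out := compact(data, []span{{0, 4}, {8, 4}})
	fmt.Printf("%q\n", out) // "AAAACCCC"
}
```

A real implementation would also have to update the cached row indices, since every row after the first gap moves to a new offset.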