Package: github.com/nuts-foundation/go-leia/v4
Version: 4.0.3
Repository: https://github.com/nuts-foundation/go-leia.git
Documentation: pkg.go.dev

# README


go-leia

Go Lightweight Embedded Indexed (JSON) Archive

go-leia is built upon bbolt. It adds index-based search capabilities for JSON documents to the key-value store.

The goal is to provide a simple and fast way to find relevant JSON documents using an embedded Go key-value store.


Installing

Install Go and run go get:

$ go get github.com/nuts-foundation/go-leia/v4

When using Go 1.16 or later, Go modules may require you to install additional dependencies.

$ go get github.com/stretchr/testify
$ go get github.com/tidwall/gjson
$ go get go.etcd.io/bbolt

Opening a database

Opening a database only requires a file location for the bbolt db.

package main

import (
	"log"
	
	"github.com/nuts-foundation/go-leia"
)

func main() {
	// Open the my.db data file in your current directory.
	// It will be created if it doesn't exist using filemode 0600 and default bbolt options.
	store, err := leia.NewStore("my.db")
	if err != nil {
		log.Fatal(err)
	}
	defer store.Close()

	...
}

Collections

Leia adds collections to bbolt. Each collection has its own bucket where documents are stored. An index is also only valid for a single collection.

To create a collection:

func main() {
    store, err := leia.NewStore("my.db")
	...
	
    // if a collection doesn't exist, it'll be created for you.
    // the underlying buckets are created when a document is added.
    collection := store.Collection("credentials")
}

Writing

Writing a document to a collection is straightforward:

func main() {
    store, err := leia.NewStore("my.db")
    collection := store.Collection("credentials")
	...
	
    // leia uses leia.Document as its argument type, which is basically a []byte
    documents := make([]leia.Document, 1)
    documents[0] = leia.DocumentFromString("{...some json...}")
    
    // documents are added by slice
    collection.Add(documents)
}

Documents are added by slice. Each operation is done within a single bbolt transaction. bbolt is a key-value store, so you have probably noticed the key is missing as an argument: leia computes the SHA-1 hash of the document and uses that as the key.

To get the key when needed:

func main() {
    store, err := leia.NewStore("my.db")
    collection := store.Collection("credentials")
    ...
    
    // define your document
    document := leia.DocumentFromString("{...some json...}")
    
    // retrieve a leia.Reference (also a []byte)
    reference := collection.Reference(document)
}

Documents can also be removed:

func main() {
    store, err := leia.NewStore("my.db")
    collection := store.Collection("credentials")
    ...
    
    // define your document
    document := leia.DocumentFromString("{...some json...}")
    
    // remove a document using a leia.Document
    err = collection.Delete(document)
}

Reading

A document can be retrieved by reference:

func main() {
    store, err := leia.NewStore("my.db")
    collection := store.Collection("credentials")
    ...
    
    // retrieve a document by reference; returns nil when not found
    document, err := collection.Get(reference)
}

Searching

The major benefit of leia is searching. The performance of a search greatly depends on the available indices on a collection. If no index matches the query, a bbolt cursor is used to loop over all documents in the collection.

Leia supports equal, prefix and range queries. The first argument for each matcher is the JSON path using the syntax from gjson. Only basic path syntax is used. There is no support for wildcards or comparison operators. The second argument is the value to match against. Leia can only combine query terms using AND logic.

func main() {
    ...
    
    // define a new query
    query := leia.New(leia.Eq("subject", "some_value")).
                  And(leia.Range("some.path.#.amount", 1, 100))
}

Getting results can be done with either Find or Iterate. Find will return a slice of documents. Iterate will allow you to pass a DocWalker which is called for each hit.

func main() {
    ...
    
    // get a slice of documents
    documents, err := collection.Find(query)
    
    // use a DocWalker
    walker := func(ref []byte, doc []byte) error {
    	// do something with the document
    	return nil
    }
    err = collection.Iterate(query, walker)
}

Indexing

Indexing JSON documents is where the real added value of leia lies. For each collection multiple indices can be added. Each added index will slow down write operations.

An index can be added and removed:

func main() {
    ...
    
    // define the index
    index := leia.NewIndex("compound",
                leia.NewFieldIndexer("subject"),
                leia.NewFieldIndexer("some.path.#.amount"),
    )
    
    // add it to the collection
    err := collection.AddIndex(index)
    
    // remove it from the collection
    err := collection.DropIndex("compound")
}

The argument for NewFieldIndexer uses the same notation as the query parameter, again without wildcards or comparison operators. Adding an index triggers a re-index of all documents in the collection. Adding an index with a duplicate name is ignored.

Alias option

Leia supports indexing JSON paths under an alias. An alias can be used to index different document structures while using a single query to find both.

func main() {
    ...
    
    // define the index for credentialX
    indexX := leia.NewIndex("credentialX", leia.NewFieldIndexer("credentialSubject.id", leia.AliasOption{Alias: "subject"}))
    // define the index for credentialY
    indexY := leia.NewIndex("credentialY", leia.NewFieldIndexer("credentialSubject.organization.id", leia.AliasOption{Alias: "subject"}))
    
    ...

    // define a new query
    query := leia.New(leia.Eq("subject", "some_value"))
}

The example above adds two indices to a collection, each indexing a different JSON path. Both indices are used when the given query is executed, resulting in documents that match either index.

Transform option

A transformer can be defined for a FieldIndexer. A transformer transforms both the indexed value and the query parameter. This can be used to enable case-insensitive search or to add a soundex-style index.

func main() {
    ...
    
    // This index transforms all values to lowercase
    index := leia.NewIndex("credential", leia.NewFieldIndexer("subject", leia.TransformOption{Transform: leia.ToLower}))
    
    ...

    // these queries will yield the same result
    query1 := leia.New(leia.Eq("subject", "VALUE"))
    query2 := leia.New(leia.Eq("subject", "value"))
}

Tokenizer option

Sometimes JSON fields contain an entire text. Leia has a tokenizer option to split a value at a JSON path into multiple keys to be indexed. For example, the sentence "The quick brown fox jumps over the lazy dog" could be tokenized so the document can easily be found when the term fox is used in a query. A more advanced tokenizer could also remove common words like the.

func main() {
    ...
    
    // This index splits the value at "text" into tokens on whitespace
    index := leia.NewIndex("credential", leia.NewFieldIndexer("text", leia.TokenizerOption{Tokenizer: leia.WhiteSpaceTokenizer}))
    
    ...

    // will match {"text": "The quick brown fox jumps over the lazy dog"}
    query := leia.New(leia.Eq("text", "fox"))
}
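The "more advanced tokenizer" mentioned above can be sketched with the standard library alone. Assuming leia's Tokenizer type is a plain func(string) []string (as the type-alias list below suggests), a function like the one here could be passed via leia.TokenizerOption; the name stopWordTokenizer is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWordTokenizer is a hypothetical tokenizer: it splits a text on
// whitespace and drops a few common English stop words.
func stopWordTokenizer(text string) []string {
	stopWords := map[string]bool{"the": true, "a": true, "an": true, "over": true}
	var tokens []string
	for _, t := range strings.Fields(text) {
		if !stopWords[strings.ToLower(t)] {
			tokens = append(tokens, t)
		}
	}
	return tokens
}

func main() {
	fmt.Println(stopWordTokenizer("The quick brown fox jumps over the lazy dog"))
	// [quick brown fox jumps lazy dog]
}
```

Note that stop words are matched case-insensitively here, while the remaining tokens keep their original casing; combine with a ToLower transform for case-insensitive matching.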

All options can be combined.


# Functions

ComposeKey creates a new key from two keys.
Eq creates a query part for an exact match.
JSONLDValueCollector collects values given a list of IRIs that represent the nesting of the objects.
JSONPathValueCollector collects values at a given JSON path expression.
KeyOf creates a key from an interface.
MustParseScalar returns a Scalar based on an interface value.
New creates a new query with an initial query part.
NewFieldIndexer creates a new fieldIndexer.
NewIRIPath creates a QueryPath of JSON-LD terms.
NewJSONPath creates a JSON path query: "person.path" or "person.children.#.path" # is used to traverse arrays.
NewStore creates a new store.
NotNil creates a query part where the value must exist.
ParseScalar returns a Scalar based on an interface value.
Prefix creates a query part for a partial match. The beginning of a value is matched against the query.
Range creates a query part for a range query.
TokenizerOption is the option for a FieldIndexer to split a value to be indexed into multiple parts.
ToLower maps all Unicode letters to their lower case.
TransformerOption is the option for a FieldIndexer to apply transformation before indexing the value.
WhiteSpaceTokenizer tokenizes the string based on the /\S/g regex.
WithDocumentLoader overrides the default document loader.
WithoutSync is a store option which signals the underlying bbolt db to skip syncing with disk.

# Constants

JSONCollection defines a collection that uses JSON search paths to index documents.
JSONLDCollection defines a collection that uses JSON-LD IRI search paths to index documents.

# Variables

ErrInvalidJSON is returned when invalid JSON is parsed.
ErrInvalidQuery is returned when a collection is queried with the wrong type.
ErrInvalidValue is returned when an invalid value is parsed.
ErrNoIndex is returned when no index is found to query against.
ErrNoQuery is returned when an empty query is given.

# Structs

Query represents a query with multiple arguments.

# Interfaces

Collection defines a logical collection of documents and indices within a store.
FieldIndexer is the public interface that defines functions for a field index instruction.
Index describes an index.
QueryPath is the interface for the query path given in queries.
QueryPathComparable defines if two structs can be compared on query path.
Scalar represents a JSON or JSON-LD scalar (string, number, true or false).
Store is the main interface for storing/finding documents.

# Type aliases

CollectionType defines if a Collection is a JSON collection or JSONLD collection.
Document represents a JSON document in []byte format.
DocumentWalker defines a function that is used as a callback for matching documents.
IndexOption is the option function for adding options to a FieldIndexer.
Key is used as DB key type.
Reference equals a document hash.
ReferenceFunc is the func type used for creating references.
ReferenceScanFn is a function type which is called with an index key and a document Reference as value.
StoreOption is the function type for the Store Options.
Tokenizer is a function definition that transforms a text into tokens.
Transform is a function definition for transforming values and search terms.