github.com/miku/microblob
module/package, version 0.2.19
Repository: https://github.com/miku/microblob.git
Documentation: pkg.go.dev

# README

microblob

microblob is a simplistic key-value store that serves JSON documents from a file over HTTP. It is implemented in a few hundred lines of code and deliberately offers few features.

Warning: This server SHOULD NEVER BE EXPOSED PUBLICLY as it contains no security, rate-limiting or other safety measures whatsoever.

microblob was written in 2017 as an ad-hoc solution to replace a previous setup using memcachedb (which was getting slow). The main goal was to serve about 200M JSON documents from a "persistent key-value store" over HTTP and to support frequent, fast rebuilds, with limited disk space and potentially limited memory. The code lacks tests and I would write it differently today. However, it ran without issues and happily served up to 400 requests/s with limited resources, with average response times of around 1 ms.

Project status: Active – the project has reached a stable, usable state and is being actively developed.

This project has been developed for Project finc at Leipzig University Library.

$ cat file.ldj
{"id": "some-id-1", "name": "alice"}
{"id": "some-id-2", "name": "bob"}

$ microblob -key id file.ldj
INFO[0000] creating db fixtures/file.ldj.832a9151.db ...
INFO[0000] listening at http://127.0.0.1:8820 (fixtures/file.ldj.832a9151.db)
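
Once the server is listening, a record can be fetched over HTTP by its key; a quick check with curl (a sketch, assuming documents are served under their key as the URL path):

$ curl -s localhost:8820/some-id-1
{"id": "some-id-1", "name": "alice"}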

It supports fast rebuilds from scratch, as the preferred way to deploy it is a build-once, update-never use case. It scales up and down with available memory and can serve a hundred million documents and more. For the build-once pattern, the database can be prepared ahead of time, as sketched below.
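
With the -create-db-only flag from the usage section below, the index can be built in a separate step, so a later start serves immediately; a sketch of that two-step workflow:

$ microblob -key id -create-db-only file.ldj
$ microblob -key id file.ldj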

Inspiration: So what's wrong with 1975 programming? The idea: instead of implementing complicated caching mechanisms, hand caching over to the operating system entirely and try to stay out of its way.

Inserts are fast, since no data is actually moved: 150 million documents (about 1 kB each) can be made servable within an hour.
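
The underlying pattern, as the Entry description later in this listing suggests, is to store only a key plus an offset and length, and to read the payload straight from the original file, so repeated reads come out of the OS page cache. A minimal sketch of that lookup; names and layout are hypothetical, not microblob's actual code:

package main

import (
	"fmt"
	"os"
)

// entry mirrors the idea behind the Entry struct described below: a key
// maps to a section of the original data file, given by offset and length.
// Hypothetical sketch only.
type entry struct {
	offset int64
	length int64
}

// readSection reads one stored document straight from the data file.
// Hot sections are served from the OS page cache on repeated reads,
// with no application-level caching at all.
func readSection(f *os.File, e entry) ([]byte, error) {
	buf := make([]byte, e.length)
	if _, err := f.ReadAt(buf, e.offset); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	f, err := os.Open("file.ldj")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	// Hypothetical index entry: the first record of file.ldj from the
	// example above, 36 bytes, excluding the trailing newline.
	doc, err := readSection(f, entry{offset: 0, length: 36})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", doc)
}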

  • ㊗️ 2017-06-30 first 100 million requests served in production

Further documentation: docs/microblob.md

Update via curl

To stream data from a gzip-compressed file with curl, decompressing on the fly:

$ curl -v --data-binary @- localhost:8820/update?key=id < <(gunzip -c fixtures/fake.ldj.gz)
...
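
Programmatic updates work the same way; a minimal Go sketch against the /update?key=id route shown above (server address and content type are assumptions):

package main

import (
	"log"
	"net/http"
	"strings"
)

func main() {
	// One new line-delimited JSON record; key=id names the field to
	// index, matching the curl example above.
	body := strings.NewReader(`{"id": "some-id-3", "name": "carol"}` + "\n")
	resp, err := http.Post("http://127.0.0.1:8820/update?key=id", "application/json", body)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("update:", resp.Status)
}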

Usage

Usage of microblob:
  -addr string
        address to serve (default "127.0.0.1:8820")
  -backend string
        backend to use: leveldb, debug (default "leveldb")
  -batch int
        number of lines in a batch (default 50000)
  -c string
        load options from a config (ini) file
  -create-db-only
        build the database only, then exit
  -db string
        the root directory, by default: 1000.ldj -> 1000.ldj.05028f38.db (based on flags)
  -ignore-missing-keys
ignore records that do not have the specified key
  -key string
        key to extract, json, top-level only
  -log string
        access log file, don't log if empty
  -r string
        regular expression to use as key extractor
  -s string
        the config file section to use (default "main")
  -t    top level key extractor
  -version
        show version and exit
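
Flags can also be loaded from an ini file via -c, with the section chosen via -s (default "main"). A hypothetical config; the key names below are assumed to mirror the flag names and are not confirmed by this help text:

; example.ini (hypothetical; keys assumed to mirror the flags)
[main]
key = id
batch = 50000

$ microblob -c example.ini file.ldj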

What it doesn't do

  • no deletions (microblob is currently append-only and does not care about garbage, so if you add more and more things, you will run out of space)
  • no compression (yet)
  • no security (anyone can query or update via HTTP)

Installation

Debian and RPM packages: see releases.

Or:

$ go install github.com/miku/microblob/cmd/microblob@latest

# Functions

Append appends the contents of a file to an existing blob file and adds its keys to the store.
AppendBatchSize uses a given batch size.
IsAllZero returns true, if all bytes in a slice are zero.
NewHandler sets up routes for serving and stats.
NewLineProcessor reads lines from the given reader, extracts the key with the given key function and writes entries to the given entry writer.
NewLineProcessorBatchSize is like NewLineProcessor, but uses a given batch size.
WithLastResponseTime keeps track of the last response time in exported variable lastResponseTime.

# Constants

Version of the application.

# Variables

ErrInvalidValue is returned if a value is corrupted.

# Structs

BlobHandler serves blobs.
DebugBackend just writes the key, value and offsets to a given writer.
Entry associates a string key with a section in a file specified by offset and length.
LevelDBBackend writes entries into LevelDB.
LineProcessor reads a line, extracts the key and writes entries.
ParsingExtractor actually parses the JSON and extracts a top-level key at the given path.
RegexpExtractor extracts a key via regular expression.
ToplevelKeyExtractor parses a JSON object, where the actual object is nested under a top level key, e.g.
UpdateHandler adds more data to the blob server.

# Interfaces

Backend abstracts various implementations.
Counter can return the number of elements.
KeyExtractor extracts a string key from data.

# Type aliases

EntryWriter writes entries to some storage, e.g.
KeyFunc extracts a key from a blob.
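
As a closing illustration, a key extractor in the spirit of KeyFunc ("extracts a key from a blob"); the func([]byte) (string, error) shape is an assumption based on that description, not a signature confirmed by this listing:

package main

import (
	"encoding/json"
	"fmt"
)

// topLevelKey returns a KeyFunc-style extractor for a top-level JSON
// field. The signature is hypothetical, modeled on the description
// "KeyFunc extracts a key from a blob".
func topLevelKey(name string) func([]byte) (string, error) {
	return func(blob []byte) (string, error) {
		var doc map[string]interface{}
		if err := json.Unmarshal(blob, &doc); err != nil {
			return "", err
		}
		s, ok := doc[name].(string)
		if !ok {
			return "", fmt.Errorf("missing or non-string key: %s", name)
		}
		return s, nil
	}
}

func main() {
	kf := topLevelKey("id")
	key, err := kf([]byte(`{"id": "some-id-1", "name": "alice"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(key) // some-id-1
}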