Categorygithub.com/koykov/vector
repositorypackage
1.2.6
Repository: https://github.com/koykov/vector.git
Documentation: pkg.go.dev

# README

Vector API

Provide Vector API for various parsers.

The main idea: many data interchange formats (such as JSON, XML, ...) stores data as a tree. Parsers of that formats reproduces that tree in a memory somehow or other.

This parser uses different way: it stores all parsed nodes (key-value pairs) in a special array (vector). That way protects from redundant memory allocations and reduces pointers. In fact, vector contains only two pointers (array of nodes and array of indexes). GC omits checking of that type of structs.

Comparison

All known parsers has the following data structures (approximately) to represent node value (key-value pair):

type Node struct {
	typ Type    // [null, object, array, string, number, true, false]
	obj Object  // one pointer to slice and N*2 pointers inside KeyValue struct, see below
	arr []*Node // one pointer for the slice and N pointers for each array item
	str string  // one pointer
}

type Object []KeyValue

type KeyValue struct {
	key string // one pointer
	val *Node  // one pointer to node
}

As you see during parsing will be produced tons of pointers. Better consider this with an example of JSON:

{
  "a": true,
  "b": {
    "c": "foo",
    "d": [
      5,
      3.1415,
      812.48927
    ]
  }
}

Majority of JSON parsers will build array of nodes like:

01234567
type: objtype: booltype: objtype: strtype: arrtype: numtype: numtype: num
key: ""key: "a"key: ""key: "c"key: ""key: ""key: ""key: ""
str: ""str: ""str: ""str: "foo"str: ""str: ""str: ""str: ""
num: 0num: 0num: 0num: 0num: 0num: 5num: 3.1415num: 812.48927
bool: falsebool: truebool: falsebool: falsebool: falsebool: falsebool: falsebool: false
child: [*1, *2]child: []child: [*3, *4]child: []child: [*5, *6, *7]child: []child: []child: []

As you can see, independent of JSON node type, each parsed node contains at least 3 pointers:

  • key (string)
  • str (string)
  • child (slice of node pointers)

JSON vector has different approach and build the following array of nodes and index:

Vector:

01234567
type: objtype: booltype: objtype: strtype: arrtype: numtype: numtype: num
key pos: 0key pos: 5key pos: 18key pos: 29key pos: 45key pos: 0key pos: 0key pos: 0
key len: 0key len: 1key len: 1key len: 1key len: 1key len: 0key len: 0key len: 0
val pos: 0val pos: 9val pos: 0val pos: 34val pos: 0val pos: 57val pos: 64val pos: 80
val len: 0val len: 4val len: 0val len: 3val len: 0val len: 1val len: 6val len: 9
depth: 0depth: 1depth: 1depth: 2depth: 2depth: 3depth: 3depth: 3
idx pos: 0idx pos: 0idx pos: 0idx pos: 0idx pos: 0idx pos: 0idx pos: 0idx pos: 0
idx len: 2idx len: 0idx len: 2idx len: 0idx len: 3idx len: 0idx len: 0idx len: 0

Index (Y-axis means depth, X-axis means position in the index):

X/Y012
00--
112-
234-
3567

Each node contains only int variables and therefore avoid escapes to heap at all. The whole vector contains only two pointers - nodes array and index matrix. GC checks it instantly. That way also allows parsing JSON with unlimited depths.