github.com/metatexx/avrox
Version: 0.7.2
Repository: https://github.com/metatexx/avrox.git
Documentation: pkg.go.dev

# README

AvroX

AvroX makes Avro-formatted data discoverable in a closed system. The idea behind it was to encode structured data for NATS messages in a more compact format than JSON plus JSON schemas.

This package is still experimental!

TL;DR: Don't use it for anything important!

This package is a work in progress (WIP). Feel free to try it out, but don't expect it to be suitable for production use, and anticipate daily changes to the API. We publish this primarily because some of the tools we also publish utilize this package, and we believe it could become very useful eventually.

Take note of our disclaimer below!

What it delivers:

  • Highly concise binary encoding for small data sizes.
  • A JSON schema with additional data documentation.
  • Optional further data compression (we are currently using Snappy, but we support up to 7 types with AvroX).
  • Avro supports good native types for time, date, and binary data. This works for Go because of the wonderful hamba/avro/v2 package.
  • A usage experience similar to other marshaller implementations.
  • A three-level versioning identifier N.S.V (namespace, schema, version), similar to semver.
  • Support for unmarshalling to a given list of schemas (unions), where each destination can be a nil type or a concrete type. After identifying which schema the source data uses, it either returns a newly allocated value or uses the given storage.
  • Some basic types like string, int, map[string]any can be directly marshalled, while also utilizing Avro.
  • The unmarshaller automatically detects JSON (for manual debugging) as an alternative to Avro data (may get removed soon).
  • Seamless integration with the NATS CLI Tool --translate option through the use of our message converter tool msgcvt. This will also eventually support schema storage within NATS.
  • The schema can be used in an interpreted or compiled manner (we do not use compiled Avro so far).
  • Schema registry with namespace support, accommodating both public and private schemas.
  • AvroX data can be discovered in a binary stream (although this is just an experiment).
  • Avro schemas can also be autogenerated through avscgen, which is currently still proprietary and may eventually be released to the public.
  • We are also working on an auto indexer that can generate indexes for the messages in a stream, based on indexing information that can be added to a schema's fields (a bit like adding indexes when using a database).

(This list is not exhaustive...)
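
The README does not describe how the unmarshaller detects JSON, so the following is only a minimal sketch of one plausible approach using the standard library. The helper name looksLikeJSON is hypothetical and not part of the avrox API, and the actual detection logic in AvroX may differ.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// looksLikeJSON is a hypothetical helper (not an avrox function).
// It reports whether data is a syntactically valid JSON object or array.
func looksLikeJSON(data []byte) bool {
	trimmed := bytes.TrimSpace(data)
	if len(trimmed) == 0 {
		return false
	}
	// Assumption: only objects and arrays count as JSON here, because bare
	// scalars such as "1" are too easy to confuse with short binary payloads.
	if trimmed[0] != '{' && trimmed[0] != '[' {
		return false
	}
	return json.Valid(trimmed)
}

func main() {
	fmt.Println(looksLikeJSON([]byte(`{"name":"sensor-1","value":21.5}`))) // true
	fmt.Println(looksLikeJSON([]byte{0x93, 0x01, 0x0c, 0x74, 0x31}))       // false
}
```

A check like this can run before any binary decoding, so a message typed by hand during debugging takes the JSON path while everything else is treated as binary data.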

How we arrived at the current stage:

During our research into alternative formats for storing a large volume of small data in a NATS JetStream, we examined various formats:

  • JSON + JSON Schema was our original idea. However, the overhead became rather significant when storing millions of messages. Ensuring the schema and JSON were in sync also required extra steps during implementation and testing. We believed there must be a more elegant solution, which led us to begin our search.
  • Gob was our first alternative, but it quickly became apparent that it actually increased data size when used with numerous individual messages and struct tags. It also required recompilation and lacked discoverability. Additionally, Go code is not inherently a schema. Parsing structs and struct tags to generate documents was quite cumbersome.
  • ProtoBuf necessitated recompilation and a considerable amount of additional tooling, and it generated excessive code. We previously used it alongside Twirp before deciding to employ NATS for messaging at the border too (see: https://github.com/oderwat/go-nats-app). Twirp inspired us to consider supporting JSON as an alternative at the endpoints.
  • CBOR, with its Go package fxamacker/cbor, looked promising and somewhat reduced data size, but not significantly enough. It also lacked a robust schema representation. However, it could be parsed without the schema, like JSON. While working on this, we realized that a shareable, simple text schema was what we needed.
  • BSON was briefly considered but quickly ruled out.
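
The observation about Gob can be reproduced with the standard library alone. The sketch below uses a hypothetical Reading struct (not a type from this project) and compares the size of one JSON message with the size of Gob messages. Gob sends a type description the first time a type is written to a stream, so a message encoded on a fresh encoder carries that overhead, while a second message on the same encoder does not.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
)

// Reading is a hypothetical small message used only for this comparison.
type Reading struct {
	Sensor string
	Value  float64
	OK     bool
}

// jsonSize returns the number of bytes json.Marshal produces for r.
func jsonSize(r Reading) int {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return len(b)
}

// gobSizes encodes r twice on one encoder and returns the size of each
// message. The first message includes the type description, the second
// one only the value.
func gobSizes(r Reading) (first, second int) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	if err := enc.Encode(r); err != nil {
		panic(err)
	}
	first = buf.Len()
	if err := enc.Encode(r); err != nil {
		panic(err)
	}
	second = buf.Len() - first
	return first, second
}

func main() {
	r := Reading{Sensor: "t1", Value: 21.5, OK: true}
	first, second := gobSizes(r)
	fmt.Println("JSON bytes:", jsonSize(r))
	fmt.Println("Gob bytes, fresh encoder:", first)
	fmt.Println("Gob bytes, same encoder, second message:", second)
}
```

When every message is stored individually, as in a JetStream stream, each one is effectively a "first" message, which is why the per-message overhead matters.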

As we experimented with various implementations and formats, our desired features became increasingly clear.

  1. It should have a small storage size.
  2. We want a mandatory schema for documentation and discovery.
  3. It should be very easy to use and plug in, just like other marshallers.
  4. Debugging messages should be possible without recompiling the used tools.
  5. It should not hinder prototyping or the creation of quick tools.
  6. There should be a way to bypass it and revert to using JSON.
  7. It should be safe and performant.
  8. While an interpreted schema is beneficial, there should also be a way to generate specialized code for increased performance.

Disclaimer

This code and documentation are works in progress, and everything may change without further notice. We are sure there are bugs to fix and optimisations to make. This project utilizes the GPT-4 language model for generating some of its content.

MIT License / Copyright 2023 by METATEXX GmbH

# Packages

No description provided by the author
No description provided by the author
Package rawdate provides a simple date handling utility without time.

# Functions

AvroDate truncates a Go time.Time to the value that gets stored in the Avro logicalDate. It also makes sure that the time is expressed in UTC.
AvroTime truncates a Go time.Time to the value that gets stored in the Avro logicalTime (which has a granularity of milliseconds, while Go has nanoseconds). It also makes sure that the time is expressed in UTC.
No description provided by the author
JoinedSchemas returns a JSON array of all schemers passed as arguments.
No description provided by the author
Unmarshal uses the given schema for unmarshalling and checks whether it fits the decoded data.
No description provided by the author
UnmarshalSchemer expects a slice with pre-allocated schemers and uses the magic in the data to unmarshal the correct one.
No description provided by the author
No description provided by the author

# Constants

BasicByteSliceSchemaID is the id for the avro schema of struct BasicByteSlice.
BasicDecimalSchemaID is the id for the avro schema of struct BasicDecimal (*big.Rat / decimal.fixed).
BasicIntSchemaID is the id for the avro schema of struct BasicInt.
BasicMapStringAnySchemaID is the id for the avro schema of struct BasicMapStringAny.
BasicRawDateSchemaID is the id for the avro schema of struct BasicRawDate (rawdate.Rawdate).
BasicStringSchemaID is the id for the avro schema of struct BasicString.
BasicTimeSchemaID is the id for the avro schema of struct BasicTime.
Uses -1 as compression parameter.
Uses -1 as compression parameter.
No description provided by the author
The struct will be aligned to 8 bytes anyway.
NamespaceBasic is reserved for the basic types and structs that are implemented through avrox.
No description provided by the author
NamespacePrivate means that it is not registered and we use private schemas.
NamespaceReserved1 is reserved for later.
NamespaceReserved2 is reserved for later.
NamespaceReserved3 is reserved for later.
Schema<<8 | Version.
Schema 0 means that it is not defined (but may belong to a namespace).
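
The "Schema<<8 | Version" layout and the N.S.V identifier from the README can be sketched as follows. The function names and the 8-bit widths for namespace, schema and version are assumptions made for illustration; the actual avrox types and field sizes may differ.

```go
package main

import "fmt"

// packSchemaVersion combines schema and version as Schema<<8 | Version.
// Assumption: both values fit into 8 bits.
func packSchemaVersion(schema, version uint8) uint16 {
	return uint16(schema)<<8 | uint16(version)
}

// unpackSchemaVersion reverses packSchemaVersion.
func unpackSchemaVersion(id uint16) (schema, version uint8) {
	return uint8(id >> 8), uint8(id & 0xff)
}

// formatNSV renders a namespace and a packed id in the N.S.V notation.
func formatNSV(namespace uint8, id uint16) string {
	schema, version := unpackSchemaVersion(id)
	return fmt.Sprintf("%d.%d.%d", namespace, schema, version)
}

func main() {
	id := packSchemaVersion(3, 2)
	fmt.Printf("0x%04x\n", id)    // 0x0302
	fmt.Println(formatNSV(1, id)) // 1.3.2
}
```

With this layout, a packed value whose upper byte is 0 corresponds to the "schema not defined" case mentioned above.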

# Variables

go:generate avscgen -n "basics" -o avsc/ .
No description provided by the author
one of the best when analysing our data.

# Structs

No description provided by the author
BasicDecimal is the container type to store a *big.Rat value in a single avro schema.
No description provided by the author
No description provided by the author
BasicRawDate is the container type to store a raw date (a date without time) in a single avro schema.
BasicString is the container type to store a string in a single avro schema.
BasicTime is the container type to store a timestamp in a single avro schema.

# Interfaces

No description provided by the author

# Type aliases

No description provided by the author