Categorygithub.com/maxmouchet/mmdb-to-clickhouse
modulepackage
1.2.1
Repository: https://github.com/maxmouchet/mmdb-to-clickhouse.git
Documentation: pkg.go.dev

# README

mmdb-to-clickhouse

Release Test

Import MMDB files into ClickHouse.

This repository contains a dummy example.mmdb file for testing purpose. To get real data, check out the free Country & ASN database from IPinfo IPinfo which is also supported by this tool.

Features

  • Automatically infers the table schema from the MMDB file
  • Supports nested records by flattening them
  • Stores data in a partitioned table to keep track of history
  • Creates an IP trie dictionary for fast lookups

Current limitations

  • The schema is inferred from the first record only. If subsequent records have additional fields, those will be ignored.
  • The network and partition names are reserved and must not be present in the MMDB file

Installation

Download the latest release for your operating system and run:

./mmdb-to-clickhouse -h

You can also run it through Docker with:

docker run --rm -it ghcr.io/maxmouchet/mmdb-to-clickhouse -h

Example usage

First start a ClickHouse instance:

docker run --name clickhouse --rm -d -p 9000:9000 clickhouse/clickhouse-server

Then download the example MMDB file:

wget https://github.com/maxmouchet/mmdb-to-clickhouse/raw/main/example.mmdb

And run mmdb-to-clickhouse:

./mmdb-to-clickhouse -dsn clickhouse://localhost:9000 -mmdb example.mmdb -name example_mmdb -test

The output should look like the following:

2024/07/04 13:51:07 Net schema: `network` String, `pointer` UInt64, `partition` Date
2024/07/04 13:51:07 Val schema: `pointer` UInt32, `country` String, `partition` Date
2024/07/04 13:51:07 Creating table example_mmdb_net_history
2024/07/04 13:51:07 Creating table example_mmdb_val_history
2024/07/04 13:51:07 Creating dictionary example_mmdb_net
2024/07/04 13:51:07 Creating dictionary example_mmdb_val
2024/07/04 13:51:07 Dropping partition 2024-07-04
2024/07/04 13:51:07 Dropping partition 2024-07-04
2024/07/04 13:51:07 Inserting data
2024/07/04 13:51:07 Inserted 1 networks and 1 values
2024/07/04 13:51:07 Creating function example_mmdb
2024/07/04 13:51:07 Running test query: SELECT example_mmdb('1.1.1.1', 'country')
2024/07/04 13:51:07 This may take some time as the dictionnary gets loaded in memory
2024/07/04 13:51:07 Test query result: WW

This will create:

  • Two partitioned tables (30 days of history by default, see the -ttl option):
    • example_mmdb_net_history: IP networks and pointers to distinct values
    • example_mmdb_val_history: pointers and associated values
  • Two dictionaries:
    • example_mmdb_net: an in-memory IP trie which always uses the latest partition from example_mmdb_net_history. This dictionary enables very fast IP lookups.
    • example_mmdb_val: an in-memory KV mapping which always uses the latest partition from example_mmdb_val_history.
  • One function:
    • example_mmdb(ip, attrs): this function first looks up the pointer in example_mmdb_net and then retrieves the value in example_mmdb_val.

Open a REPL and inspect the tables:

docker exec -it clickhouse clickhouse client
SHOW TABLES
-- ┌─name─────────────────────┐
-- │ example_mmdb_net         │
-- │ example_mmdb_net_history │
-- │ example_mmdb_val         │
-- │ example_mmdb_val_history │
-- └──────────────────────────┘

SELECT * FROM example_mmdb_net
-- ┌─network───┬─pointer─┬──partition─┐
-- │ 0.0.0.0/0 │       0 │ 2024-07-04 │
-- └───────────┴─────────┴────────────┘

SELECT * FROM example_mmdb_val
-- ┌─pointer─┬─country─┬──partition─┐
-- │       0 │ WW      │ 2024-07-04 │
-- └─────────┴─────────┴────────────┘

SELECT example_mmdb('1.1.1.1', 'country') AS country
-- ┌─country─┐
-- │ WW      │
-- └─────────┘

To clean up just remove the ClickHouse instance:

docker rm -f clickhouse