Categorygithub.com/mbaksheev/clickhouse-table-graph

# README

ClickHouse table graph

build

Tool for visualizing dependencies between ClickHouse tables.

Table of contents

Overview

The main goal of this tool is to visualize ClickHouse table dependencies. When you have big number of tables in your ClickHouse database, it can be really hard to understand how they are connected and what is the data flow between them.

With this tool you can easily visualize tables dependencies and understand the data flow in your ClickHouse database. Thanks to mermaid, an awesome tool for generating diagrams, the resulting graph looks attractive and can be easily added to your markdown documentation.

flowchart TB
tree.mid_table@{ shape: rect, label: "tree.mid_table (ReplacingMergeTree)" } --> tree.mid_distributed@{ shape: st-rect, label: "tree.mid_distributed (Distributed)" }
tree.mid_table@{ shape: rect, label: "tree.mid_table (ReplacingMergeTree)" } --> tree.target_table_mv@{ shape: hex, label: "tree.target_table_mv (MaterializedView)" }
tree.mid_table_mv@{ shape: hex, label: "tree.mid_table_mv (MaterializedView)" } --> tree.mid_table@{ shape: rect, label: "tree.mid_table (ReplacingMergeTree)" }
tree.base_table@{ shape: rounded, label: "tree.base_table (Null)" } --> tree.mid_table_mv@{ shape: hex, label: "tree.mid_table_mv (MaterializedView)" }
tree.target_table_mv@{ shape: hex, label: "tree.target_table_mv (MaterializedView)" } --> tree.target_table@{ shape: rect, label: "tree.target_table (ReplacingMergeTree)" }
tree.target_table@{ shape: rect, label: "tree.target_table (ReplacingMergeTree)" } --> tree.target_2_distributed@{ shape: st-rect, label: "tree.target_2_distributed (Distributed)" }
tree.target_table@{ shape: rect, label: "tree.target_table (ReplacingMergeTree)" } --> tree.target_distributed@{ shape: st-rect, label: "tree.target_distributed (Distributed)" }
tree.target_table_mv_2@{ shape: hex, label: "tree.target_table_mv_2 (MaterializedView)" } --> tree.target_table@{ shape: rect, label: "tree.target_table (ReplacingMergeTree)" }
tree.another_base_table@{ shape: rounded, label: "tree.another_base_table (Null)" } --> tree.target_table_mv_2@{ shape: hex, label: "tree.target_table_mv_2 (MaterializedView)" }
style tree.mid_table stroke:#f4e022

This tool is written in Go and functionality is well split into separate packages, so it can be easily integrated into other Go projects or used as a standalone CLI tool.

How to use

CLI application

The CLI application allows you to generate mermaid flowchart from ClickHouse tables dependencies from command line.

Pre-requisites

  • Go 1.22 or higher
  • ClickHouse server running and credentials to access it

Build and run

  • checkout the repository
  • run go build -o bin/ ./... to build the CLI application. This command will build the binaries chtg-cli and save it to the bin directory. Note: please check Go documentation for details.
  • run ./bin/chtg-cli to start the CLI application

Use the following flags to specify configuration options:

-clickhouse-host string
   Clickhouse host to get tables from. Optional. Default value is "localhost"
-clickhouse-port string
   Clickhouse port. Optional. Default value 9000
-clickhouse-table string
   Clickhouse full table name in format <database>.<table> to get dependencies for. Required.
-clickhouse-user string
   Clickhouse username. Optional. Default value is "" (empty string)
-out-file string
   Output file name. Optional. If not specified, the output will be printed to the console.
-out-format string
   Output format. Default value "mermaid-html". Possible values: "mermaid-html", "mermaid-md".
-mermaid-theme string
   Mermaid theme. Optional. Default value is 'default'. See https://mermaid-js.github.io/mermaid/#/theming
-table-highlight-color string
   Highlight color for the selected clickhouse table. E.g. '#ff5757' or 'red'. Optional. If not specified, the table will not be highlighted. See https://mermaid.js.org/syntax/flowchart.html?id=flowcharts-basic-syntax#styling-a-node
-help
   Show help

For example:

./bin/chtg-cli -clickhouse-host localhost -clickhouse-port 9000 -clickhouse-user my_user -clickhouse-table my_db.my_table -out-file my-table-graph.html -out-format mermaid-html -table-highlight-color '#f4e022'

The command above will ask for the ClickHouse password and generate the mermaid flowchart diagram for the my_db.my_table table and save it to the my-table-graph.html file.

Packages

cmd/chtg-cli main package

The main package contains the CLI application. Run the main method from this package to start the CLI application and generate mermaid digraph from ClickHouse tables dependencies. See more in CLI application section.

table package

The table package contains the definition of the ClickHouse table. This definition is used to represent the table in the table graph. All other modules uses this definition.

clickhouse package

The clickhouse package contains the ClickHouse client implementation. It allows you to connect to ClickHouse server and get tables metadata.

In order to get tables metadata from ClickHouse server, you need to create a new ClickHouse server struct with the server URL and credentials:

chServer := clickhouse.Server{
    Address:  "localhost:9000",
    Username: "ch_user",
    Password: "ch_password",
}

Then you can get tables information from the ClickHouse server by calling the GetTables() method:

tables, err := chServer.GetTables()

The GetTables() method returns a slice of table.Info structs, where each item contains information about the table fetched from the system.tables table.

The slice of tables can be used to generate table graph by using methods from the graph package.

graph package

The graph package contains the implementation of the table graph. It allows you to generate graph of tables dependencies from the slice of table.Info. To do this, you need to

  • create a new graph.Graph struct with graph.New()
  • populate the graph by adding tables with graph.LinksBuilder.AddTable(table table.Info)
  • get the table links for the specified table by calling graph.LinksBuilder.TableLinks(tableKey table.Key)

The TableLinks() traverses the graph in direction as dependencies order:

  • The source table for a materialized view is a parent node for the materialized view. The materialized view is a child node for the source table.
  • The target table for a materialized view is a child node for the materialized view. The materialized view is a parent node for the target table.
  • If a table has related distributed table, the distributed table is a child node and the table. The table is a parent node for the distributed table.

The result of the TableLinks() contains the following links:

  • all links for child nodes which can be reached from the specified table when going down by dependencies
  • all links for parent nodes which can be reached from the specified table when going up by dependencies

It does not contain links for the nodes which are not connected to the specified table at all or are connected as dependencies (children nodes) for parent nodes, because they are not relevant for the data flow related to the specified table.

Code example:

myTableGraph := graph.New() // create new graph
for _, t := range tables {
    // add tables to the graph. These tables will be analyzed for dependencies
    myTableGraph.AddTable(t)
}
// get table links which are relevant for the data flow related to the specified table
tableLinks, err := myTableGraph.TableLinks(table.Key{Database: chDatabase, Name: chTable}) 

In the code above the tableLinks is a variable of type graph.Links which contains slice of links and additional information like the table for which the links are generated and map with all tables information.

mermaid package

Once you have the table links, you can generate mermaid flowchart diagram from them by using the mermaid package. To do it, use the mermaid.Flowchart(graphLinks graph.Links, options FlowchartOptions) string function. This function will return flowchart diagram as a string in mermaid Markdown format. You may provide options for the flowchart diagram to customize its appearance, e.g. flowchart orientation and additional settings:

type FlowchartOptions struct {
	// Orientation is the orientation of the flowchart graph.
	// "TB", "BT", "LR", "RL" see mermaid documentation: https://mermaid.js.org/syntax/flowchart.html#direction
	Orientation Orientation
	// IncludeEngine is a flag to include the engine information in the node label.
	// When true, the engine information is included.
	IncludeEngine bool
}

Code example:

mermaidFlowchart := mermaid.Flowchart(*tableLinks, mermaid.FlowchartOptions{Orientation: mermaid.TB, IncludeEngine: true, InitialTableHighlightColor: "#f4e022"})

will generate the flowchart diagram in the top-to-bottom orientation with the engine information included in the node label:

flowchart TB
test_db.target_table_mv@{ shape: hex, label: "test_db.target_table_mv (MaterializedView)" } --> test_db.target_table@{ shape: rect, label: "test_db.target_table (ReplacingMergeTree)" }
test_db.input_table@{ shape: rounded, label: "test_db.input_table (Null)" } --> test_db.target_table_mv@{ shape: hex, label: "test_db.target_table_mv (MaterializedView)" }
style test_db.input_table stroke:#f4e022

This diagram can be easily added to your markdown documentation and rendered. For example GitHub will render the mermaid diagram if you specify mermaid syntax for the code block:

```mermaid
```

The above Markdown diagram is rendered by GitHub:

flowchart TB
test_db.target_table_mv@{ shape: hex, label: "test_db.target_table_mv (MaterializedView)" } --> test_db.target_table@{ shape: rect, label: "test_db.target_table (ReplacingMergeTree)" }
test_db.input_table@{ shape: rounded, label: "test_db.input_table (Null)" } --> test_db.target_table_mv@{ shape: hex, label: "test_db.target_table_mv (MaterializedView)" }
style test_db.input_table stroke:#f4e022

In case if you do not want to embed the diagram into your markdown documentation, you can wrap the diagram into a html document which will include mermaid.js library. So you can share it or use as a standalone page:

html = mermaid.Html(mermaidFlowchart, mermaid.HtmlOptions{})

The code above will return html document as a string with the diagram and all necessary scripts and styles to render it. The fist parameter is the string with the mermaid diagram in Markdown format, the second parameter is the options for the html document. With the options you can specify document title and custom mermaid library URL.

Future plans

  • Add visualization for dependencies on Dictionaries
  • Add visualization for users and roles dependencies

Contributing

If you have any ideas or want to contribute to this project, feel free to create an issue or pull request. Your contribution is highly appreciated.

Authors

# Packages

Package clickhouse provides a client to get tables information from Clickhouse server.
No description provided by the author
Package graph provides a way to build a graph of specified tables.
Package mermaid provides functionality to generate Mermaid flowchart diagrams.
Package table provides a table info struct and interface which can be used to get information about tables and provide it to the graph builder.