# README
Statusgraph
A status page for your distributed system.
TL;DR
Try the UI (without colors):
$ docker run -it -p 8000:8000 quay.io/moolen/statusgraph:0.1.0 server
Overview
This is a web app that lets you visualize your system: create nodes and edges to draw your system architecture and signify dependencies. Annotate your services with metrics and alerts via Prometheus and Alertmanager.
Conceptually, you want to know if your service is "running", i.e. it is in a binary state: red lamp vs. green lamp.
This question is incredibly hard to answer. Statusgraph takes this approach: you define alerts via Prometheus which indicate a red/yellow lamp (service is dead / not available / has issues ...).
Additionally, you can map metrics to a service.
Alert Example:
- alert: service_down
  expr: up == 0
  labels:
    severity: critical
    service_id: "{{ $labels.service_id }}" # this is known at alert-time
  annotations:
    description: Service {{ $labels.instance }} is unavailable.
    runbook: "http://example.com/foobar"
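
To illustrate the lamp idea, here is a minimal sketch that derives a lamp colour for one service from its firing alerts. It assumes that `critical` means red and `warning` means yellow, and that alerts carry a `service_id` label as in the example above. The function name `lamp_for_service` and the mapping tables are hypothetical and not taken from the statusgraph code base.

```python
from typing import Dict, Iterable

# Assumed severity-to-colour mapping; statusgraph may rank severities differently.
SEVERITY_TO_LAMP = {"critical": "red", "warning": "yellow"}
# Higher number wins when a service has several firing alerts.
LAMP_PRIORITY = {"green": 0, "yellow": 1, "red": 2}


def lamp_for_service(service_id: str, alerts: Iterable[Dict[str, Dict[str, str]]]) -> str:
    """Return the worst lamp colour among alerts whose service_id label matches."""
    lamp = "green"  # no matching alert means the service is considered healthy
    for alert in alerts:
        labels = alert.get("labels", {})
        if labels.get("service_id") != service_id:
            continue
        candidate = SEVERITY_TO_LAMP.get(labels.get("severity", ""), "green")
        if LAMP_PRIORITY[candidate] > LAMP_PRIORITY[lamp]:
            lamp = candidate
    return lamp


alerts = [
    {"labels": {"alertname": "service_down", "severity": "critical", "service_id": "cart"}},
    {"labels": {"alertname": "high_latency", "severity": "warning", "service_id": "frontend"}},
]
print(lamp_for_service("cart", alerts))      # red
print(lamp_for_service("frontend", alerts))  # yellow
print(lamp_for_service("payment", alerts))   # green
```

A service without any matching alert falls back to green, which is why the alert rules need to set the `service_id` label reliably.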
Requirements
- alertmanager v0.20.0 and above
- prometheus
Use cases
You can visualize many different aspects of your environment.
- 10,000 ft view of your distributed system
- self-contained system of a single team (a bunch of services, databases)
- network aspects: CDN, DNS & Edge services
- end-user view: edge services, blackbox tests
- Data engineering pipeline: visualize DAGs / ETL Metrics
Components
Server
- communicates with prometheus to map metrics to a particular service (think: availability, error rate)
- asks alertmanager for active alerts
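
The two upstream conversations can be sketched with the public HTTP APIs of Prometheus and Alertmanager. The endpoints used below (`/api/v2/alerts`, `/api/v1/label/<name>/values`, `/api/v1/query`) exist in those projects, but whether statusgraph calls exactly these is an assumption, and all function names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def active_alerts_url(alertmanager_url: str) -> str:
    """Alertmanager v2 endpoint for alerts that are neither silenced nor inhibited."""
    params = urlencode({"active": "true", "silenced": "false", "inhibited": "false"})
    return alertmanager_url.rstrip("/") + "/api/v2/alerts?" + params


def label_values_url(prometheus_url: str, label: str) -> str:
    """Prometheus endpoint returning every known value of one label."""
    return prometheus_url.rstrip("/") + "/api/v1/label/" + quote(label, safe="") + "/values"


def instant_query_url(prometheus_url: str, query: str) -> str:
    """Prometheus endpoint evaluating a PromQL expression at the current time."""
    return prometheus_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": query})


def fetch_json(url: str, timeout: float = 5.0):
    """Perform the GET request and decode the JSON body (needs a reachable upstream)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def values_by_service(query_response: dict, service_label: str) -> dict:
    """Turn a Prometheus instant-vector response into {service: float value}."""
    result = {}
    for sample in query_response["data"]["result"]:
        service = sample["metric"].get(service_label)
        if service is not None:
            # Each sample value is a [timestamp, "string value"] pair.
            result[service] = float(sample["value"][1])
    return result


# Example payload in the shape Prometheus returns for an instant vector.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"service_id": "cart"}, "value": [1700000000, "12.5"]}],
    },
}
print(active_alerts_url("http://localhost:9093"))
print(label_values_url("http://localhost:9090", "service_id"))
print(values_by_service(sample_response, "service_id"))
```

With the test infrastructure running, `fetch_json(active_alerts_url("http://localhost:9093"))` would return the list of active alerts.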
Server Configuration
- contains the configuration for upstream
- contains the mapping for alerts and metrics
upstream:
  prometheus:
    url: http://localhost:9090
  alertmanager:
    url: http://localhost:9093
mapping:
  # this defines how we select alerts to display
  # use a `labelSelector` to filter
  # and `map` to specify the lookup key in the alert struct
  alerts:
    label_selector:
      - severity: "critical"
      - severity: "warning"
        important: "true"
    # red & green lamp indicator
    # Use this if your alerts use a specific label for a service (e.g. app=frontend / app=backend ...)
    # this tells statusgraph to map alerts to nodes using the following labels/annotations
    service_labels:
      - "service_id"
    service_annotations:
      - "statusgraph-node"
  metrics:
    # green lamp indicator!
    # this helps statusgraph to find all existing services by fetching the label values
    # reference: https://prometheus.io/docs/prometheus/latest/querying/api/#querying-label-values
    service_labels:
      - 'service_id'
    queries:
      # just as an example
      - name: cpu wait
        query: sum(rate(node_pressure_cpu_waiting_seconds_total[1m])) by (service_id) * 100
        service_label: service_id
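
One way to read the `alerts` mapping is sketched below. It assumes that the entries of `label_selector` are alternatives (an alert is shown if it matches any entry), that all labels within one entry must match, and that `service_labels` are consulted before `service_annotations`. These semantics and the function names are assumptions made for illustration, not confirmed statusgraph behaviour.

```python
from typing import Dict, List, Optional

Labels = Dict[str, str]


def matches_any_selector(labels: Labels, selectors: List[Labels]) -> bool:
    """True if at least one selector has all of its key/value pairs present in labels."""
    return any(
        all(labels.get(key) == value for key, value in selector.items())
        for selector in selectors
    )


def node_for_alert(alert: dict, service_labels: List[str], service_annotations: List[str]) -> Optional[str]:
    """Return the first non-empty value found under the configured label or annotation keys."""
    for key in service_labels:
        value = alert.get("labels", {}).get(key)
        if value:
            return value
    for key in service_annotations:
        value = alert.get("annotations", {}).get(key)
        if value:
            return value
    return None


# Values copied from the configuration above.
selectors = [{"severity": "critical"}, {"severity": "warning", "important": "true"}]
service_labels = ["service_id"]
service_annotations = ["statusgraph-node"]

critical = {"labels": {"severity": "critical", "service_id": "cart"}, "annotations": {}}
plain_warning = {"labels": {"severity": "warning"}, "annotations": {"statusgraph-node": "frontend"}}

print(matches_any_selector(critical["labels"], selectors))       # True
print(matches_any_selector(plain_warning["labels"], selectors))  # False: "important" label is missing
print(node_for_alert(critical, service_labels, service_annotations))       # cart
print(node_for_alert(plain_warning, service_labels, service_annotations))  # frontend
```

Under this reading, a `warning` alert only shows up on the graph when it also carries `important: "true"`.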
Roadmap
graph import & streaming
- I want to import the graph configuration from different file formats (plantuml, dot, ...)
- right now the graph configuration is static. This works for a logical representation. But computing environments are very dynamic, so I want to stream the graph configuration via an API
- do we need a hybrid approach? (cluster per dynamic-api AND static config?)
- which upstream API to spike? How do we determine the edges? kubernetes/$CLOUD?
- can we use traces (L3/4: tcp/udp/ip via eBPF, L7 via opentracing?) to determine the nodes and edges?
further customization
- As a user, I want to cross-reference other services (e.g. Grafana) from the tooltip (e.g. link to a dashboard, runbook etc.)
TODO
- add direction arrow to edge
- highlight adjacent nodes & edges
- graph-config library
  - implement config library with shapes, consider using draw.io shapes (AWS/GCP ...)
- Misc. optimizations
  - metrics & alerts caching
  - decouple client and upstream requests
Developing
Run Server
$ make binary
$ ./bin/statusgraph server --config ./config.yaml
Run Test Infra
$ cd hack
$ docker-compose up -d
# test failure
$ docker-compose stop cart.svc
Run Client
$ cd client; npm install; npm run watch
You can access Prometheus via localhost:9090, Alertmanager via localhost:9093 and the backend (which serves the SPA too) via localhost:8000.