Package: github.com/vrutkovs/audit-span
Version: 0.0.0-20250107134906-b8b5f745f1fc
Repository: https://github.com/vrutkovs/audit-span.git
Documentation: pkg.go.dev

# README

Audit Log to Traces

This app parses audit logs, converts them into traces, and sends them to Jaeger-compatible storage. Grafana is then used to query the traces and find unusual ones.

In addition to Tempo, the tool also sends data to Loki, so that audit data can be converted into metrics or searched properly.
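As an illustration of the conversion described above, here is a minimal sketch of mapping a single audit event to a span-like record. It is not taken from this repository: the `span` type and the `eventToSpan` helper are hypothetical, and the sketch assumes the standard Kubernetes audit event fields (`auditID`, `stage`, `verb`, `requestURI`, `requestReceivedTimestamp`, `stageTimestamp`). A real exporter would hand the result to an OTLP client instead of printing it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// auditEvent holds the subset of Kubernetes audit event fields used here.
type auditEvent struct {
	AuditID                  string    `json:"auditID"`
	Stage                    string    `json:"stage"`
	Verb                     string    `json:"verb"`
	RequestURI               string    `json:"requestURI"`
	RequestReceivedTimestamp time.Time `json:"requestReceivedTimestamp"`
	StageTimestamp           time.Time `json:"stageTimestamp"`
}

// span is a hypothetical, simplified stand-in for a trace span.
type span struct {
	Name  string
	ID    string
	Start time.Time
	End   time.Time
}

// eventToSpan parses one audit log line and maps it to a span.
// The boolean is false for events that are not in the ResponseComplete stage.
func eventToSpan(line []byte) (span, bool, error) {
	var ev auditEvent
	if err := json.Unmarshal(line, &ev); err != nil {
		return span{}, false, err
	}
	if ev.Stage != "ResponseComplete" {
		return span{}, false, nil
	}
	return span{
		Name:  ev.Verb + " " + ev.RequestURI,
		ID:    ev.AuditID,
		Start: ev.RequestReceivedTimestamp,
		End:   ev.StageTimestamp,
	}, true, nil
}

func main() {
	// Sample event made up for this example.
	line := []byte(`{"auditID":"a1","stage":"ResponseComplete","verb":"get","requestURI":"/api/v1/pods","requestReceivedTimestamp":"2024-09-16T10:00:00.000000Z","stageTimestamp":"2024-09-16T10:00:00.250000Z"}`)
	s, ok, err := eventToSpan(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(ok, s.Name, s.End.Sub(s.Start))
}
```

Using the request-received timestamp as the span start and the stage timestamp as the span end is one plausible choice; the app itself may derive span boundaries differently.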

Howto

Start the Grafana stack:

podman play k8s grafana-stack.yaml

Start the app and pass it the URL of the Prow job containing the audit logs:

go run -mod vendor . --otlp-addr=localhost:4317 --prow-job=https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-upgrade/1835770305428066304

Open http://localhost:3000 in a browser (default login is admin/admin) and use the Explore view to look up audit logs.

[Screenshots: overview, span view]

Besides a Prow URL, the app can also work on an already extracted audit log directory:

go run -mod vendor . --otlp-addr=localhost:4317 --audit-log-dir=/tmp/audit-span696690513

Useful queries

List all requests that failed with a 500 status code:

{filename=~".+"} | json | stage = "ResponseComplete" | responseStatus_code = "500"
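The label names used in the queries in this section, such as `responseStatus_code` or `annotations_apiserver_latency_k8s_io_etcd`, come from the LogQL `json` parser, which flattens nested keys with `_` and replaces characters that are not valid in label names. The following sketch approximates that behavior in Go so that the label for any audit field can be predicted. The `sanitize`, `flatten`, and `labelsFor` helpers are hypothetical names, and this is an approximation rather than Loki's actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sanitize replaces every character that is not a letter, digit, or "_" with "_".
func sanitize(key string) string {
	return strings.Map(func(r rune) rune {
		if r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9') {
			return r
		}
		return '_'
	}, key)
}

// flatten walks a decoded JSON value and emits one label per scalar leaf,
// joining nested keys with "_".
func flatten(prefix string, v any, out map[string]string) {
	switch t := v.(type) {
	case map[string]any:
		for k, child := range t {
			name := sanitize(k)
			if prefix != "" {
				name = prefix + "_" + name
			}
			flatten(name, child, out)
		}
	case []any, nil:
		// Arrays and nulls are skipped in this sketch.
	default:
		out[prefix] = fmt.Sprint(t)
	}
}

// labelsFor returns the flattened labels for one JSON log line.
func labelsFor(line []byte) (map[string]string, error) {
	var v any
	if err := json.Unmarshal(line, &v); err != nil {
		return nil, err
	}
	out := make(map[string]string)
	flatten("", v, out)
	return out, nil
}

func main() {
	// Sample event made up for this example.
	line := []byte(`{"stage":"ResponseComplete","responseStatus":{"code":500},"annotations":{"apiserver.latency.k8s.io/etcd":"1.5s"}}`)
	labels, err := labelsFor(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(labels["responseStatus_code"], labels["annotations_apiserver_latency_k8s_io_etcd"])
}
```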

Most popular user agents and endpoints throwing errors:

sum by (verb, userAgent, requestURI) (rate({filename=~".+"} | json  | __error__="" | stage ="ResponseComplete" | responseStatus_code = 500[1m]))

Show responses that took over a second for etcd to process:

count (rate({filename=~".+"} | json | stage = "ResponseComplete" | annotations_apiserver_latency_k8s_io_etcd != "" | annotations_apiserver_latency_k8s_io_etcd > 1s | unwrap duration(annotations_apiserver_latency_k8s_io_etcd)[1s]))
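For readers who want to apply the same filter outside Loki, the check can be sketched in Go. This assumes the etcd latency is stored in the `apiserver.latency.k8s.io/etcd` annotation as a Go-style duration string (which is what the `duration(...)` conversion in the query suggests); the `slowEtcd` helper is a hypothetical name and is not part of this tool.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// etcdLatencyAnnotation is the audit annotation holding the time spent in etcd.
const etcdLatencyAnnotation = "apiserver.latency.k8s.io/etcd"

// annotatedEvent holds the subset of audit event fields needed for the check.
type annotatedEvent struct {
	Stage       string            `json:"stage"`
	RequestURI  string            `json:"requestURI"`
	Annotations map[string]string `json:"annotations"`
}

// slowEtcd reports whether a completed request spent longer than threshold in etcd.
// Events without the annotation, or in another stage, are reported as not slow.
func slowEtcd(line []byte, threshold time.Duration) (bool, error) {
	var ev annotatedEvent
	if err := json.Unmarshal(line, &ev); err != nil {
		return false, err
	}
	raw, found := ev.Annotations[etcdLatencyAnnotation]
	if ev.Stage != "ResponseComplete" || !found || raw == "" {
		return false, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return false, err
	}
	return d > threshold, nil
}

func main() {
	// Sample event made up for this example.
	line := []byte(`{"stage":"ResponseComplete","requestURI":"/api/v1/nodes","annotations":{"apiserver.latency.k8s.io/etcd":"1.2s"}}`)
	slow, err := slowEtcd(line, time.Second)
	if err != nil {
		panic(err)
	}
	fmt.Println("slow:", slow)
}
```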

Time taken by etcd to process user requests by username:

count by (userAgent, user_username) (rate({filename=~".+"} | json | stage = "ResponseComplete" | annotations_apiserver_latency_k8s_io_etcd != "" | unwrap duration(annotations_apiserver_latency_k8s_io_etcd)[1s]))

Failing requests by URI and user:

count by (requestURI, userAgent, user_username) (rate({filename=~".+"} | json | stage = "ResponseComplete" | objectRef_apiGroup !~ "(authentication|coordination|authorization).k8s.io" | responseStatus_status="Failure"[1s]))

Usernames issuing most mutating requests:

sum by (user_username) (rate({filename=~".+"} | json | verb =~ "(update|delete|create|patch)" | stage = "ResponseComplete" | responseStatus_code=~"2.+"[1m]))
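The same tally can be approximated offline over raw audit log lines. The sketch below is a rough Go counterpart of the query above, not code from this repository: `countMutations` is a hypothetical helper, it counts events instead of computing a per-second rate, and it treats status codes 200 to 299 as the successful range.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// userEvent holds the subset of audit event fields needed for the tally.
type userEvent struct {
	Stage string `json:"stage"`
	Verb  string `json:"verb"`
	User  struct {
		Username string `json:"username"`
	} `json:"user"`
	ResponseStatus struct {
		Code int `json:"code"`
	} `json:"responseStatus"`
}

// mutatingVerbs mirrors the verb filter used in the query.
var mutatingVerbs = map[string]bool{"update": true, "delete": true, "create": true, "patch": true}

// countMutations counts successful, completed mutating requests per username.
// Lines that are not valid JSON are skipped.
func countMutations(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		var ev userEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			continue
		}
		success := ev.ResponseStatus.Code >= 200 && ev.ResponseStatus.Code < 300
		if ev.Stage == "ResponseComplete" && mutatingVerbs[ev.Verb] && success {
			counts[ev.User.Username]++
		}
	}
	return counts
}

func main() {
	// Sample events made up for this example.
	lines := []string{
		`{"stage":"ResponseComplete","verb":"patch","user":{"username":"alice"},"responseStatus":{"code":200}}`,
		`{"stage":"ResponseComplete","verb":"get","user":{"username":"alice"},"responseStatus":{"code":200}}`,
		`{"stage":"ResponseComplete","verb":"create","user":{"username":"bob"},"responseStatus":{"code":409}}`,
	}
	for user, n := range countMutations(lines) {
		fmt.Println(user, n)
	}
}
```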

Most actively mutated objects, grouped by resource type, namespace, and name:

topk(20, sum by (objectRef_resource, objectRef_namespace, objectRef_name) (rate({filename=~".+"} | json | verb =~ "(update|delete|create|patch)" | stage = "ResponseComplete" | responseStatus_code=~"2.+"[1m])))