package
0.0.0-20190411182844-89f6948e2457
Repository: https://github.com/kubernetes-retired/contrib.git
Documentation: pkg.go.dev

# README

Kubernetes Node Performance Dashboard

Node Performance Dashboard (node-perf-dash) is a web UI for collecting and analyzing performance test results of Kubernetes nodes. It collects data from Kubernetes node e2e performance tests, stored either on the local file system or in Google Cloud Storage (GCS), and visualizes the data in four dashboards:

  • Builds: monitor performance changes across builds
  • Comparison: compare performance across different test parameters (e.g. pod number, creation speed, machine type)
  • Time series: time series data, including operation tracing probes inside Kubernetes and resource usage over time
  • Tracing: plot the latency percentiles between any two tracing probes across builds

Node-Perf-Dash is running and available at http://node-perf-dash.k8s.io/

Getting Started

Build node-perf-dash:

make node-perf-dash

Collect data from Google GCS:

node-perf-dash --address=0.0.0.0:808 --builds=20 --tracing=true --datasource=google-gcs

Collect data from local test data:

node-perf-dash --address=0.0.0.0:808 --builds=20 --tracing=true --datasource=local --local-data-dir=$MY_TEST_RESULT_PATH

The test result must have the following directory structure:

$MY_TEST_RESULT_PATH/
  latest-build.txt
  build_nr_1/
      build-log.txt
      artifacts/
          test_machine_host_name1/
              kubelet.log
          test_machine_host_name2
          ...
  build_nr_N
  ...

Dashboards

Builds

Select the data to display using the following options:

  • Job: select the test project (e.g. ci-kubernetes-node-kubelet-benchmark)
  • Test: display data for a test by selecting the short test name, or by selecting test options one by one
  • Image/machine: select from the available images and machine types (capacity in the format cpu:1core,memory:3.5G)
  • Build: periodic benchmark tests run with incrementing build numbers. node-perf-dash collects the latest test data for the number of builds specified by --builds; you can change the range of builds in the dashboard (see https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/jenkins/benchmark/benchmark-config.yam)

The page displays resource usage (CPU and memory of the kubelet and the runtime). For density tests, it also displays pod startup latency and creation throughput (see https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/density_test.go).

Comparison

To compare node performance among different tests, click the COMPARE IT button in the upper-right corner of the Builds page. The test is added to the comparison list on the Comparison page. Click LOAD to see the comparison as bar charts (data are averaged over the selected build range).

Time Series

Analyzing time series data is useful for drilling into node performance issues. The page contains operation tracing data from both the test and the kubelet. It also shows how the resource usage of the kubelet and the runtime changes over time during the test.

Tracing inside the kubelet is done by parsing the kubelet log. The log contains important information, such as when the kubelet SyncLoop detects a pod configuration change, when a pod is running, and when the kubelet status manager reports a pod status change to the API server. In the future we plan to use Event as a fixed tracing format instead of ad hoc log lines. See https://github.com/kubernetes/kubernetes/pull/31583 for more details.

Tracing

To see the latency distribution between any two operations, select two of the operations (probes) shown on the time series page and view the latency percentiles. Note that the dashboard does not match operations belonging to the same pod; it simply assumes that all operations happen in order.

# Functions

GrabTracingKubelet parses tracing data from kubelet.log.
NewGoogleGCSDownloader creates a new GoogleGCSDownloader.
NewLocalDownloader creates a new LocalDownloader.
Parse fetches data from the source and populates allTestData and testInfo for the given test job.
ParseKubeletLog calls GrabTracingKubelet to parse tracing data while using test end time from build-log.txt to separate tests.

# Structs

DataPerBuild contains perf/time series data for a build.
DataPerTest contains perf/time series data for a test.
DetectedEntry records a parsed tracing event and timestamp.
GoogleGCSDownloader gets test data from Google Cloud Storage.
LocalDownloader gets test data from local files.
PodState records the state of a pod from parsed kubelet log.
TestInfo contains the mapping from test name to test description.
TestTimeRange contains test name and its end time.
TracingData contains the tracing data of a test on a node in the format of time series data.

# Interfaces

Downloader is the interface that connects to a data source.

# Type aliases

DataPerNode contains perf/time series data for a node.
JobList is the list containing all Jenkins projects.
SortedTestTime is a sorted list of TestTimeRange.
SortedTestTimePerNode records SortedTestTime for each node.
TestTime records the end time for one test on one node.
TestToBuildData is a map from job name to DataPerTest.