github.com/litmuschaos/chaos-exporter
Category: repository
Version: 0.0.0-20241111081245-18ff9869ca3c
Repository: https://github.com/litmuschaos/chaos-exporter.git
Documentation: pkg.go.dev

# README

Litmus Chaos Exporter

  • This is a custom Prometheus and CloudWatch exporter that exposes Litmus Chaos metrics. To learn more about Litmus Chaos Experiments and the Litmus Chaos Operator, see the Litmus Docs.

  • It is typically deployed along with the chaos-operator deployment, which in turn is associated with all chaosresults in the cluster.

  • Two types of metrics are exposed:

    • AggregateMetrics: These metrics are derived from all the chaosresults present inside WATCH_NAMESPACE. If WATCH_NAMESPACE is not defined, the metrics are derived from all namespaces. It exposes the total_passed_experiment, total_failed_experiment, total_awaited_experiment, experiment_run_count, and experiment_installed_count metrics.

    • ExperimentScoped: Individual experiment run status. It exposes the passed_experiment, failed_experiment, awaited_experiment, result_verdict, probe_success_percentage, startTime, endTime, totalDuration, and chaosInjectTime metrics.
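
The aggregate metric families listed later in this README are prefixed with either namespace_scoped or cluster_scoped. The following is a minimal sketch of how that choice could follow from WATCH_NAMESPACE. It assumes that an unset WATCH_NAMESPACE maps to the cluster_scoped names; the helper metricName is hypothetical and is not part of the exporter's code.

```go
package main

import (
	"fmt"
	"os"
)

// metricName builds an aggregate metric name for a given suffix.
// Assumption: an empty watchNamespace means "all namespaces", which is
// mapped here to the cluster_scoped family; otherwise namespace_scoped.
func metricName(watchNamespace, suffix string) string {
	scope := "namespace_scoped"
	if watchNamespace == "" {
		scope = "cluster_scoped"
	}
	return "litmuschaos_" + scope + "_" + suffix
}

func main() {
	// WATCH_NAMESPACE is the environment variable named in this README.
	ns := os.Getenv("WATCH_NAMESPACE")
	fmt.Println(metricName(ns, "passed_experiments"))
}
```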

ExperimentScoped Metrics

Metrics Name: litmuschaos_passed_experiments
Description: It contains the total number of passed experiments.
Source: ChaosResult
Sample Metrics: litmuschaos_passed_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1
Notes: The litmuschaos_passed_experiments metric contains the cumulative sum of passed runs for the given ChaosResult.

Metrics Name: litmuschaos_failed_experiments
Description: It contains the total number of failed experiments.
Source: ChaosResult
Sample Metrics: litmuschaos_failed_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 0
Notes: The litmuschaos_failed_experiments metric contains the cumulative sum of failed runs for the given ChaosResult.

Metrics Name: litmuschaos_awaited_experiments
Description: It contains the total number of awaited experiments.
Source: ChaosResult
Sample Metrics: litmuschaos_awaited_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1
Notes: The litmuschaos_awaited_experiments metric denotes the queued experiments for each ChaosResult. Its value is 1 if the ChaosResult's verdict is Awaited; otherwise it is 0.

Metrics Name: litmuschaos_probe_success_percentage
Description: It contains the ProbeSuccessPercentage for the experiment.
Source: ChaosResult
Sample Metrics: litmuschaos_probe_success_percentage{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 100
Notes: The litmuschaos_probe_success_percentage metric is the percentage of passed probes out of the total probes defined inside the ChaosEngine.

Metrics Name: litmuschaos_experiment_start_time
Description: It contains the start time of the experiment.
Source: ExperimentDependencyCheck event inside the ChaosEngine
Sample Metrics: litmuschaos_experiment_start_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618425155e+09
Notes: The litmuschaos_experiment_start_time metric denotes the start time of the experiment, which is calculated from the ExperimentDependencyCheck event (created by the chaos-runner just before launching the experiment pod).

Metrics Name: litmuschaos_experiment_end_time
Description: It contains the end time of the experiment.
Source: Summary event inside the ChaosEngine
Sample Metrics: litmuschaos_experiment_end_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618425219e+09
Notes: The litmuschaos_experiment_end_time metric denotes the end time of the experiment, which is calculated from the Summary event (created by the experiment pod at the end of the experiment).

Metrics Name: litmuschaos_experiment_chaos_injected_time
Description: It contains the chaos injection time of the experiment.
Source: ChaosInject event inside the ChaosEngine
Sample Metrics: litmuschaos_experiment_chaos_injected_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618425199e+09
Notes: The litmuschaos_experiment_chaos_injected_time metric denotes the time at which chaos is actually injected, which is calculated from the ChaosInject event (created by the experiment/helper pod just before chaos injection).

Metrics Name: litmuschaos_experiment_total_duration
Description: It contains the total chaos duration of the experiment.
Source: Time difference between startTime and endTime
Sample Metrics: litmuschaos_experiment_total_duration{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 64
Notes: The litmuschaos_experiment_total_duration metric is the total chaos duration of the experiment. It is the time interval between the start time and the end time.

Metrics Name: litmuschaos_experiment_verdict
Description: It contains the experiment verdict details.
Source: ChaosResult
Sample Metrics: litmuschaos_experiment_verdict{app_kind="deployment",app_label="run=nginx",app_namespace="nginx",chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus",chaosresult_verdict="Pass",probe_success_percentage="100.000000"} 1
Notes: The litmuschaos_experiment_verdict metric is set based on the ChaosResult verdict. For the Awaited verdict it is always 0. For other verdicts its value is 1. However, if the verdict is repeated for more than TSDB_SCRAPE_INTERVAL (passed as an ENV), it is set to 0 until the verdict changes to a different value.
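
The notes for the awaited, total duration, and verdict metrics describe simple value rules. Below is a minimal sketch of those rules as plain functions, not the exporter's actual implementation. The function names are hypothetical, and "repeated for more than TSDB_SCRAPE_INTERVAL" is interpreted here as an elapsed time in seconds, which is an assumption.

```go
package main

import "fmt"

// awaitedValue returns 1 when the verdict is Awaited and 0 otherwise.
func awaitedValue(verdict string) float64 {
	if verdict == "Awaited" {
		return 1
	}
	return 0
}

// totalDuration is the difference between end time and start time,
// both given as Unix timestamps in seconds.
func totalDuration(startTime, endTime float64) float64 {
	return endTime - startTime
}

// verdictValue returns 0 for an Awaited verdict, 0 when the same verdict
// has been repeated for longer than the scrape interval, and 1 otherwise.
// Assumption: both durations are expressed in seconds.
func verdictValue(verdict string, repeatedSeconds, scrapeIntervalSeconds float64) float64 {
	if verdict == "Awaited" {
		return 0
	}
	if repeatedSeconds > scrapeIntervalSeconds {
		return 0
	}
	return 1
}

func main() {
	// Start and end timestamps taken from the sample metrics above.
	fmt.Println(totalDuration(1.618425155e+09, 1.618425219e+09))
	fmt.Println(awaitedValue("Awaited"))
	fmt.Println(verdictValue("Pass", 0, 10))
}
```

With the sample start and end times above, totalDuration yields 64, which matches the litmuschaos_experiment_total_duration sample.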

NamespacedScoped Metrics

Metrics Name: litmuschaos_namespace_scoped_passed_experiments
Description: It contains the total passed experiments count in the WATCH_NAMESPACE.
Source: Aggregated sum of all the litmuschaos_passed_experiments metrics derived from the ChaosResults present inside the WATCH_NAMESPACE
Sample Metrics: litmuschaos_namespace_scoped_passed_experiments 2
Notes: The litmuschaos_namespace_scoped_passed_experiments metric is the total number of passed experiments in the WATCH_NAMESPACE. It is the sum of the litmuschaos_passed_experiments metrics for every ChaosResult present inside the WATCH_NAMESPACE.

Metrics Name: litmuschaos_namespace_scoped_failed_experiments
Description: It contains the total failed experiments count in the WATCH_NAMESPACE.
Source: Aggregated sum of all the litmuschaos_failed_experiments metrics derived from the ChaosResults present inside the WATCH_NAMESPACE
Sample Metrics: litmuschaos_namespace_scoped_failed_experiments 0
Notes: The litmuschaos_namespace_scoped_failed_experiments metric is the total number of failed experiments in the WATCH_NAMESPACE. It is the sum of the litmuschaos_failed_experiments metrics for every ChaosResult present inside the WATCH_NAMESPACE.

Metrics Name: litmuschaos_namespace_scoped_awaited_experiments
Description: It contains the total awaited experiments count in the WATCH_NAMESPACE.
Source: Aggregated sum of all the litmuschaos_awaited_experiments metrics derived from the ChaosResults present inside the WATCH_NAMESPACE
Sample Metrics: litmuschaos_namespace_scoped_awaited_experiments 0
Notes: The litmuschaos_namespace_scoped_awaited_experiments metric is the total number of awaited/queued experiments in the WATCH_NAMESPACE. It is the sum of the litmuschaos_awaited_experiments metrics for every ChaosResult present inside the WATCH_NAMESPACE.

Metrics Name: litmuschaos_namespace_scoped_experiments_run_count
Description: It contains the total experiments run count in the WATCH_NAMESPACE.
Source: Aggregated sum of all the experiment runs in the WATCH_NAMESPACE
Sample Metrics: litmuschaos_namespace_scoped_experiments_run_count 2
Notes: The litmuschaos_namespace_scoped_experiments_run_count metric is the total number of experiment runs in the WATCH_NAMESPACE. It is the sum of litmuschaos_passed_experiments + litmuschaos_failed_experiments + litmuschaos_awaited_experiments for every ChaosResult present inside the WATCH_NAMESPACE.

Metrics Name: litmuschaos_namespace_scoped_experiments_installed_count
Description: It contains the total unique experiments installed/run in the WATCH_NAMESPACE.
Source: Total unique experiments count in the WATCH_NAMESPACE
Sample Metrics: litmuschaos_namespace_scoped_experiments_installed_count 1
Notes: The litmuschaos_namespace_scoped_experiments_installed_count metric is the total number of unique experiments installed/run in the WATCH_NAMESPACE. It is equal to the total number of ChaosResults present inside the WATCH_NAMESPACE.

ClusterScoped Metrics

Metrics Name: litmuschaos_cluster_scoped_passed_experiments
Description: It contains the total passed experiments count in all the namespaces.
Source: Aggregated sum of all the litmuschaos_passed_experiments metrics derived from the ChaosResults present inside all the namespaces
Sample Metrics: litmuschaos_cluster_scoped_passed_experiments 2
Notes: The litmuschaos_cluster_scoped_passed_experiments metric is the total number of passed experiments across the cluster. It is the sum of the litmuschaos_passed_experiments metrics for every ChaosResult in all the namespaces.

Metrics Name: litmuschaos_cluster_scoped_failed_experiments
Description: It contains the total failed experiments count in all the namespaces.
Source: Aggregated sum of all the litmuschaos_failed_experiments metrics derived from the ChaosResults present inside all the namespaces
Sample Metrics: litmuschaos_cluster_scoped_failed_experiments 0
Notes: The litmuschaos_cluster_scoped_failed_experiments metric is the total number of failed experiments across the cluster. It is the sum of the litmuschaos_failed_experiments metrics for every ChaosResult in all the namespaces.

Metrics Name: litmuschaos_cluster_scoped_awaited_experiments
Description: It contains the total awaited experiments count in all the namespaces.
Source: Aggregated sum of all the litmuschaos_awaited_experiments metrics derived from the ChaosResults present inside all the namespaces
Sample Metrics: litmuschaos_cluster_scoped_awaited_experiments 0
Notes: The litmuschaos_cluster_scoped_awaited_experiments metric is the total number of awaited/queued experiments across the cluster. It is the sum of the litmuschaos_awaited_experiments metrics for every ChaosResult in all the namespaces.

Metrics Name: litmuschaos_cluster_scoped_experiments_run_count
Description: It contains the total experiments run count in all the namespaces.
Source: Aggregated sum of all the experiment runs in all the namespaces
Sample Metrics: litmuschaos_cluster_scoped_experiments_run_count 2
Notes: The litmuschaos_cluster_scoped_experiments_run_count metric is the total number of experiment runs across the cluster. It is the sum of litmuschaos_passed_experiments + litmuschaos_failed_experiments + litmuschaos_awaited_experiments for every ChaosResult present inside all the namespaces.

Metrics Name: litmuschaos_cluster_scoped_experiments_installed_count
Description: It contains the total unique experiments installed/run in all the namespaces.
Source: Total unique experiments count in all the namespaces
Sample Metrics: litmuschaos_cluster_scoped_experiments_installed_count 1
Notes: The litmuschaos_cluster_scoped_experiments_installed_count metric is the total number of unique experiments installed/run across the cluster. It is equal to the total number of ChaosResults present inside all the namespaces.
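
The namespace-scoped and cluster-scoped notes above describe the same aggregation over different sets of ChaosResults. The sketch below illustrates that arithmetic under stated assumptions: resultCounts and aggregate are hypothetical types holding the per-ChaosResult values, and an empty namespace argument stands for "all namespaces". It is not the exporter's own code.

```go
package main

import "fmt"

// resultCounts is a hypothetical summary of one ChaosResult:
// its namespace and its passed/failed/awaited metric values.
type resultCounts struct {
	Namespace string
	Passed    float64
	Failed    float64
	Awaited   float64
}

// aggregate holds the five aggregate values described in the notes.
type aggregate struct {
	Passed, Failed, Awaited  float64
	RunCount, InstalledCount float64
}

// aggregateFor sums the per-result values. A non-empty namespace restricts
// the sum to that namespace; an empty one includes every namespace.
func aggregateFor(results []resultCounts, namespace string) aggregate {
	var agg aggregate
	for _, r := range results {
		if namespace != "" && r.Namespace != namespace {
			continue
		}
		agg.Passed += r.Passed
		agg.Failed += r.Failed
		agg.Awaited += r.Awaited
		agg.InstalledCount++ // one per ChaosResult
	}
	// Run count is passed + failed + awaited over the selected results.
	agg.RunCount = agg.Passed + agg.Failed + agg.Awaited
	return agg
}

func main() {
	results := []resultCounts{
		{Namespace: "litmus", Passed: 2, Failed: 0, Awaited: 0},
	}
	fmt.Printf("%+v\n", aggregateFor(results, "litmus"))
}
```

For a single ChaosResult with two passed runs, this gives a run count of 2 and an installed count of 1, consistent with the sample metrics above.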

Steps to build & deploy:

Running Litmus Chaos Experiments in order to generate metrics

  • Follow the steps described here to run Litmus chaos experiments, which store the chaos results. The exporter uses the chaos custom resources (chaosresult and chaosengine) to generate metrics.

Running Chaos Exporter on the local Machine

  • Run the exporter container (litmuschaos/chaos-exporter:ci) on the host network. You must mount the kubeconfig and override the entrypoint with ./exporter -kubeconfig <path>

  • Execute curl 127.0.0.1:8080/metrics to view the metrics

Running Chaos Exporter as a deployment on the Kubernetes Cluster

  • Install the RBAC (serviceaccount, role, rolebinding) as per deploy/rbac.md

  • Deploy the chaos-exporter.yaml

  • From a cluster node, execute curl <exporter-service-ip>:8080/metrics

Example Metrics

# HELP litmuschaos_awaited_experiments Total number of awaited experiments
# TYPE litmuschaos_awaited_experiments gauge
litmuschaos_awaited_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 0
# HELP litmuschaos_cluster_scoped_awaited_experiments Total number of awaited experiments in all namespaces
# TYPE litmuschaos_cluster_scoped_awaited_experiments gauge
litmuschaos_cluster_scoped_awaited_experiments 0
# HELP litmuschaos_cluster_scoped_experiments_installed_count Total number of experiments in all namespaces
# TYPE litmuschaos_cluster_scoped_experiments_installed_count gauge
litmuschaos_cluster_scoped_experiments_installed_count 1
# HELP litmuschaos_cluster_scoped_experiments_run_count Total experiments run in all namespaces
# TYPE litmuschaos_cluster_scoped_experiments_run_count gauge
litmuschaos_cluster_scoped_experiments_run_count 2
# HELP litmuschaos_cluster_scoped_failed_experiments Total number of failed experiments in all namespaces
# TYPE litmuschaos_cluster_scoped_failed_experiments gauge
litmuschaos_cluster_scoped_failed_experiments 0
# HELP litmuschaos_cluster_scoped_passed_experiments Total number of passed experiments in all namespaces
# TYPE litmuschaos_cluster_scoped_passed_experiments gauge
litmuschaos_cluster_scoped_passed_experiments 2
# HELP litmuschaos_experiment_chaos_injected_time chaos injected time of the experiments
# TYPE litmuschaos_experiment_chaos_injected_time gauge
litmuschaos_experiment_chaos_injected_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618426086e+09
# HELP litmuschaos_experiment_end_time end time of the experiments
# TYPE litmuschaos_experiment_end_time gauge
litmuschaos_experiment_end_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618426108e+09
# HELP litmuschaos_experiment_start_time start time of the experiments
# TYPE litmuschaos_experiment_start_time gauge
litmuschaos_experiment_start_time{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 1.618426056e+09
# HELP litmuschaos_failed_experiments Total number of failed experiments
# TYPE litmuschaos_failed_experiments gauge
litmuschaos_failed_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 0
# HELP litmuschaos_passed_experiments Total number of passed experiments
# TYPE litmuschaos_passed_experiments gauge
litmuschaos_passed_experiments{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 2
# HELP litmuschaos_probe_success_percentage ProbeSuccesPercentage for the experiments
# TYPE litmuschaos_probe_success_percentage gauge
litmuschaos_probe_success_percentage{chaosengine_context="test",chaosengine_name="helloservice-pod-delete",chaosresult_name="helloservice-pod-delete-pod-delete",chaosresult_namespace="litmus"} 100
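
The output above uses the Prometheus text format: lines starting with # carry HELP and TYPE metadata, and every other line is a series followed by its value. A minimal sketch of reading such output in Go follows. The parseSamples helper is hypothetical and deliberately simplified: it assumes each sample line ends with a space followed by the value, with no trailing timestamp. For real use, an established Prometheus parsing library is the safer choice.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSamples maps each series (name plus labels) to its value.
// Comment lines (# HELP, # TYPE) and blank lines are skipped.
func parseSamples(text string) (map[string]float64, error) {
	samples := make(map[string]float64)
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The value is assumed to be the text after the last space.
		idx := strings.LastIndex(line, " ")
		if idx < 0 {
			return nil, fmt.Errorf("malformed sample line: %q", line)
		}
		value, err := strconv.ParseFloat(line[idx+1:], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %w", line, err)
		}
		samples[line[:idx]] = value
	}
	return samples, scanner.Err()
}

func main() {
	// Lines modeled on the example output, with the label set shortened.
	const text = `# TYPE litmuschaos_cluster_scoped_passed_experiments gauge
litmuschaos_cluster_scoped_passed_experiments 2
litmuschaos_passed_experiments{chaosresult_namespace="litmus"} 2
`
	samples, err := parseSamples(text)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for series, value := range samples {
		fmt.Printf("%s = %g\n", series, value)
	}
}
```

The same function could be applied to the body returned by the curl commands in the deployment steps.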

How do I contribute?

You can contribute by raising issues, improving the documentation, contributing to the core framework and tooling, etc.

Head over to the Contribution guide
