Categorygithub.com/intel/powertelemetry
repositorypackage
1.0.1
Repository: https://github.com/intel/powertelemetry.git
Documentation: pkg.go.dev

# Packages

No description provided by the author

# README

Power Telemetry Library

powertelemetry is a Golang library that provides functionalities to get power management related metrics for Intel processors.

Metrics

Metric types can be distinguished based on the host topology attributes associated with them:

  • CPU metric: Metric value related to a specific logical CPU (CPU ID).
  • Package metric: Metric value related to a specific package ID (socket ID).
  • Die metric: Metric value related to a specific die ID.

The following metrics are supported by Power Telemetry Library:

Metric nameTypeDescriptionUnits
CurrentPackagePowerConsumptionWattsPackageCurrent power consumption of processor package.Watts
CurrentDramPowerConsumptionWattsPackageCurrent power consumption of processor package DRAM subsystem.Watts
PackageThermalDesignPowerWattsPackageMaximum Thermal Design Power (TDP) available for processor package.Watts
MaxTurboFreqListPackageMaximum reachable turbo frequency for number of cores active.MHz
CurrentUncoreFrequencyPackage/DieCurrent uncore frequency for die in processor package. This value is available from intel-uncore-frequency module for kernel >= 5.18. For older kernel versions it needs to be accessed via MSR. In case of lack of loaded msr, value will not be collected.MHz
InitialUncoreFrequencyMinPackage/DieInitial minimum uncore frequency limit for die in processor package.MHz
InitialUncoreFrequencyMaxPackage/DieInitial maximum uncore frequency limit for die in processor package.MHz
CustomizedUncoreFrequencyMinPackage/DieCustomized minimum uncore frequency limit for die in processor package.MHz
CustomizedUncoreFrequencyMaxPackage/DieCustomized maximum uncore frequency limit for die in processor package.MHz
CPUBaseFrequencyPackageCPU Base Frequency (maximum non-turbo frequency) for the processor package.MHz
CPUFrequencyCPUCurrent operational frequency of CPU Core.MHz
CPUC0StateResidencyCPUPercentage of time that CPU Core spent in C0 Core residency state.%
CPUC1StateResidencyCPUPercentage of time that CPU Core spent in C1 Core residency state.%
CPUC3StateResidencyCPUPercentage of time that CPU Core spent in C3 Core residency state.%
CPUC6StateResidencyCPUPercentage of time that CPU Core spent in C6 Core residency state.%
CPUC7StateResidencyCPUPercentage of time that CPU Core spent in C7 Core residency state.%
CPUTemperatureCPUCurrent temperature of CPU Core.degrees Celsius
CPUBusyFrequencyMhzCPUCPU Core Busy Frequency measured as frequency adjusted to CPU Core busy cycles.MHz
CPUC0SubstateC01PercentCPUPercentage of time that CPU Core spent in C0.1 substate out of the total time in the C0 state.%
CPUC0SubstateC02PercentCPUPercentage of time that CPU Core spent in C0.2 substate out of the total time in the C0 state.%
CPUC0SubstateC0WaitPercentCPUPercentage of time that CPU Core spent in C0_Wait substate out of the total time in the C0 state.%

Note: Metrics that report processor C-state residencies or power consumption are calculated over elapsed intervals.

SW Dependencies

Kernel modules

The library is mostly based on Linux Kernel modules that expose specific metrics over sysfs or devfs interfaces. The following dependencies are expected by the library:

  • intel-rapl kernel module which exposes Intel Runtime Power Limiting metrics over sysfs (/sys/devices/virtual/powercap/intel-rapl),
  • msr kernel module that provides access to processor model specific registers over devfs (/dev/cpu/cpu%d/msr),
  • cpufreq kernel module - which exposes per-CPU Frequency over sysfs (/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq),
  • intel-uncore-frequency kernel module which exposes Intel uncore frequency metrics over sysfs (/sys/devices/system/cpu/intel_uncore_frequency).

Make sure that required kernel modules are loaded and running. Modules might have to be manually enabled by using modprobe. Depending on the kernel version, run commands:

# rapl modules:
## kernel < 4.0
sudo modprobe intel_rapl
## kernel >= 4.0
sudo modprobe rapl
sudo modprobe intel_rapl_common
sudo modprobe intel_rapl_msr

# msr module:
sudo modprobe msr

# cpufreq module:
### integrated in kernel 

# intel-uncore-frequency module:
## only for kernel >= 5.6.0
sudo modprobe intel-uncore-frequency

Kernel's perf interface

For perf-related metrics, when the application that uses this library is not running as root, the following capability should be added to the application executable:

sudo setcap cap_sys_admin+ep <path_to_application>

Alternatively, /proc/sys/kernel/perf_event_paranoid has to be set to value less than 1.

Depending on environment and configuration (number of monitored CPUs and number of enabled metrics), it might be required to increase the limit on the number of open file descriptors allowed. This can be done for example by using ulimit -n command.

Dependencies of metrics on system configuration

Details of these dependencies are discussed above:

Metric nameTypeDependency
CurrentPackagePowerConsumptionWattsPackagerapl kernel module(s)
CurrentDramPowerConsumptionWattsPackagerapl kernel module(s)
PackageThermalDesignPowerWattsPackagerapl kernel module(s)
MaxTurboFreqListPackagemsr kernel module
CurrentUncoreFrequencyPackage/Dieintel-uncore-frequency/msr kernel modules*
InitialUncoreFrequencyMinPackage/Dieintel-uncore-frequency kernel module
InitialUncoreFrequencyMaxPackage/Dieintel-uncore-frequency kernel module
CustomizedUncoreFrequencyMinPackage/Dieintel-uncore-frequency kernel module
CustomizedUncoreFrequencyMaxPackage/Dieintel-uncore-frequency kernel module
CPUBaseFrequencyPackage/Diemsr kernel module
CPUFrequencyCPUcpufreq kernel module
CPUC0StateResidencyCPUmsr kernel module
CPUC1StateResidencyCPUmsr kernel module
CPUC3StateResidencyCPUmsr kernel module
CPUC6StateResidencyCPUmsr kernel module
CPUC7StateResidencyCPUmsr kernel module
CPUTemperatureCPUmsr kernel module
CPUBusyFrequencyMhzCPUmsr kernel module
CPUC0SubstateC01PercentCPUkernel's perf interface
CPUC0SubstateC02PercentCPUkernel's perf interface
CPUC0SubstateC0WaitPercentCPUkernel's perf interface

*starting from kernel version 5.18, only the intel-uncore-frequency module is required. For older kernel versions, the metric CurrentUncoreFrequency requires the msr module to be enabled.

Root privileges

The application that uses this library may require root privileges to read all the metrics (depending on OS type or configuration).

Alternatively, the following capabilities can be added to the application executable:

# without perf-related metrics:
sudo setcap cap_sys_rawio,cap_dac_read_search+ep <path_to_application>

# with perf-related metrics:
sudo setcap cap_sys_rawio,cap_dac_read_search,cap_sys_admin+ep <path_to_application>

HW Dependencies

Specific metrics require certain processor features to be present, otherwise this library won't be able to read them. The user can detect supported processor features by reading /proc/cpuinfo file. The library assumes crucial properties are the same for all CPU cores in the system.

The following processor properties are examined in more detail in this section:

  • vendor_id
  • cpu family
  • model
  • flags

The following processor properties are required by the library:

  • Processor vendor_id must be GenuineIntel and cpu family must be 6 - since data used by the library are Intel-specific.
  • The following processor flags shall be present:
    • msr shall be present for the library to read platform data from processor model specific registers and collect the following metrics:
      • CPUC0StateResidency
      • CPUC1StateResidency
      • CPUC3StateResidency
      • CPUC6StateResidency
      • CPUC7StateResidency
      • CPUBusyFrequencyMhz
      • CPUTemperature
      • CPUBaseFrequency
      • MaxTurboFreqList
      • CurrentUncoreFrequency (for kernel < 5.18)
    • aperfmperf shall be present to collect the following metrics:
      • CPUC0StateResidency
      • CPUC1StateResidency
      • CPUBusyFrequencyMhz
    • dts shall be present to collect:
      • CPUTemperature
  • Please consult the table below which metrics among those listed are supported by the host's processor model:
    • CPUC1StateResidency

    • CPUC3StateResidency

    • CPUC6StateResidency

    • CPUC7StateResidency

    • CPUTemperature

    • CPUBaseFrequency

    • CurrentUncoreFrequency

    • InitialUncoreFrequencyMin

    • InitialUncoreFrequencyMax

    • CustomizedUncoreFrequencyMin

    • CustomizedUncoreFrequencyMax

      Model numberProcessor nameCPUC1StateResidency
      CPUC6StateResidency
      CPUTemperature
      CPUBaseFrequency
      CPUC3StateResidencyCPUC7StateResidencyCurrentUncoreFrequency
      InitialUncoreFrequencyMin
      InitialUncoreFrequencyMax
      CustomizedUncoreFrequencyMin
      CustomizedUncoreFrequencyMax
      0x1EIntel Nehalem
      0x1FIntel Nehalem-G
      0x1AIntel Nehalem-EP
      0x2EIntel Nehalem-EX
      0x25Intel Westmere
      0x2CIntel Westmere-EP
      0x2FIntel Westmere-EX
      0x2AIntel Sandybridge
      0x2DIntel Sandybridge-X
      0x3AIntel Ivybridge
      0x3EIntel Ivybridge-X
      0x3CIntel Haswell
      0x3FIntel Haswell-X
      0x45Intel Haswell-L
      0x46Intel Haswell-G
      0x3DIntel Broadwell
      0x47Intel Broadwell-G
      0x4FIntel Broadwell-X
      0x56Intel Broadwell-D
      0x4EIntel Skylake-L
      0x5EIntel Skylake
      0x55Intel Skylake-X
      0x8EIntel KabyLake-L
      0x9EIntel KabyLake
      0xA5Intel CometLake
      0xA6Intel CometLake-L
      0x66Intel CannonLake-L
      0x6AIntel IceLake-X
      0x6CIntel IceLake-D
      0x7DIntel IceLake
      0x7EIntel IceLake-L
      0x9DIntel IceLake-NNPI
      0xA7Intel RocketLake
      0x8CIntel TigerLake-L
      0x8DIntel TigerLake
      0x8FIntel Sapphire Rapids X
      0xCFIntel Emerald Rapids X
      0xADIntel Granite Rapids X
      0x8AIntel Lakefield
      0x97Intel AlderLake
      0x9AIntel AlderLake-L
      0xB7Intel RaptorLake
      0xBAIntel RaptorLake-P
      0xBFIntel RaptorLake-S
      0xACIntel MeteorLake
      0xAAIntel MeteorLake-L
      0xC6Intel ArrowLake
      0xBDIntel LunarLake
      0x37Intel Atom® Bay Trail
      0x4DIntel Atom® Avaton
      0x4AIntel Atom® Merrifield
      0x5AIntel Atom® Moorefield
      0x4CIntel Atom® Airmont
      0x5CIntel Atom® Apollo Lake
      0x5FIntel Atom® Denverton
      0x7AIntel Atom® Goldmont
      0x86Intel Atom® Jacobsville
      0x96Intel Atom® Elkhart Lake
      0x9CIntel Atom® Jasper Lake
      0xBEIntel AlderLake-N
      0xAFIntel Sierra Forest
      0xB6Intel Grand Ridge
      0x57Intel Xeon® PHI Knights Landing
      0x85Intel Xeon® PHI Knights Mill

How to use

Initialization

powertelemetry library implements optional builder pattern. It allows the user to specify configuration parameters, and the dependencies to initalize needed to provide metrics information via WithX exported functions.

ptel, err := New(opts...)

Supported options are:

  • WithCoreFrequency: Option that enables access to metrics which rely on cpufreq kernel module.
  • WithMsr: Option that enables access to metrics which rely on msr kernel module.
  • WithMsrTimeout: Same as WithMsr, but it accepts an additional argument to specify the timeout for MSR reads.
  • WithIncludedCPUs/WithExcludedCPUs: Option that allows to specify which logical CPU ID have access to msr and cpufreq kernel modules and perf_events kernel interface, by inclusion or exclusion. Notice that only one of these options can be used during instantiation. When omitted, all logical CPUs from host topology are accessible.
  • WithRapl: Option that enables access to metrics which rely on rapl kernel module.
  • WithUncoreFrequency: Option that enables access to metrics which rely on intel-uncore-frequency kernel module.
  • WithPerf: Option that enables access to metrics which rely on perf_events kernel interface. It takes the path of a JSON file with perf event definitions specific for the host's CPU model. Files can be found in perfmon repository.
  • WithLogger: The user can provide a custom logger.

Refer to Dependencies of metrics on system configuration section to check which options need to be enabled for each metric.

Example: Initialize PowerTelemetry using included CPUs option

This example uses options to enable all supported metrics. Additionally, uses WithIncludedCPUs to limit the CPU IDs that can be used to retrieve metrics which rely on msr and cpufreq kernel modules.

pt, err := powertelemetry.New(
    powertelemetry.WithIncludedCPUs([]int{0, 1, 2, 3}),
    powertelemetry.WithMsr(),
    powertelemetry.WithCoreFrequency(),
    powertelemetry.WithRapl(),
    powertelemetry.WithUncoreFrequency(),
    powertelemetry.WithPerf("/path/to/events.json"),
)

Example: Initialize PowerTelemetry using excluded CPUs option

This example uses options to enable metrics which rely on msr and cpufreq kernel modules. Additionally, uses WithExcludedCPUs to exclude CPU IDs that can be used to retrieve these metrics.

pt, err := powertelemetry.New(
    powertelemetry.WithExcludedCPUs([]int{0, 1, 2, 3}),
    powertelemetry.WithMsr(),
    powertelemetry.WithCoreFrequency(),
)

Get Metric Values

powertelemetry provides exported methods to get metric values defined in the Metrics section. These methods accept an argument depending on the metric type.

The exported method to get metric values have the following naming convention:

// CPU metric type.
func (ptel *PowerTelemetry) Get<metric_name>(cpuID int) (<value_type>, error)

// Package metric type.
func (ptel *PowerTelemetry) Get<metric_name>(packageID int) (<value_type>, error)

// Package/die metric type.
func (ptel *PowerTelemetry) Get<metric_name>(packageID, dieID int) (<value_type>, error)

Where:

  • metric_name corresponds to the metric name of the supported Metrics section.
  • value_type can be either float64 or uint64, depending on the metric.

There are several types of metrics depending on how values are calculated:

  • Instantaneous: The metric value corresponds to the time the specific instant in which the method is called.
  • Elapsed interval: The metric value corresponds to a specific time interval.

Instantaneous Metrics

Following are the metrics that provide instantaneous values:

  • CPUTemperature
  • CPUFrequency
  • CPUBaseFrequency
  • CurrentUncoreFrequency
  • InitialUncoreFrequencyMin
  • InitialUncoreFrequencyMax
  • CustomizedUncoreFrequencyMin
  • CustomizedUncoreFrequencyMax
  • MaxTurboFreqList
  • PackageThermalDesignPowerWatts

Example: Get the instantaneous value of CPU temperature metric

// CPU temperature metric
cpuTemp, err := pt.GetCPUTemperature(cpuID)
if err != nil {
  // handle error
}

Elapsed Interval Metrics

The following metrics depend on elapsed intervals:

  • Metrics that rely on msr:
    • CPUC0StateResidency
    • CPUC1StateResidency
    • CPUC3StateResidency
    • CPUC6StateResidency
    • CPUC7StateResidency
    • CPUBusyFrequencyMhz
  • Metrics that rely on perf:
    • CPUC0SubstateC01Percent
    • CPUC0SubstateC02Percent
    • CPUC0SubstateC0WaitPercent
  • Metrics that rely on rapl:
    • CurrentPackagePowerConsumptionWatts
    • CurrentDramPowerConsumptionWatts

The elapsed time interval is automatically calculated between subsequent calls to retrieve metric values. It is recommended to use a scheduler to consistently retrieve metrics over a fixed time interval.

Metrics relying on rapl

The following example shows how to retrieve the value of power related metrics based on rapl kernel module.

Example: Get metrics which rely on rapl kernel module

// First read of metrics at init (t0).
ptel, err := ptel.New(WithRapl())
if err != nil {
  // handle error
}

// Method call at t1. Metric value corresponds to time interval t1-t0.
powerInterval1, err := pt.GetCurrentPackagePowerConsumptionWatts(packageID)
if err != nil {
  // handle error
}

// Method call at t2. Metric value corresponds to time interval t2-t1.
powerInterval2, err := pt.GetCurrentPackagePowerConsumptionWatts(packageID)
if err != nil {
  // handle error
}

Note: The first metric reading operation happens at initialization, when WithRaploption is present.

Metrics relying on msr

C-state residency metrics need an additional method call to UpdatePerCPUMetrics that reads all required offsets of the corresponding MSR registers prior to providing their values.

Example: Get time-elapsed metrics which rely on msr kernel module

// First offset reading of MSR registers (call to UpdatePerCPUMetrics) at init (t0).
ptel, err := ptel.New(WithMsr())
if err != nil {
  // handle error
}

cpuID := 0

// Method call at t1. Elapsed time is calculated from current and previous call to UpdatePerCPUMetrics, t1-t0.
if err := ptel.UpdatePerCPUMetrics(cpuID); err != nil {
  // handle error
  return
}

// Get CPUC0StateResidency corresponding to previous elapsed interval.
c0State, err := ptel.GetCPUC0StateResidency(cpuID)
if err != nil {
  // handle error
}

// Get CPUC1StateResidency corresponding to previous elapsed interval.
c1State, err := ptel.GetCPUC1StateResidency(cpuID)
if err != nil {
  // handle error
}

Note: The first reading operations of the MSR register happen at initialization, when WithMsr or WithMsrTimeout options are present.

Metrics relying on perf

C0-substate metrics need an additional ReadPerfEvents method call that reads all required perf events, per-CPU, prior to providing their values.

When an instance of PowerTelemetry has been successfully initialized with the option WithPerf, perf events are activated. This means multiple file descriptors remain open. Therefore, if the user no longer needs to get perf specific metrics these resources need to be released, via DeactivatePerfEvents method call.

Note: Event activation happens at initialization, when WithPerf option is added.

Example: Get time-elapsed metrics which rely on perf kernel interface

// Read events related to perf-related metrics.
// Elapsed time is calculated from current and previous call to ReadPerfEvents.
if err := ptel.ReadPerfEvents(); err != nil {
  // handle error
  return
}

// Get GetCPUC0SubstateC01Percent corresponding to previous elapsed interval.
c0SubstateC01, err := ptel.GetCPUC0SubstateC01Percent(cpuID)
if err != nil {
  // handle error
}

// Get GetCPUC0SubstateC02Percent corresponding to previous elapsed interval.
c0SubstateC02, err := ptel.GetCPUC0SubstateC02Percent(cpuID)
if err != nil {
  // handle error
}

// Get GetCPUC0SubstateC0WaitPercent corresponding to previous elapsed interval.
c0SubstateC0Wait, err := ptel.GetCPUC0SubstateC0WaitPercent(cpuID)
if err != nil {
  // handle error
}

// Release resources. Close file descriptors.
err := ptel.DeactivatePerfEvents()
if err != nil {
  // handle error
}

Error Handling

This library exposes several types of errors, providing the user the flexibility to handle them differently:

  • MultiError: Holds a slice of error descriptions. It is used to mark errors that happened during the initialization of PowerTelemetry dependencies.
  • ModuleNotInitializedError: Used to indicate that a dependency has not been initialized, and the user tried to access it.
  • MetricNotSupportedError: Used to indicate that a metric is not supported by the host's CPU model.

Usage of MultiError

When creating a new PowerTelemetry instance, the New function returns a MultiError if any of the dependencies, requested via options, failed to initialize.

ptel, err := powertelemetry.New(
  powertelemetry.WithMsr(),
  powertelemetry.WithRapl(),
  powertelemetry.WithCoreFrequency(),
  powertelemetry.WithUncoreFrequency(),
  powertelemetry.WithPerf("/path/to/events.json"),
)

var initErr *powertelemetry.MultiError
if err != nil {
  if !errors.As(err, &initErr) {
    logger.Errorf("Failed to build powertelemetry instance: %v", err)
    os.Exit(1)
  }
  logger.Warn(err)
}

Typical scenarios that return a MultiError type would be:

  • Requesting a dependency, but the corresponding kernel module was not loaded previously.
  • Provide invalid paths to WithX options that allow to specify custom path used to initialize the dependencies.
  • Provide invalid JSON file for perf event definitions via WithPerf.

Usage of ModuleNotInitializedError

Calls to a get metric method return an error of type ModuleNotInitializedError if the dependency to which it relies on, has not been initialized via the corresponding WithX option, or it failed to initialize.

This might be used to:

  • Prevent subsequent calls to metrics relying on the same dependency.
  • Prevent subsequent calls to the same metric getter, when looping through multiple logical CPU IDs or package/die IDs.
Example: ModuleNotInitializedError error handling for CPU frequency metric
// CPU current frequency metric
cpuID := 0
cpuFreq, err := ptel.GetCPUFrequency(cpuID)

var moduleErr *powertelemetry.ModuleNotInitializedError
if err != nil {
  if !errors.As(err, &moduleErr) {
    // Handle module not initialized error
  }
  // Handle other error types
}

Usage of MetricNotSupportedError

As mentioned in HW Dependencies section, specific metrics require certain processor features to be present, or specific processor models.

powertelemetry library provides exported functions that allow the user to check if a metric is supported by the CPU. If not supported, a MetricNotSupportedError is returned, for the user to handle it.

The exported functions have the following naming convention:

func CheckIf<metric_name>Supported(cpuModel int) error

This might be used to disable metric requests based on CPU model compatibility.

Example: MetricNotSupportedError error handling for CPU C1 state residency metric
var notSupportedErr *powertelemetry.MetricNotSupportedError

err := ptel.CheckIfCPUC1StateResidencySupported(cpuModel)
if err != nil && errors.As(err, &notSupportedErr)
  // handle not supported metric error
}