package
0.3.8
Repository: https://github.com/leptonai/gpud.git
Documentation: pkg.go.dev

# Packages

Package clockeventsstate provides the persistent storage layer for the NVIDIA query results.
Package fabricmanagerlog implements the fabric manager log poller.
Package infiniband provides utilities to query InfiniBand status.
No description provided by the author
Package nccl contains the implementation of the NCCL (NVIDIA Collective Communications Library) query for NVIDIA GPUs.
Package nvml implements the NVIDIA Management Library (NVML) interface.
Package peermem contains the implementation of the peermem query for NVIDIA GPUs.
Package sxid provides the NVIDIA SXID error details.
Package xid provides the NVIDIA XID error details.
Package xidsxidstate provides the persistent storage layer for the NVIDIA query results.

# Functions

No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
ref.
Gets all NVIDIA component queries.
No description provided by the author
Returns the latest fabric manager output using journalctl.
GetMemoryErrorManagementCapabilities returns the GPU memory error management capabilities based on the GPU product name.
Make sure to call this with a timeout, as a broken GPU may block the command.
No description provided by the author
Returns true if the local machine has NVIDIA GPUs installed.
An "NVIDIA Xid 79: GPU has fallen off the bus" condition may cause this syscall to fail with "error getting device handle for index '6': Unknown Error" or "Unable to determine the device handle for GPU0000:CB:00.0: Unknown Error".
Lists all PCI devices that are compatible with NVIDIA.
Loads the product name of the NVIDIA GPU device.
Decodes the "nvidia-smi --query" output.
No description provided by the author
Set only once, since it relies on the kube client and a specific port.
Returns true if the local machine has an NVIDIA GPU, as detected by running "nvidia-smi".
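
One of the function notes above recommends calling with a timeout, because a broken GPU may block the command. The following is a minimal sketch of that pattern using only the Go standard library. The helpers `runWithTimeout` and `isTimeout` are hypothetical names introduced here for illustration and are not part of this package's API; the 30-second budget is likewise an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// runWithTimeout is a hypothetical helper (not part of this package).
// It runs a command and kills it if it does not finish within d.
func runWithTimeout(d time.Duration, name string, args ...string) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()

	// CommandContext kills the process once ctx is done.
	out, err := exec.CommandContext(ctx, name, args...).CombinedOutput()
	if errors.Is(ctx.Err(), context.DeadlineExceeded) {
		return out, fmt.Errorf("%s timed out after %s: %w", name, d, context.DeadlineExceeded)
	}
	return out, err
}

// isTimeout reports whether err was caused by the deadline expiring.
func isTimeout(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	// The 30s budget is an arbitrary choice for this sketch.
	out, err := runWithTimeout(30*time.Second, "nvidia-smi", "--query")
	switch {
	case isTimeout(err):
		fmt.Println("nvidia-smi did not return in time; the GPU may be unhealthy")
	case err != nil:
		fmt.Println("nvidia-smi failed:", err)
	default:
		fmt.Printf("read %d bytes of nvidia-smi output\n", len(out))
	}
}
```

Treating a timeout as its own case, rather than as a generic failure, lets the caller distinguish a hung GPU from a missing binary or a non-zero exit.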

# Constants

No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author

# Structs

No description provided by the author
Contains information about the GPU's memory error management capabilities.
GPU object from the nvidia-smi query.
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
No description provided by the author
If any field shows "Unknown Error", the GPU has an issue.
Represents the current NVIDIA status using "nvidia-smi --query", "nvidia-smi", etc.
No description provided by the author
No description provided by the author
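
The struct note above says that any field showing "Unknown Error" indicates a GPU issue. As a rough illustration, that check could be sketched as a line scan over raw command output. The function `findUnknownErrors` is a hypothetical name introduced here and is not this package's parser; the sample text reuses the error messages quoted in the function notes, plus one made-up healthy line.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// findUnknownErrors is a hypothetical helper (not part of this package).
// It returns every line of output, trimmed, that contains "Unknown Error".
func findUnknownErrors(output string) []string {
	var hits []string
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.Contains(line, "Unknown Error") {
			hits = append(hits, line)
		}
	}
	return hits
}

func main() {
	// Illustrative input only: the two error lines are the messages quoted
	// in the function notes; the "status: ok" line is invented for contrast.
	sample := strings.Join([]string{
		"status: ok",
		"error getting device handle for index '6': Unknown Error",
		"Unable to determine the device handle for GPU0000:CB:00.0: Unknown Error",
	}, "\n")

	for _, line := range findUnknownErrors(sample) {
		fmt.Println("suspect:", line)
	}
}
```

A plain substring scan is deliberately crude: it flags the symptom without depending on the exact layout of the output, which can differ across driver versions.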