package
0.0.0-20250403192851-34a345b3f333
Repository: https://github.com/openshift/ci-tools.git
Documentation: pkg.go.dev

# README

gpu-scheduling-webhook

Motivation

Our clusters host some nodes that feature an Nvidia GPU. These nodes are expensive to run workloads on, so we use this mutating webhook to ensure that only pods requesting a GPU actually run on them, keeping everything else off.

How it works

A node that features an Nvidia GPU carries the following taint:

taints:
- effect: NoSchedule
  key: nvidia.com/gpu
  value: "true"

The webhook inspects the resource requests and limits of a pod's containers, both init containers and regular ones, and applies this toleration:

tolerations:
- key: nvidia.com/gpu
  operator: Equal
  value: "true"
  effect: NoSchedule

when it finds either a request such as:

requests:
  nvidia.com/gpu: <SOME_VALUE_HERE>

or the following limit:

limits:
  nvidia.com/gpu: <SOME_VALUE_HERE>
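
The decision described above can be sketched in Go as follows. This is a minimal illustration, not the webhook's actual code: the types are simplified, hypothetical stand-ins for the Kubernetes pod types, and the function names `wantsGPU` and `addGPUToleration` are made up for this example. It also assumes that the mere presence of the `nvidia.com/gpu` key in requests or limits is enough to trigger the mutation, and that a toleration already present is not added twice; the real webhook may differ on both points.

```go
package main

import "fmt"

// gpuResource is both the resource name and the taint key shown in this README.
const gpuResource = "nvidia.com/gpu"

// The types below are simplified, hypothetical stand-ins for the
// Kubernetes pod types; they are not the types the webhook uses.
type Resources struct {
	Requests map[string]string
	Limits   map[string]string
}

type Container struct {
	Name      string
	Resources Resources
}

type Toleration struct {
	Key      string
	Operator string
	Value    string
	Effect   string
}

type PodSpec struct {
	InitContainers []Container
	Containers     []Container
	Tolerations    []Toleration
}

// wantsGPU reports whether any init or regular container names the GPU
// resource in its requests or limits (presence check only, an assumption).
func wantsGPU(spec PodSpec) bool {
	all := append(append([]Container{}, spec.InitContainers...), spec.Containers...)
	for _, c := range all {
		if _, ok := c.Resources.Requests[gpuResource]; ok {
			return true
		}
		if _, ok := c.Resources.Limits[gpuResource]; ok {
			return true
		}
	}
	return false
}

// addGPUToleration appends the GPU toleration when the pod asks for a GPU
// and does not already carry it. It returns true if the spec was changed.
func addGPUToleration(spec *PodSpec) bool {
	if !wantsGPU(*spec) {
		return false
	}
	want := Toleration{Key: gpuResource, Operator: "Equal", Value: "true", Effect: "NoSchedule"}
	for _, t := range spec.Tolerations {
		if t == want {
			return false // already tolerated, nothing to do
		}
	}
	spec.Tolerations = append(spec.Tolerations, want)
	return true
}

func main() {
	spec := PodSpec{Containers: []Container{{
		Name:      "train",
		Resources: Resources{Limits: map[string]string{gpuResource: "1"}},
	}}}
	fmt.Println(addGPUToleration(&spec), len(spec.Tolerations)) // prints "true 1"
}
```

A pod whose containers only request other resources, such as `cpu`, is left untouched in this sketch, so the `NoSchedule` taint keeps it off the GPU nodes.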
