# nvidia-gpu-scheduler

NVIDIA device scheduler extender for Kubernetes
## Table of Contents
- Introduction
- Features and Components
- Prerequisites
- Quick Start
- Building and Running Locally
- Versioning
## Introduction
With the NVIDIA device plugin for Kubernetes and the kubelet device plugin manager, we can schedule pods by GPU count. But when a node has GPU devices of several different models, we want Kubernetes to schedule a pod that needs, say, 2 GPUs of model X onto a node that can satisfy that request. nvidia-gpu-scheduler achieves this, and also helps to monitor which pods use which GPUs and the GPU info of each node.
## Features and Components

### Features
- Real-time data acquisition. (Data is published promptly even if gpuserver or the gpuserver-ds on a node is restarted.)
- Timely health checks. (The gpunode-lifecycle-controller in gpuserver checks the health of each node based on the lease freshly renewed by its gpuserver-ds.)
- Scheduler extender endpoints: Filter, Score, Preempt. (Filters nodes by the `nvidia-gpu-scheduler/gpu.model` annotation of the requested pod, and scores nodes by the number of GPUs of the requested model on each node; see the example pod after this list.)
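
For illustration, a pod targeting a specific GPU model might look like the following minimal sketch. The annotation key comes from this README; the model string (`Tesla-V100`) and the `nvidia.com/gpu` resource name exposed by the NVIDIA device plugin are example values:

```yaml
# Hypothetical pod requesting 2 GPUs of a specific model.
# The annotation key is from this README; the model string is an example.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  annotations:
    nvidia-gpu-scheduler/gpu.model: Tesla-V100
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:11.0-base
      resources:
        limits:
          nvidia.com/gpu: 2
```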
### Components
The NVIDIA device scheduler extender for Kubernetes consists of a StatefulSet (gpuserver) and a DaemonSet (gpuserver-ds):
#### gpuserver

Provides the following APIs to help monitor GPU pod and GPU node info (an example query follows the list):

- `GET /apis/nvidia-gpu-scheduler/v1/gpupods?watch=true`
- `GET /apis/nvidia-gpu-scheduler/v1/gpunodes?watch=true`
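
Because the extender urlPrefix in the Quick Start below is served through the kube-apiserver, these monitoring endpoints can presumably be reached the same way. A minimal sketch, assuming the nvidia-gpu-scheduler APIs are aggregated into the apiserver:

```shell
# Sketch: query GPU node and pod info through the kube-apiserver
# (assumes the nvidia-gpu-scheduler APIs are registered with the
# apiserver, as the extender urlPrefix in the Quick Start suggests).
$ kubectl get --raw "/apis/nvidia-gpu-scheduler/v1/gpunodes"
$ kubectl get --raw "/apis/nvidia-gpu-scheduler/v1/gpupods?watch=true"
```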
Provides the following APIs to extend the Kubernetes kube-scheduler as an HTTPExtender (an example request follows the list):

- `POST /apis/nvidia-gpu-scheduler/v1/schedule/filter`
- `POST /apis/nvidia-gpu-scheduler/v1/schedule/prioritize`
- `POST /apis/nvidia-gpu-scheduler/v1/schedule/preempt`
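
These endpoints speak the standard kube-scheduler HTTP extender protocol: the scheduler POSTs an ExtenderArgs JSON document and expects an ExtenderFilterResult (or a host priority list) in return. A hand-rolled probe of the filter endpoint might look like the following sketch; the pod, node names, and GPU model are placeholders:

```shell
# Sketch: call the filter endpoint directly with a minimal ExtenderArgs body.
# With nodeCacheCapable: true (see the scheduler config below), the scheduler
# sends node names rather than full node objects. All values are placeholders.
$ curl -k -X POST 'https://<kube-apiserver>:6443/apis/nvidia-gpu-scheduler/v1/schedule/filter' \
    -H 'Content-Type: application/json' \
    -d '{"pod":{"metadata":{"name":"gpu-pod","annotations":{"nvidia-gpu-scheduler/gpu.model":"Tesla-V100"}}},"nodenames":["node-a","node-b"]}'
```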
#### gpuserver-ds

Populates node GPU device info to gpuserver (see the sketch after this list):

- It collects which GPU devices each pod uses with the help of the kubelet PodResources gRPC server.
- It collects GPU device info with the help of NVML.
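
For reference, the kubelet exposes the PodResources gRPC service on a local unix socket, by default `/var/lib/kubelet/pod-resources/kubelet.sock`. As a rough sketch of what gpuserver-ds consumes, you could query it from a node with grpcurl, assuming you have a copy of the podresources `api.proto` at hand (the kubelet typically does not serve gRPC reflection):

```shell
# Sketch: list per-pod device assignments from the kubelet PodResources
# socket. The socket path is the kubelet default; api.proto must be copied
# from the Kubernetes source tree (k8s.io/kubelet podresources v1alpha1).
$ grpcurl -unix -plaintext \
    -import-path . -proto api.proto \
    /var/lib/kubelet/pod-resources/kubelet.sock \
    v1alpha1.PodResourcesLister/List
```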
Please note: you do not need the following modifications if each of your cluster nodes has only one GPU model. If a node has more than one GPU model, then to make a pod land on a node with the GPU model it needs, the following two components must additionally be changed:

- The stock Kubernetes kubelet does not support scheduling pods by GPU model; it needs to be changed.
- The stock NVIDIA device plugin for Kubernetes needs to be changed to pass GPU model info to the kubelet via a modified kubelet device plugin API.
## Prerequisites

The prerequisites for running the NVIDIA device scheduler extender are:

- NVIDIA device plugin for Kubernetes.
- Kubernetes >= v1.13 (gpuserver-ds gets pod GPU info from the kubelet podresources API).
## Quick Start
- Build with Docker.

  ```shell
  $ make all REGISTRY=docker.io/<yourname>
  ```
- Add an extender configuration to the Kubernetes kube-scheduler config file.

  ```yaml
  # kube-scheduler-config.yaml
  apiVersion: kubescheduler.config.k8s.io/v1alpha2
  ...
  extenders:
    - urlPrefix: 'https://<kube-apiserver>:6443/apis/nvidia-gpu-scheduler/v1/schedule'
      filterVerb: filter
      prioritizeVerb: prioritize
      preemptVerb: preempt
      weight: 1
      enableHttps: true
      nodeCacheCapable: true
      ignorable: true
      TLSConfig:
        CAFile: /etc/kubernetes/ssl/ca.pem
        CertFile: /etc/kubernetes/ssl/admin.pem
        KeyFile: /etc/kubernetes/ssl/admin-key.pem
  profiles:
    - schedulerName: default-scheduler
  ```
- Deploy with helm.

  The current version of nvidia-gpu-scheduler is v0.2.0. The preferred way to deploy it is using helm. Instructions for installing helm can be found here. A simple guide for using helm with the nvidia-gpu-scheduler repo can be found here.
  - Add and update the chart repo:

    ```shell
    # helm repo add ngs https://caden2016.github.io/nvidia-gpu-scheduler
    # helm repo update
    ```

  - Install from the chart repo. `xxx` is the release name; `nodeinfo=gpu` is the label of the GPU nodes where gpuserver-ds is deployed. (A verification sketch follows.)

    ```shell
    # helm install xxx ngs/nvidia-gpu-scheduler --version 0.2.0 --namespace kube-system --set nodeSelectorDaemonSet.nodeinfo=gpu
    # helm list --namespace kube-system
    ```
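
  - Verify the installation. (A sketch: `xxx` and the component names are taken from this README; adjust them if your chart renders different resource names.)

    ```shell
    # Check the release status and the two components described above.
    # helm status xxx --namespace kube-system
    # kubectl -n kube-system get statefulset/gpuserver daemonset/gpuserver-ds
    ```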
## Building and Running Locally
## Versioning

Versioning of this project follows SemVer. The first version following this scheme was tagged v0.0.0.

Going forward, the major version of nvidia-gpu-scheduler will only change following a change in the kubelet podresources API itself. For example, version v1alpha1 of the kubelet podresources API corresponds to version v0.x.x of nvidia-gpu-scheduler. If a new v2beta2 version of the kubelet podresources API comes out, then nvidia-gpu-scheduler will increase its major version to 1.x.x.

As of now, the podresources API for Kubernetes >= v1.13 is v1alpha1 (v1 was added in a compatible way). If you have Kubernetes >= v1.13, you can deploy any nvidia-gpu-scheduler version > v0.0.0.