# nvidia-gpu-scheduler

NVIDIA device scheduler extender for Kubernetes
## Table of Contents
- Introduction
- Features and Components
- Prerequisites
- Quick Start
- Building and Running Locally
- Versioning
## Introduction
With the NVIDIA device plugin for Kubernetes and the kubelet device plugin manager, we can schedule pods by GPU count. But when a node has GPU devices of several different models, we want Kubernetes to schedule a pod that needs, say, 2 GPUs of model X onto a node that can satisfy that request. nvidia-gpu-scheduler achieves this, and also helps to monitor which pods use which GPUs and the GPU info of each node.
## Features and Components

### Features
- Real-time data acquisition. (Data is published promptly even if gpuserver or the gpuserver-ds on a node is restarted.)
- Timely health checks. (The gpunode-lifecycle-controller in gpuserver checks the health of each node based on the lease freshly renewed by its gpuserver-ds.)
- Scheduler extender endpoints: Filter, Score, Preempt. (Filters nodes by the `nvidia-gpu-scheduler/gpu.model` annotation of the requested pod, and scores nodes by the number of GPUs of the requested model on each node; see the example pod after this list.)
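
For illustration, a pod targeting a specific GPU model might look like the following minimal sketch. The annotation key comes from this README; the model string (`Tesla-V100`) and the `nvidia.com/gpu` resource name exposed by the NVIDIA device plugin are example values:

```yaml
# Hypothetical pod requesting 2 GPUs of a specific model.
# The annotation key is from this README; the model string is an example.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  annotations:
    nvidia-gpu-scheduler/gpu.model: Tesla-V100
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:11.0-base
      resources:
        limits:
          nvidia.com/gpu: 2
```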
### Components
The NVIDIA device scheduler extender for Kubernetes consists of a StatefulSet (gpuserver) and a DaemonSet (gpuserver-ds):
#### gpuserver

Provides the following APIs to help monitor GPU pod and GPU node info (an example query follows the list):

- `GET /apis/nvidia-gpu-scheduler/v1/gpupods?watch=true`
- `GET /apis/nvidia-gpu-scheduler/v1/gpunodes?watch=true`
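
Because the extender urlPrefix in the Quick Start below is served through the kube-apiserver, these monitoring endpoints can presumably be reached the same way. A minimal sketch, assuming the nvidia-gpu-scheduler APIs are aggregated into the apiserver:

```shell
# Sketch: query GPU node and pod info through the kube-apiserver
# (assumes the nvidia-gpu-scheduler APIs are registered with the
# apiserver, as the extender urlPrefix in the Quick Start suggests).
$ kubectl get --raw "/apis/nvidia-gpu-scheduler/v1/gpunodes"
$ kubectl get --raw "/apis/nvidia-gpu-scheduler/v1/gpupods?watch=true"
```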
Provides the following APIs to extend the Kubernetes kube-scheduler as an HTTPExtender (an example request follows the list):

- `POST /apis/nvidia-gpu-scheduler/v1/schedule/filter`
- `POST /apis/nvidia-gpu-scheduler/v1/schedule/prioritize`
- `POST /apis/nvidia-gpu-scheduler/v1/schedule/preempt`
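
These endpoints speak the standard kube-scheduler HTTP extender protocol: the scheduler POSTs an ExtenderArgs JSON document and expects an ExtenderFilterResult (or a host priority list) in return. A hand-rolled probe of the filter endpoint might look like the following sketch; the pod, node names, and GPU model are placeholders:

```shell
# Sketch: call the filter endpoint directly with a minimal ExtenderArgs body.
# With nodeCacheCapable: true (see the scheduler config below), the scheduler
# sends node names rather than full node objects. All values are placeholders.
$ curl -k -X POST 'https://<kube-apiserver>:6443/apis/nvidia-gpu-scheduler/v1/schedule/filter' \
    -H 'Content-Type: application/json' \
    -d '{"pod":{"metadata":{"name":"gpu-pod","annotations":{"nvidia-gpu-scheduler/gpu.model":"Tesla-V100"}}},"nodenames":["node-a","node-b"]}'
```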
#### gpuserver-ds

Populates node GPU device info to gpuserver (see the sketch after this list):

- It collects which GPU devices each pod uses with the help of the kubelet PodResources gRPC server.
- It collects GPU device info with the help of NVML.
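
For reference, the kubelet exposes the PodResources gRPC service on a local unix socket, by default `/var/lib/kubelet/pod-resources/kubelet.sock`. As a rough sketch of what gpuserver-ds consumes, you could query it from a node with grpcurl, assuming you have a copy of the podresources `api.proto` at hand (the kubelet typically does not serve gRPC reflection):

```shell
# Sketch: list per-pod device assignments from the kubelet PodResources
# socket. The socket path is the kubelet default; api.proto must be copied
# from the Kubernetes source tree (k8s.io/kubelet podresources v1alpha1).
$ grpcurl -unix -plaintext \
    -import-path . -proto api.proto \
    /var/lib/kubelet/pod-resources/kubelet.sock \
    v1alpha1.PodResourcesLister/List
```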
Please note: you do not need the following modifications if each of your cluster nodes has only one GPU model. If a node has more than one GPU model, then to make a pod land on a node with the GPU model it needs, the following two components must additionally be changed:

- The stock Kubernetes kubelet does not support scheduling pods by GPU model; it needs to be changed.
- The stock NVIDIA device plugin for Kubernetes needs to be changed to pass GPU model info to the kubelet via a modified kubelet device plugin API.
## Prerequisites

The prerequisites for running the NVIDIA device scheduler extender are:

- NVIDIA device plugin for Kubernetes.
- Kubernetes >= v1.13 (gpuserver-ds gets pod GPU info from the kubelet podresources API).
## Quick Start
- Build with Docker.

  ```shell
  $ make all REGISTRY=docker.io/<yourname>
  ```
- Add an extender configuration to the Kubernetes kube-scheduler config file.

  ```yaml
  # kube-scheduler-config.yaml
  apiVersion: kubescheduler.config.k8s.io/v1alpha2
  ...
  extenders:
    - urlPrefix: 'https://<kube-apiserver>:6443/apis/nvidia-gpu-scheduler/v1/schedule'
      filterVerb: filter
      prioritizeVerb: prioritize
      preemptVerb: preempt
      weight: 1
      enableHttps: true
      nodeCacheCapable: true
      ignorable: true
      TLSConfig:
        CAFile: /etc/kubernetes/ssl/ca.pem
        CertFile: /etc/kubernetes/ssl/admin.pem
        KeyFile: /etc/kubernetes/ssl/admin-key.pem
  profiles:
    - schedulerName: default-scheduler
  ```
- Deploy with helm.

  The current version of nvidia-gpu-scheduler is v0.2.0. The preferred way to deploy it is using helm. Instructions for installing helm can be found here. A simple guide for using helm with the nvidia-gpu-scheduler repo can be found here.
  - Add and update the chart repo:

    ```shell
    # helm repo add ngs https://caden2016.github.io/nvidia-gpu-scheduler
    # helm repo update
    ```

  - Install from the chart repo. `xxx` is the release name; `nodeinfo=gpu` is the label of the GPU nodes where gpuserver-ds is deployed. (A verification sketch follows.)

    ```shell
    # helm install xxx ngs/nvidia-gpu-scheduler --version 0.2.0 --namespace kube-system --set nodeSelectorDaemonSet.nodeinfo=gpu
    # helm list --namespace kube-system
    ```
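
  - Verify the installation. (A sketch: `xxx` and the component names are taken from this README; adjust them if your chart renders different resource names.)

    ```shell
    # Check the release status and the two components described above.
    # helm status xxx --namespace kube-system
    # kubectl -n kube-system get statefulset/gpuserver daemonset/gpuserver-ds
    ```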
## Building and Running Locally
## Versioning

Versioning of this project follows SemVer. The first version following this scheme was tagged v0.0.0.

Going forward, the major version of nvidia-gpu-scheduler will only change following a change in the kubelet podresources API itself. For example, version v1alpha1 of the kubelet podresources API corresponds to version v0.x.x of nvidia-gpu-scheduler. If a new v2beta2 version of the kubelet podresources API comes out, then nvidia-gpu-scheduler will increase its major version to 1.x.x.

As of now, the podresources API for Kubernetes >= v1.13 is v1alpha1 (v1 was added in a compatible way). If you have Kubernetes >= v1.13, you can deploy any nvidia-gpu-scheduler version > v0.0.0.