# Containerized Data Importer
A declarative Kubernetes utility to import Virtual Machine images for use with Kubevirt. At a high level, a persistent volume claim (PVC), which defines VM-suitable storage (via a storage class), is created. A custom controller watches for importer specific claims and starts an import/copy process when such a claim is detected. The status of the import process is reflected in the same claim, and when the copy completes Kubevirt creates the VM based on the just-imported image.
## Purpose
This project is designed with Kubevirt in mind and provides a declarative method for importing VM images into a Kubernetes cluster. Kubevirt detects when the VM image copy is complete and, using the same PVC that triggered the import process, creates the VM.
This approach supports two main use-cases:
- a cluster administrator can build an abstract registry of immutable images (referred to as "Golden Images") which can be cloned and later consumed by Kubevirt, or
- an ad-hoc user (granted access) can import a VM image into their own namespace and feed this image directly to Kubevirt, bypassing the cloning step.
For an in-depth look at the system and workflow, see the Design documentation.
## Data Format
The importer is capable of performing certain functions that streamline its use with Kubevirt. It automatically decompresses gzip and xz files and untars tar archives. Also, qcow2 images are converted into the raw image format needed by Kubevirt, resulting in the final file being a simple .img file.
Supported file formats are:
- .tar
- .gz
- .xz
- .img
- .iso
- .qcow2
## Deploying CDI
### Assumptions
- A running Kubernetes cluster with roles and role bindings implementing the security necessary for the CDI controller to watch PVCs and pods across all namespaces.
- A storage class and provisioner (an illustrative StorageClass sketch follows this list).
- An HTTP or S3 file server hosting VM images.
- An optional "golden" namespace acting as the image registry. The `default` namespace is fine for tire kicking.
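The StorageClass itself is whatever your cluster provides; the following is only a hypothetical sketch, and the provisioner shown is a placeholder that must be replaced with one available in your environment:

```yaml
# Hypothetical StorageClass for illustration only; substitute a provisioner
# that exists in your cluster (cloud, CSI, local-volume, etc.).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local
provisioner: kubernetes.io/no-provisioner   # placeholder provisioner
volumeBindingMode: WaitForFirstConsumer
```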
Either clone this repo or download the necessary manifests directly:
```
$ git clone https://github.com/kubevirt/containerized-data-importer.git
```

Or

```
$ mkdir cdi-manifests && cd cdi-manifests
$ wget https://raw.githubusercontent.com/kubevirt/containerized-data-importer/kubevirt-centric-readme/manifests/example/golden-pvc.yaml
$ wget https://raw.githubusercontent.com/kubevirt/containerized-data-importer/kubevirt-centric-readme/manifests/example/endpoint-secret.yaml
$ wget https://raw.githubusercontent.com/kubevirt/containerized-data-importer/kubevirt-centric-readme/manifests/controller/controller/cdi-controller-deployment.yaml
```
### Run the CDI Controller
Deploying the CDI controller is straightforward. Choose the namespace where the controller will run and ensure that this namespace has cluster-wide permission to watch all PVCs and pods. In this document the default namespace is used, but in a production setup a namespace that is inaccessible to regular users should be used instead. See Protecting VM Image Namespaces below for creating a secure CDI controller namespace.
```
$ kubectl -n default create -f https://raw.githubusercontent.com/kubevirt/containerized-data-importer/master/manifests/cdi-controller-deployment.yaml
```
### Start Importing Images
Note: The CDI controller is a required part of this workflow.
Make copies of the example manifests for editing. The necessary files are:
- golden-pvc.yaml
- endpoint-secret.yaml
Edit golden-pvc.yaml (an example sketch follows this list):
- `storageClassName:` The default StorageClass will be used if not set. Otherwise, set to a desired StorageClass.
- `kubevirt.io/storage.import.endpoint:` The full URL to the VM image, in the format of `http://www.myUrl.com/path/of/data` or `s3://bucketName/fileName`.
- `kubevirt.io/storage.import.secretName:` (Optional) The name of the secret containing the authentication credentials required by the file server.
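Putting those annotations together, a minimal sketch of an edited golden-pvc.yaml might look like the following (the PVC name, size, and endpoint URL are placeholders):

```yaml
# Hypothetical example; the annotation values below are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: golden-pvc
  annotations:
    kubevirt.io/storage.import.endpoint: "http://www.myUrl.com/path/of/data"  # required
    kubevirt.io/storage.import.secretName: "endpoint-secret"                  # optional
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # storageClassName: my-storage-class   # optional; omit to use the default StorageClass
```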
Edit endpoint-secret.yaml (Optional; an example sketch follows this list):
Note: Only set these values if the file server requires authentication credentials.
- `metadata.name:` Arbitrary name of the secret. Must match the PVC's `kubevirt.io/storage.import.secretName:` value.
- `accessKeyId:` Contains the endpoint's key and/or user name. This value must be base64 encoded with no extraneous linefeeds. Use `echo -n "xyzzy" | base64` or `printf "xyzzy" | base64` to avoid a trailing linefeed.
- `secretKey:` The endpoint's secret or password, again base64 encoded with no extraneous linefeeds.
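A minimal sketch of such a secret might look like this; the secret name and the base64 data values are placeholders:

```yaml
# Hypothetical example; the data values are base64-encoded placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: endpoint-secret   # must match the PVC's kubevirt.io/storage.import.secretName
type: Opaque
data:
  accessKeyId: "eHl6enk="   # printf "xyzzy" | base64
  secretKey: "c2VjcmV0"     # printf "secret" | base64
```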
### Deploy the API Objects
- (Optional) Create the namespace where the controller will run:

  ```
  $ kubectl create ns <CDI-NAMESPACE>
  ```

- Deploy the CDI controller:

  ```
  $ kubectl -n <CDI-NAMESPACE> create -f manifests/controller/cdi-controller-deployment.yaml
  ```

  Note: the default verbosity level is set to 1 in the controller deployment file, which is minimal logging. If greater detail is desired, increase the `-v` number to 2 or 3 (see the hypothetical excerpt below).

  Note: the importer pod uses the same logging verbosity as the controller. If a different level of logging is required after the controller has been started, the deployment can be edited and applied via `kubectl apply -f manifests/controller/cdi-controller-deployment.yaml`. This will not alter the running controller's logging level, but it will affect importer pods created after the change. Changing the running controller's log level requires restarting it after the deployment has been edited.
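  For illustration, the verbosity flag is passed in the controller container's arguments in the deployment manifest. The excerpt below is hypothetical (the container name and image are placeholders), not a copy of the actual file:

  ```yaml
  # Hypothetical excerpt from cdi-controller-deployment.yaml; only the args value
  # matters here -- raise -v for more detailed logging.
  spec:
    template:
      spec:
        containers:
          - name: cdi-controller          # placeholder container name
            image: cdi-controller:latest  # placeholder image
            args: ["-v=3"]                # default is 1; 2 or 3 gives more detail
  ```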
- (Optional) Create the endpoint secret in the PVC's namespace:

  ```
  $ kubectl -n <NAMESPACE> create -f endpoint-secret.yaml
  ```

- Create the persistent volume claim to trigger the import process:

  ```
  $ kubectl -n <NAMESPACE> create -f golden-pvc.yaml
  ```

- Monitor the cdi-controller:

  ```
  $ kubectl -n <CDI-NAMESPACE> logs cdi-deployment-<RANDOM>
  ```

- Monitor the importer pod:

  ```
  $ kubectl -n <NAMESPACE> logs importer-<PVC-NAME>   # pvc name is shown in controller log
  ```

  or

  ```
  $ kubectl get -n <NAMESPACE> pvc <PVC-NAME> -o yaml | grep "storage.import.pod.phase:"   # to see the status of the importer pod triggered by the pvc
  ```
## Security Configurations
### RBAC Roles
CDI runs under a custom ServiceAccount (cdi-sa) and uses the Kubernetes RBAC model to apply an application-specific custom ClusterRole with rules to properly access needed resources such as PersistentVolumeClaims and Pods.
NOTE: If deploying manually with kubectl, the cdi-controller-deployment.yaml in the manifests directory should be updated to use the correct Namespace where the deployment is running.
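For orientation only, a ClusterRole granting this kind of access might look roughly like the sketch below; the actual role shipped with CDI may define different names and rule sets:

```yaml
# Illustrative sketch only; the real CDI ClusterRole may differ.
# It grants read/write access to PVCs and pods (bound cluster-wide via a ClusterRoleBinding).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cdi   # placeholder name
rules:
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create", "delete"]
```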
### Protecting VM Image Namespaces
Currently there is no support for automatically applying Kubernetes ResourceQuotas and Limits to desired namespaces and resources, so administrators must manually prevent new namespaces from using the StorageClass associated with CDI/Kubevirt and its cloning capabilities. Automatically restricting these resources is planned for future releases. Below are some examples of how this level of resource protection might be achieved:
- Lock Down StorageClass Usage for Namespace:

  ```yaml
  apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: protect-mynamespace
  spec:
    hard:
      <STORAGE-CLASS-NAME>.storageclass.storage.k8s.io/requests.storage: "0"
  ```
  NOTE: `<STORAGE-CLASS-NAME>.storageclass.storage.k8s.io/persistentvolumeclaims: "0"` would also accomplish the same effect by not allowing any PVC requests against the storageclass for this namespace.
- Open Up StorageClass Usage for Namespace:

  ```yaml
  apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: protect-mynamespace
  spec:
    hard:
      <STORAGE-CLASS-NAME>.storageclass.storage.k8s.io/requests.storage: "500Gi"
  ```
  NOTE: `<STORAGE-CLASS-NAME>.storageclass.storage.k8s.io/persistentvolumeclaims: "4"` could be used instead; this would allow only 4 PVC requests in this namespace, and anything over that would be denied.