# Migrating from CRI-RM to NRI
## Prerequisites
- An up-and-running CRI Resource Manager.
- One of the two supported policies in use: balloons or topology-aware.
  Other policies require a bit more work and need to be ported first. This can be done by following the example of how balloons and topology-aware were converted.
## Steps for an initial/basic migration test
### Containerd
Replace the containerd version on the system with 1.7 or newer (NRI is not supported in older versions).
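You can check the installed version first:

```sh
containerd --version
```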
Replace kubelet's `--container-runtime-endpoint=/var/run/cri-resmgr/cri-resmgr.sock` with `--container-runtime-endpoint=/var/run/containerd/containerd.sock`.
Replacing the runtime endpoint on a node that was set up using kubeadm:

```sh
# Get the kubelet args
systemctl cat kubelet   # look for: EnvironmentFile=/.../kubeadm-flags.env
vim /.../kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.9"
vim /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS=--container-runtime-endpoint=/var/run/containerd/containerd.sock   # remember this as well
systemctl restart kubelet
```
Edit the containerd config file, look for the section `[plugins."io.containerd.nri.v1.nri"]`, and replace `disable = true` with `disable = false`:
```sh
vim /etc/containerd/config.toml
```

```toml
[plugins."io.containerd.nri.v1.nri"]
  disable = false
  disable_connections = false
  plugin_config_path = "/etc/nri/conf.d"
  plugin_path = "/opt/nri/plugins"
  plugin_registration_timeout = "5s"
  plugin_request_timeout = "2s"
  socket_path = "/var/run/nri/nri.sock"
```

```sh
systemctl restart containerd
```
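After the restart, the NRI socket configured above should appear, which is a quick way to confirm NRI is enabled:

```sh
ls -l /var/run/nri/nri.sock
```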
### CRI-O
Ensure that CRI-O version 1.26.2 or newer is used.
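You can check the version with:

```sh
crio version
```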
Replace kubelet's `--container-runtime-endpoint=/var/run/cri-resmgr/cri-resmgr.sock` with `--container-runtime-endpoint=/var/run/crio/crio.sock`.
Replacing the runtime endpoint on a node that was set up using kubeadm:

```sh
# Get the kubelet args
systemctl cat kubelet   # look for: EnvironmentFile=/.../kubeadm-flags.env
vim /.../kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--container-runtime-endpoint=unix:///var/run/crio/crio.sock --pod-infra-container-image=registry.k8s.io/pause:3.9"
vim /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS=--container-runtime-endpoint=/var/run/crio/crio.sock   # remember this as well
systemctl restart kubelet
```
Enable NRI:

```sh
CRIO_CONF=/etc/crio/crio.conf
cp $CRIO_CONF $CRIO_CONF.orig
crio --enable-nri config > $CRIO_CONF
systemctl restart crio
```
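To confirm that the generated configuration actually enables NRI, you can grep the config file; the exact key name may vary between CRI-O versions:

```sh
grep -i nri $CRIO_CONF
```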
### Build the NRI policies
```sh
git clone https://github.com/containers/nri-plugins.git
cd nri-plugins
make
# Build the images; specify your image repo to easily push the image later.
make images IMAGE_REPO=my-repo IMAGE_VERSION=my-tag
```
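If the build succeeds, the image tarballs and deployment files used in the following steps should show up under build/images/:

```sh
ls build/images/
```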
### Create required CRDs

```sh
kubectl apply -f deployment/base/crds/noderesourcetopology_crd.yaml
```
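You can verify that the CRD got registered; grepping for the resource name avoids assuming the exact API group:

```sh
kubectl get crd | grep -i noderesourcetopolog
```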
### Import the image of the NRI plugin you want to run
#### Containerd

```sh
ctr -n k8s.io images import build/images/nri-resmgr-topology-aware-image-*.tar
```
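You can verify the import by listing the images in the k8s.io namespace:

```sh
ctr -n k8s.io images ls | grep topology-aware
```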
#### CRI-O

See the section below for instructions on how to push the images to a registry, then pull them from there.
### Deploy the plugin

```sh
kubectl apply -f build/images/nri-resmgr-topology-aware-deployment.yaml
```
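Once deployed, you can check that the plugin pod is running; the kube-system namespace is an assumption here, use whatever namespace the deployment file specifies:

```sh
kubectl get pods -n kube-system | grep topology-aware
```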
### Deploy a test pod

```sh
kubectl run mypod --image busybox -- sleep inf
kubectl exec mypod -- grep allowed_list: /proc/self/status
```
### See the resources the pod got assigned

```sh
kubectl exec $pod -c $container -- grep allowed_list: /proc/self/status
# The output should look similar to the output of CRI-RM.
```
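With the topology-aware policy in effect, the CPU and memory sets should be restricted rather than spanning the whole node. Purely illustrative output:

```
Cpus_allowed_list:	2-3
Mems_allowed_list:	0
```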
## Steps for a more real-life migration using a self-hosted image repository
Follow the same steps as above for enabling NRI with containerd/CRI-O and building the images.
Push the images you built to your repository:

```sh
# Replace my-repo and my-tag with the IMAGE_REPO and IMAGE_VERSION you
# specified when building the images with make images.
docker push my-repo:my-tag
```
Remember to change the image name and pull policy in the plugin's .yaml file to match your registry and image, e.g.:

```sh
vim build/images/nri-resmgr-topology-aware-deployment.yaml
```
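A hypothetical excerpt of the fields to edit (the container name here is illustrative; only image and imagePullPolicy matter):

```yaml
spec:
  containers:
  - name: nri-resmgr-topology-aware
    image: my-repo:my-tag      # the IMAGE_REPO:IMAGE_VERSION you pushed
    imagePullPolicy: Always    # ensure the image is pulled from the registry
```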
Then deploy the plugin similarly to the earlier step.
## Migrating existing configuration
The ConfigMap used by the ported policies/infra has a different name/naming scheme than the original one used in CRI-RM, e.g.:

```diff
- configmap-name: cri-resmgr-config
+ configmap-name: nri-resource-policy-config
```
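A minimal sketch for carrying an existing configuration over by re-creating the ConfigMap under the new name, assuming both live in the kube-system namespace; review the result before applying, since a plain sed over YAML is fragile:

```sh
kubectl get configmap cri-resmgr-config -n kube-system -o yaml \
  | sed 's/name: cri-resmgr-config/name: nri-resource-policy-config/' \
  | kubectl apply -f -
```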
Grouping nodes by labeling, used to share configuration between a set of nodes, similarly uses a new label key:

```diff
- cri-resource-manager.intel.com/group: $GROUP_NAME
+ resource-policy.nri.io/group: $GROUP_NAME
```
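For example, to move a node over to the new group label key:

```sh
# Apply the new group label and drop the old CRI-RM one.
kubectl label node $NODE_NAME resource-policy.nri.io/group=$GROUP_NAME --overwrite
kubectl label node $NODE_NAME cri-resource-manager.intel.com/group-
```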
## Migrating existing workloads
The annotations one can use to customize how a policy treats a workload use slightly different keys than the original ones in CRI-RM. The collective 'key namespace' for policy- and resource-manager-specific annotations has changed from `cri-resource-manager.intel.com` to `resource-policy.nri.io`.
For instance, an explicit type annotation for the balloons policy, which used to be:

```yaml
...
metadata:
  annotations:
    balloon.balloons.cri-resource-manager.intel.com/container.$CONTAINER_NAME: $BALLOON_TYPE
...
```

should now be:

```yaml
...
metadata:
  annotations:
    balloon.balloons.resource-policy.nri.io/container.$CONTAINER_NAME: $BALLOON_TYPE
...
```
Similarly, a workload's opt-out annotation from exclusive CPU allocation for the topology-aware policy, which used to be:

```yaml
...
metadata:
  annotations:
    prefer-shared-cpus.cri-resource-manager.intel.com/container.$CONTAINER_NAME: "true"
...
```

should now be:

```yaml
...
metadata:
  annotations:
    prefer-shared-cpus.resource-policy.nri.io/container.$CONTAINER_NAME: "true"
...
```
Similar changes are needed for any CRI-RM-specific annotation that uses the same key syntax and semantic scoping.
All of the annotations:

| Was | Is now |
|---|---|
| cri-resource-manager.intel.com/affinity | resource-policy.nri.io/affinity |
| cri-resource-manager.intel.com/anti-affinity | resource-policy.nri.io/anti-affinity |
| cri-resource-manager.intel.com/prefer-isolated-cpus | resource-policy.nri.io/prefer-isolated-cpus |
| cri-resource-manager.intel.com/prefer-shared-cpus | resource-policy.nri.io/prefer-shared-cpus |
| cri-resource-manager.intel.com/cold-start | resource-policy.nri.io/cold-start |
| cri-resource-manager.intel.com/memory-type | resource-policy.nri.io/memory-type |
| prefer-isolated-cpus.cri-resource-manager.intel.com | prefer-isolated-cpus.resource-policy.nri.io |
| prefer-shared-cpus.cri-resource-manager.intel.com | prefer-shared-cpus.resource-policy.nri.io |
| memory-type.cri-resource-manager.intel.com | memory-type.resource-policy.nri.io |
| cold-start.cri-resource-manager.intel.com | cold-start.resource-policy.nri.io |
| prefer-reserved-cpus.cri-resource-manager.intel.com | prefer-reserved-cpus.resource-policy.nri.io |
| rdtclass.cri-resource-manager.intel.com | rdtclass.resource-policy.nri.io |
| blockioclass.cri-resource-manager.intel.com | blockioclass.resource-policy.nri.io |
| toptierlimit.cri-resource-manager.intel.com | toptierlimit.resource-policy.nri.io |
| topologyhints.cri-resource-manager.intel.com | topologyhints.resource-policy.nri.io |
| balloon.balloons.cri-resource-manager.intel.com | balloon.balloons.resource-policy.nri.io |
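Since every key maps over by swapping the domain, a minimal sketch for bulk-rewriting a workload manifest (my-workload.yaml is a placeholder; review the resulting diff by hand):

```sh
# Rewrite the annotation key domain in place.
sed -i 's/cri-resource-manager\.intel\.com/resource-policy.nri.io/g' my-workload.yaml
```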