# Topology-Aware Policy

## Overview

The `topology-aware` builtin policy splits the node into a tree of pools from which resources are then allocated to Containers. Currently the tree of pools is constructed automatically using runtime-discovered hardware topology information about the node. The pools correspond to the topologically relevant HW components: sockets, NUMA nodes, and CPUs/cores. The root of the tree corresponds to the full HW available in the system, the next level to the individual sockets in the system, and the next one to the individual NUMA nodes.

The main goal of the `topology-aware` policy is to try to distribute Containers among the pools (tree nodes) in a way that both maximizes Container performance and minimizes interference between the Containers of different `Pod`s. This is accomplished by considering

- topological characteristics of the Container's devices (`topology hints`)
- potential hints provided by the user (in the form of policy-specific `annotations`)
- current availability of hardware resources
- other colocated Containers running on the node

## Features

- aligning workload CPU and memory with respect to the locality of devices used
- exclusive CPU allocation from pools
- discovering and using kernel-isolated CPUs for exclusive allocations
- shared CPU allocation from pools
- mixed (both exclusive and shared) allocation from pools
- exposing the allocated CPU to Containers
- notifying Containers about changes in allocation

## Activating the Topology-Aware Policy

You can activate the topology-aware policy by setting the `--policy` option of `cri-resmgr` to `topology-aware`. For instance, like this:

```
cri-resmgr --policy topology-aware --reserved-resources cpu=750m
```

## Configuration

### Commandline Options

There are a number of options specific to this policy:

- `--topology-aware-pin-cpu`: Whether to pin Containers to the CPUs of the assigned pool.
- `--topology-aware-pin-memory`: Whether to pin Containers to the memory of the assigned pool.
- `--topology-aware-prefer-isolated-cpus`: Whether to try to allocate kernel-isolated CPUs for exclusive usage unless the Pod or Container is explicitly annotated otherwise.
- `--topology-aware-prefer-shared-cpus`: Whether to allocate shared CPUs unless the Pod or Container is explicitly annotated otherwise.

### Dynamic Configuration

The `topology-aware` policy can be configured dynamically using the [`node agent`](../node-agent.md). It takes a JSON configuration with the following keys, corresponding to the options mentioned above:

- `PinCPU`
- `PinMemory`
- `PreferIsolatedCPUs`
- `PreferSharedCPUs`

See the [documentation](/README.md#dynamic-configuration) for information about dynamic configuration. See the [sample ConfigMap spec](/sample-configs/cri-resmgr-configmap.example.yaml) for an example which configures the `topology-aware` policy with the built-in defaults.
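For illustration, a dynamic configuration fragment setting all four keys could look like the following. The boolean values shown here are arbitrary examples, not necessarily the built-in defaults:

```
{
  "PinCPU": true,
  "PinMemory": true,
  "PreferIsolatedCPUs": true,
  "PreferSharedCPUs": false
}
```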
### Container / `Pod` Allocation Policy Hints

The `topology-aware` policy recognizes a number of policy-specific annotations that can be used to provide hints and preferences about how resources should be allocated to the Containers. These hints are:

- `cri-resource-manager.intel.com/prefer-isolated-cpus`: isolated exclusive CPU preference
- `cri-resource-manager.intel.com/prefer-shared-cpus`: shared allocation preference

#### Isolated Exclusive CPUs

When kernel-isolated CPUs are available, the `topology-aware` policy will prefer to allocate those to any Container of a `Pod` in the `Guaranteed QoS class` if the Container `resource requirements` ask for exactly 1 CPU. If multiple CPUs are requested, exclusive CPUs will be sliced off from the shared CPU set of the pool. This default behavior can be changed using the `--topology-aware-prefer-isolated-cpus` boolean configuration option.

The global default behavior can also be overridden, per Pod or per Container, using the `cri-resource-manager.intel.com/prefer-isolated-cpus` `annotation`. Setting the value to `true` asks the policy to prefer isolated CPUs for exclusive allocation even if the Container asks for multiple CPUs, falling back to slicing off shared CPUs only when there is insufficient free isolated capacity. Similarly, setting the value of the `annotation` to `false` opts every Container in the `Pod` out of taking any isolated CPUs.

The same mechanism can be used to opt in to or out of isolated CPU usage per Container within the `Pod` by setting the value of the `annotation` to the string representation of a JSON object where each key is the name of a Container and each value is either `true` or `false`.

#### Shared CPU Allocation

The `topology-aware` policy assumes a mixed-mode exclusive+shared CPU allocation preference by default. Under this assumption, every Container of a `Pod` in the `Guaranteed QoS class` will get exclusive CPUs allocated worth the integer part of its `CPU request` and a portion of the pool's shared CPU set proportional to the fractional part of its `CPU request`. So for instance, a Container requesting 2.5 CPUs, or 2500 milli-CPUs, will by default get two exclusive CPUs allocated and half a CPU's worth allocated from the pool's CPU set shared with other Containers in the same pool. This default behavior can be changed using the `--topology-aware-prefer-shared-cpus` boolean configuration option.

Pods or Containers can opt out of this assumption using the `cri-resource-manager.intel.com/prefer-shared-cpus` `annotation`. Setting its value to `true` will cause the policy to always allocate the entire requested capacity for all Containers of the Pod from the shared CPUs of a pool. Setting the value to `false` will cause the policy to allocate any integer portion of the CPU request exclusively and any fractional part from the shared CPUs.

The same thing can be accomplished per Container by using as value a `JSON object`, similarly to the isolated CPU preference `annotation`: using the Container name as a key, and `true` or `false` as the value. Moreover, if a negative integer is used as the value, it is interpreted as `true` with a Container displacement upward in the tree. For instance, setting the annotation value to

```
"{\"container-1\": -1, \"container-2\": true}"
```

(or `0` instead of `true`) requests `container-1` to be placed in the parent of the pool with the best fitting score and `container-2` to be placed in the best fitting pool itself.
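As a combined illustration, here is a sketch of how these annotations could appear in `Pod` metadata; the Container names are made up for the example. It opts the whole `Pod` out of isolated CPUs, and requests a fully shared allocation for `container-1` while keeping the mixed default for `container-2`:

```
metadata:
  annotations:
    # Opt every Container of the Pod out of isolated CPUs.
    cri-resource-manager.intel.com/prefer-isolated-cpus: "false"
    # Per-Container shared CPU preference, as a JSON object string.
    cri-resource-manager.intel.com/prefer-shared-cpus: |+
      { "container-1": true, "container-2": false }
```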
#### Intra-Pod Container Affinity/Anti-affinity

`Containers` within a `Pod` can be annotated with `affinity` or `anti-affinity` rules, using the `cri-resource-manager.intel.com/affinity` and `cri-resource-manager.intel.com/anti-affinity` annotations. `Affinity` indicates a `soft pull` preference while `anti-affinity` indicates a `soft push` preference. The `topology-aware` policy will try to colocate `Containers` with `affinity` in the same pool and place `Containers` with `anti-affinity` in different pools.

Here is an example snippet of a `Pod Spec` with

- `container3` having `affinity` to `container1` and `anti-affinity` to `container2`,
- `container4` having `anti-affinity` to `container2` and `container3`

```
  annotations:
    cri-resource-manager.intel.com/affinity: |
      container3: [ container1 ]
    cri-resource-manager.intel.com/anti-affinity: |
      container3: [ container2 ]
      container4: [ container2, container3 ]
```

This is actually a shorthand notation for the following, as `key` defaults to `io.kubernetes.container.name` and `operator` defaults to `In`:

```
metadata:
  annotations:
    cri-resource-manager.intel.com/affinity: |+
      container3:
      - match:
          key: io.kubernetes.container.name
          operator: In
          values:
          - container1
    cri-resource-manager.intel.com/anti-affinity: |+
      container3:
      - match:
          key: io.kubernetes.container.name
          operator: In
          values:
          - container2
      container4:
      - match:
          key: io.kubernetes.container.name
          operator: In
          values:
          - container2
          - container3
```

Affinity and anti-affinity can have weights assigned as well. If omitted, affinity weights default to `1` and anti-affinity weights to `-1`. The above example is actually represented internally with something equivalent to the following:

```
metadata:
  annotations:
    cri-resource-manager.intel.com/affinity: |+
      container3:
      - match:
          key: io.kubernetes.container.name
          operator: In
          values:
          - container1
        weight: 1
      - match:
          key: io.kubernetes.container.name
          operator: In
          values:
          - container2
        weight: -1
      container4:
      - match:
          key: io.kubernetes.container.name
          operator: In
          values:
          - container2
          - container3
        weight: -1
```

For a more detailed description see [the documentation of annotations](container-affinity.md).
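Finally, to put the shorthand notation in context, here is a minimal sketch of a complete `Pod` using the affinity example above. The `Pod` name and images are placeholders; the point is that the names used in the annotation values must match the `name` fields of the sibling Containers in the same `Pod`:

```
apiVersion: v1
kind: Pod
metadata:
  name: affinity-example    # placeholder name
  annotations:
    cri-resource-manager.intel.com/affinity: |
      container3: [ container1 ]
    cri-resource-manager.intel.com/anti-affinity: |
      container3: [ container2 ]
      container4: [ container2, container3 ]
spec:
  containers:
  - name: container1
    image: busybox          # placeholder image
  - name: container2
    image: busybox
  - name: container3
    image: busybox
  - name: container4
    image: busybox
```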