Topology-Aware Policy

Overview
The topology-aware builtin policy splits the node into a tree of pools from which resources are then allocated to Containers. Currently the tree of pools is constructed automatically using runtime-discovered hardware topology information about the node. The pools correspond to the topologically relevant HW components: sockets, NUMA nodes, and CPUs/cores. The root of the tree corresponds to the full HW available in the system, the next level corresponds to the individual sockets in the system, and the next one to the individual NUMA nodes.
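For illustration, on a hypothetical two-socket node with two NUMA nodes per socket, the resulting pool tree would look roughly like this:

root (full HW of the node)
├── socket #0
│   ├── NUMA node #0
│   └── NUMA node #1
└── socket #1
    ├── NUMA node #2
    └── NUMA node #3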
The main goal of the topology-aware policy is to try to distribute Containers among the pools (tree nodes) in a way that both maximizes Container performance and minimizes interference between the Containers of different Pods. This is accomplished by considering

- topological characteristics of the Container's devices (topology hints)
- potential hints provided by the user (in the form of policy-specific annotations)
- current availability of hardware resources
- other colocated Containers running on the node
Features

- aligning workload CPU and memory wrt. the locality of devices used
- exclusive CPU allocation from pools
- discovering and using kernel-isolated CPUs for exclusive allocations
- shared CPU allocation from pools
- mixed (both exclusive and shared) allocation from pools
- exposing the allocated CPU to Containers
- notifying Containers about changes in allocation
Activating the Topology-Aware Policy

You can activate the topology-aware policy by setting the --policy option of cri-resmgr to topology-aware. For instance like this:

cri-resmgr --policy topology-aware --reserved-resources cpu=750m
Configuration

Commandline Options

There are a number of options specific to this policy:

- --topology-aware-pin-cpu: whether to pin Containers to the CPUs of the assigned pool
- --topology-aware-pin-memory: whether to pin Containers to the memory of the assigned pool
- --topology-aware-prefer-isolated-cpus: whether to try to allocate kernel-isolated CPUs for exclusive usage, unless the Pod or Container is explicitly annotated otherwise
- --topology-aware-prefer-shared-cpus: whether to allocate shared CPUs, unless the Pod or Container is explicitly annotated otherwise
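For example, assuming these options behave as standard boolean flags that accept an explicit =true/=false value, pinning could be enabled and the isolated CPU preference disabled like this:

cri-resmgr --policy topology-aware --reserved-resources cpu=750m \
    --topology-aware-pin-cpu=true --topology-aware-pin-memory=true \
    --topology-aware-prefer-isolated-cpus=false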
Dynamic Configuration

The topology-aware policy can be configured dynamically using the node agent. It takes a JSON configuration with the following keys, corresponding to the above-mentioned options:

- PinCPU
- PinMemory
- PreferIsolatedCPUs
- PreferSharedCPUs
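As a minimal sketch, assuming the keys are given as a flat JSON object (see the sample ConfigMap spec referenced below for the authoritative layout), such a configuration could look like this:

{
  "PinCPU": true,
  "PinMemory": true,
  "PreferIsolatedCPUs": true,
  "PreferSharedCPUs": false
}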
See the documentation for information about dynamic configuration.

See the sample ConfigMap spec for an example which configures the topology-aware policy with the built-in defaults.
Container / Pod Allocation Policy Hints

The topology-aware policy recognizes a number of policy-specific annotations that can be used to provide hints and preferences about how resources should be allocated to the Containers. These hints are:

- cri-resource-manager.intel.com/prefer-isolated-cpus: isolated exclusive CPU preference
- cri-resource-manager.intel.com/prefer-shared-cpus: shared allocation preference
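As a minimal sketch, a Pod could express a shared CPU allocation preference for all of its Containers with an annotation like the following (the value is the string representation of a boolean):

metadata:
  annotations:
    cri-resource-manager.intel.com/prefer-shared-cpus: "true"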
Isolated Exclusive CPUs

When kernel-isolated CPUs are available, the topology-aware policy will prefer to allocate those to any Container of a Pod in the Guaranteed QoS class if the Container resource requirements ask for exactly 1 CPU. If multiple CPUs are requested, exclusive CPUs will be sliced off from the shared CPU set of the pool. This default behavior can be changed using the --topology-aware-prefer-isolated-cpus boolean configuration option.
The global default behavior can also be overridden, per Pod or per Container, using the cri-resource-manager.intel.com/prefer-isolated-cpus annotation. Setting the value to true asks the policy to prefer isolated CPUs for exclusive allocation even if the Container asks for multiple CPUs, and to fall back to slicing off shared CPUs only when there is insufficient free isolated capacity. Similarly, setting the value of the annotation to false opts every Container in the Pod out of taking any isolated CPUs.

The same mechanism can be used to opt in to or out of isolated CPU usage per Container within the Pod by setting the value of the annotation to the string representation of a JSON object, where each key is the name of a Container and each value is either true or false.
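As a hypothetical example (the Container names are invented for illustration), the following annotation opts container1 in and container2 out of isolated CPU allocation:

metadata:
  annotations:
    cri-resource-manager.intel.com/prefer-isolated-cpus: |+
      { "container1": true, "container2": false }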
Intra-Pod Container Affinity/Anti-affinity

Containers within a Pod can be annotated with affinity or anti-affinity rules, using the cri-resource-manager.intel.com/affinity and cri-resource-manager.intel.com/anti-affinity annotations.

Affinity indicates a soft pull preference while anti-affinity indicates a soft push preference. The topology-aware policy will try to colocate Containers with affinity to the same pool and Containers with anti-affinity to different pools.
Here is an example snippet of a Pod Spec with

- container3 having affinity to container1 and anti-affinity to container2
- container4 having anti-affinity to container2 and container3

  annotations:
    cri-resource-manager.intel.com/affinity: |
      container3: [ container1 ]
    cri-resource-manager.intel.com/anti-affinity: |
      container3: [ container2 ]
      container4: [ container2, container3 ]
This is actually a shorthand notation for the following, as key defaults to io.kubernetes.container.name and operator defaults to In.
metadata:
  annotations:
    cri-resource-manager.intel.com/affinity: |+
      container3:
      - match:
          key: io.kubernetes.container.name
          operator: In
          values:
          - container1
    cri-resource-manager.intel.com/anti-affinity: |+
      container3:
      - match:
          key: io.kubernetes.container.name
          operator: In
          values:
          - container2
      container4:
      - match:
          key: io.kubernetes.container.name
          operator: In
          values:
          - container2
          - container3
Affinity and anti-affinity can have weights assigned as well. If omitted, affinity weights default to 1 and anti-affinity weights to -1. The above example is actually represented internally with something equivalent to the following.
metadata:
  annotations:
    cri-resource-manager.intel.com/affinity: |+
      container3:
      - match:
          key: io.kubernetes.container.name
          operator: In
          values:
          - container1
        weight: 1
      - match:
          key: io.kubernetes.container.name
          operator: In
          values:
          - container2
        weight: -1
      container4:
      - match:
          key: io.kubernetes.container.name
          operator: In
          values:
          - container2
          - container3
        weight: -1
For a more detailed description, see the documentation of annotations.