Container Affinity and Anti-Affinity
Introduction
Some policies allow the user to give hints about how particular containers should be co-located within a node. In particular these hints express whether containers should be located ‘close’ to each other or ‘far away’ from each other, in a hardware topology sense.
Since these hints are always interpreted by a particular policy implementation, the exact definitions of ‘close’ and ‘far’ are also somewhat policy-specific. However, as a general rule of thumb, containers running
on CPUs within the same NUMA nodes are considered ‘close’ to each other,
on CPUs within different NUMA nodes in the same socket are ‘farther’, and
on CPUs within different sockets are ‘far’ from each other.
These hints are expressed by container affinity annotations on the Pod.
There are two types of affinities:
affinity (or positive affinity): causes affected containers to pull each other closer
anti-affinity (or negative affinity): causes affected containers to push each other further away
Policies try to place a container
close to those the container has affinity towards
far from those the container has anti-affinity towards.
Affinity Annotation Syntax
Affinities are defined as the cri-resource-manager.intel.com/affinity annotation. Anti-affinities are defined as the cri-resource-manager.intel.com/anti-affinity annotation. They are specified in the metadata section of the Pod YAML, under annotations, as a dictionary. Each dictionary key is the name of the container within the Pod to which the annotation belongs.
metadata:
  annotations:
    cri-resource-manager.intel.com/affinity: |
      container1:
        - scope:
            key: key-ref
            operator: op
            values:
            - value1
            ...
            - valueN
          match:
            key: key-ref
            operator: op
            values:
            - value1
            ...
            - valueN
          weight: w
An anti-affinity is defined similarly but using cri-resource-manager.intel.com/anti-affinity as the annotation key.
metadata:
  annotations:
    cri-resource-manager.intel.com/anti-affinity: |
      container1:
        - scope:
            key: key-ref
            operator: op
            values:
            - value1
            ...
            - valueN
          match:
            key: key-ref
            operator: op
            values:
            - value1
            ...
            - valueN
          weight: w
Affinity Semantics
An affinity consists of three parts:
scope expression: defines which containers this affinity is evaluated against
match expression: defines which containers (within the scope) the affinity applies to
weight: defines how strong a pull or a push the affinity causes
Affinities are also sometimes referred to as positive affinities while anti-affinities are referred to as negative affinities. The reason for this is that the only difference between them is that affinities have a positive weight while anti-affinities have a negative weight.
The scope of an affinity defines the bounding set of containers the affinity can apply to. The match expression is evaluated against the containers in scope and it selects the containers the affinity really has an effect on. The weight specifies whether the effect is a pull or a push. Positive weights cause a pull while negative weights cause a push. Additionally, the weight specifies how strong the push or the pull is. This is useful in situations where the policy needs to make some compromises because an optimal placement is not possible. The weight then also acts as a way to specify preferences or priorities between the various compromises: the heavier the weight, the stronger the pull or push and the larger the probability that it will be honored, if this is possible at all.
The scope can be omitted from an affinity, in which case it implies Pod scope, in other words the scope of all containers that belong to the same Pod as the container for which the affinity is defined.
The weight can also be omitted in which case it defaults to -1 for anti-affinities and +1 for affinities. Weights are currently limited to the range [-1000,1000].
Both the affinity scope and the match expression select containers, therefore they share an identical form: both of them are expressions. An expression consists of three parts:
key: specifies what metadata to pick from a container for evaluation
operation (op): specifies what logical operation the expression evaluates
values: a set of strings to evaluate the value of the key against
The supported keys are:
for pods:
name
namespace
qosclass
labels/<label-key>
id
uid
for containers:
pod/<pod-key>
name
namespace
qosclass
labels/<label-key>
tags/<tag-key>
id
Essentially an expression defines a logical operation of the form (key op values). Evaluating this logical expression takes the value of the key, applies the operation to it and the given values, and produces a boolean true/false result. Currently the following operations are supported:
Equals: equality, true if the value of key equals the single item in values
NotEqual: inequality, true if the value of key is not equal to the single item in values
In: membership, true if the value of key equals any among values
NotIn: negated membership, true if the value of key is not equal to any among values
Exists: true if the given key exists with any value
NotExists: true if the given key does not exist
AlwaysTrue: always evaluates to true, can be used to denote node-global scope (all containers)
Matches: true if the value of key matches the globbing pattern in values
MatchesNot: true if the value of key does not match the globbing pattern in values
MatchesAny: true if the value of key matches any of the globbing patterns in values
MatchesNone: true if the value of key does not match any of the globbing patterns in values
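The operations above can be illustrated with a minimal Python sketch. It is not the policy's actual implementation: the function name evaluate, the flat metadata dictionary, the use of Python's fnmatch for globbing, and the treatment of a missing key as an empty string in comparisons are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def evaluate(container, expr):
    """Evaluate (key op values) against a flat dict of container metadata.

    Hypothetical helper: `container` maps keys such as "name" or
    "labels/app" to strings, `expr` has "key", "operator" and "values".
    """
    op = expr["operator"]
    values = expr.get("values", [])
    if op == "AlwaysTrue":
        return True

    raw = container.get(expr.get("key"))
    exists = raw is not None
    if op == "Exists":
        return exists
    if op == "NotExists":
        return not exists

    # Assumption: a missing key compares as the empty string.
    value = raw if exists else ""
    if op == "In":
        return value in values
    if op == "NotIn":
        return value not in values
    if op == "MatchesAny":
        return any(fnmatchcase(value, pattern) for pattern in values)
    if op == "MatchesNone":
        return not any(fnmatchcase(value, pattern) for pattern in values)

    # The remaining operators take a single item in values.
    # Assumption: any other number of items evaluates to False.
    if len(values) != 1:
        return False
    if op == "Equals":
        return value == values[0]
    if op == "NotEqual":
        return value != values[0]
    if op == "Matches":
        return fnmatchcase(value, values[0])
    if op == "MatchesNot":
        return not fnmatchcase(value, values[0])
    raise ValueError(f"unknown operator: {op}")


sheep = {"name": "sheep", "namespace": "default"}
is_sheep = evaluate(sheep, {"key": "name", "operator": "Equals", "values": ["sheep"]})
```

The globbing dialect of the real implementation may differ from Python's fnmatch in edge cases, so patterns should be checked against the policy itself.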
The effective affinity between containers C_1 and C_2, A(C_1, C_2), is the sum of the weights of all pairwise in-scope matching affinities W(C_1, C_2). To put it another way, evaluating an affinity for a container C_1 is done by first using the scope (expression) to determine which containers are in the scope of the affinity. Then, for each in-scope container C_2 for which the match expression evaluates to true, the weight of the affinity is added to the effective affinity A(C_1, C_2).
Note that currently (for the topology-aware policy) this evaluation is asymmetric: A(C_1, C_2) and A(C_2, C_1) can and will be different unless the affinity annotations are crafted to prevent this (by making them fully symmetric). Moreover, A(C_1, C_2) is calculated and taken into consideration during resource allocation for C_1, while A(C_2, C_1) is calculated and taken into account during resource allocation for C_2. This might be changed in a future version.
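The summation and its asymmetry can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the policy's code: effective_affinity, name_is and any_container are hypothetical helpers, scope and match expressions are modelled as plain predicates, and the anti-affinity is given a negative weight as described above.

```python
def effective_affinity(affinities, other):
    """Return A(C_1, other) for the affinities declared on container C_1.

    `affinities` is a list of (scope, match, weight) triples, where scope
    and match are predicates over the metadata of the other container.
    """
    total = 0
    for scope, match, weight in affinities:
        # Only in-scope containers that also match contribute their weight.
        if scope(other) and match(other):
            total += weight
    return total


def name_is(*names):
    """Hypothetical predicate factory: true if the container name is listed."""
    return lambda container: container["name"] in names


def any_container(container):
    """Hypothetical scope that admits every container."""
    return True


peter, sheep, wolf = {"name": "peter"}, {"name": "sheep"}, {"name": "wolf"}

# Only peter declares affinities; sheep and wolf declare none.
declared = {
    "peter": [
        (any_container, name_is("sheep"), 5),   # affinity, pulls closer
        (any_container, name_is("wolf"), -5),   # anti-affinity, pushes away
    ],
    "sheep": [],
    "wolf": [],
}

a_peter_sheep = effective_affinity(declared["peter"], sheep)
a_peter_wolf = effective_affinity(declared["peter"], wolf)
a_sheep_peter = effective_affinity(declared["sheep"], peter)
```

Here A(peter, sheep) is 5 while A(sheep, peter) is 0, because only peter carries annotations. Making the relationship symmetric would require declaring a mirrored affinity on sheep as well.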
Currently affinity expressions lack support for boolean operators (and, or, not). Sometimes this limitation can be overcome by using joint keys, especially with matching operators. The joint key syntax allows joining the value of several keys with a separator into a single value. A joint key can be specified in a simple or full format:
simple: <colon-separated-subkeys>, this is equivalent to :::<colon-separated-subkeys>
full: <ksep><vsep><ksep-separated-keylist>
A joint key evaluates to the values of all the <ksep>-separated subkeys joined by <vsep>.
A non-existent subkey evaluates to the empty string. For instance the joint key
:pod/qosclass:pod/name:name
evaluates to
<qosclass>:<pod name>:<container name>
For existence operators, a joint key is considered to exist if any of its subkeys exists.
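The following Python sketch illustrates how a joint key value could be assembled once the subkeys and the value separator are known. It deliberately does not parse the key syntax itself; joint_key_value and the sample metadata are hypothetical, and Python's fnmatch stands in for the policy's globbing.

```python
from fnmatch import fnmatchcase


def joint_key_value(metadata, subkeys, vsep=":"):
    """Return (value, exists) for a joint key built from `subkeys`.

    A non-existent subkey contributes the empty string, and the joint key
    counts as existing if any of its subkeys exists.
    """
    values = [metadata.get(subkey, "") for subkey in subkeys]
    exists = any(subkey in metadata for subkey in subkeys)
    return vsep.join(values), exists


# Hypothetical container metadata, keyed the way expressions refer to it.
container = {"pod/qosclass": "Guaranteed", "pod/name": "web-0", "name": "nginx"}

value, exists = joint_key_value(container, ["pod/qosclass", "pod/name", "name"])

# A single glob over the joined value acts like a logical AND of
# "qosclass is Guaranteed" and "container name is nginx".
both_hold = fnmatchcase(value, "Guaranteed:*:nginx")
```

With the sample metadata the joined value is Guaranteed:web-0:nginx, so one matching operator can test several properties at once, which is how joint keys compensate for the missing boolean operators.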
Examples
Put the container peter close to the container sheep but far away from the container wolf.
metadata:
  annotations:
    cri-resource-manager.intel.com/affinity: |
      peter:
      - match:
          key: name
          operator: Equals
          values:
          - sheep
        weight: 5
    cri-resource-manager.intel.com/anti-affinity: |
      peter:
      - match:
          key: name
          operator: Equals
          values:
          - wolf
        weight: 5
Shorthand Notation
There is an alternative shorthand syntax for what is considered to be the most common case: defining affinities between containers within the same pod. With this notation one needs to give just the names of the containers, like in the example below.
  annotations:
    cri-resource-manager.intel.com/affinity: |
      container3: [ container1 ]
    cri-resource-manager.intel.com/anti-affinity: |
      container3: [ container2 ]
      container4: [ container2, container3 ]
This shorthand notation defines:
container3 having affinity (weight 1) to container1 and anti-affinity (weight -1) to container2
container4 having anti-affinity (weight -1) to container2 and container3
The equivalent annotation in full syntax would be
metadata:
  annotations:
    cri-resource-manager.intel.com/affinity: |+
      container3:
      - match:
          key: labels/io.kubernetes.container.name
          operator: In
          values:
          - container1
    cri-resource-manager.intel.com/anti-affinity: |+
      container3:
      - match:
          key: labels/io.kubernetes.container.name
          operator: In
          values:
          - container2
      container4:
      - match:
          key: labels/io.kubernetes.container.name
          operator: In
          values:
          - container2
          - container3
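The correspondence between the two notations can be sketched as a small Python function. This only mirrors the equivalence shown above; expand_shorthand is a hypothetical helper and not part of CRI Resource Manager.

```python
def expand_shorthand(shorthand):
    """Expand {container: [names]} into the full match-expression form.

    Scope and weight are left out on purpose, so the defaults described
    earlier apply: Pod scope and a weight of +1 or -1.
    """
    full = {}
    for container, targets in shorthand.items():
        full[container] = [
            {
                "match": {
                    "key": "labels/io.kubernetes.container.name",
                    "operator": "In",
                    "values": list(targets),
                }
            }
        ]
    return full


anti_affinity = expand_shorthand(
    {"container3": ["container2"], "container4": ["container2", "container3"]}
)
```

Whether the result describes an affinity or an anti-affinity depends only on which annotation key it is stored under.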