# Container Affinity and Anti-Affinity

## Introduction

Some policies allow the user to give hints about how particular containers
should be *co-located* within a node. In particular these hints express whether
containers should be located *'close'* to each other or *'far away'* from each
other, in a hardware topology sense.

Since these hints are always interpreted by a particular *policy implementation*,
the exact definitions of 'close' and 'far' are also somewhat *policy-specific*.
However, as a general rule of thumb, containers running

  - on CPUs within the *same NUMA nodes* are considered *'close'* to each other,
  - on CPUs within *different NUMA nodes* in the *same socket* are *'farther'*, and
  - on CPUs within *different sockets* are *'far'* from each other.

These hints are expressed by `container affinity annotations` on the Pod.
There are two types of affinities:

  - `affinity` (or `positive affinity`): causes affected containers to *pull* each other closer
  - `anti-affinity` (or `negative affinity`): causes affected containers to *push* each other further away

Policies try to place a container

  - close to those the container has affinity towards
  - far from those the container has anti-affinity towards.

## Affinity Annotation Syntax

*Affinities* are defined as the `cri-resource-manager.intel.com/affinity` annotation.
*Anti-affinities* are defined as the `cri-resource-manager.intel.com/anti-affinity`
annotation. They are specified in the `metadata` section of the `Pod YAML`, under
`annotations` as a dictionary, with each dictionary key being the name of the
*container* within the Pod to which the annotation belongs.

```yaml
metadata:
  annotations:
    cri-resource-manager.intel.com/affinity: |
      container1:
        - scope:
            key: key-ref
            operator: op
            values:
            - value1
            ...
            - valueN
          match:
            key: key-ref
            operator: op
            values:
            - value1
            ...
            - valueN
          weight: w
```

An anti-affinity is defined similarly but using `cri-resource-manager.intel.com/anti-affinity`
as the annotation key.

```yaml
metadata:
  annotations:
    cri-resource-manager.intel.com/anti-affinity: |
      container1:
        - scope:
            key: key-ref
            operator: op
            values:
            - value1
            ...
            - valueN
          match:
            key: key-ref
            operator: op
            values:
            - value1
            ...
            - valueN
          weight: w
```

## Affinity Semantics

An affinity consists of three parts:

  - `scope expression`: defines which containers this affinity is evaluated against
  - `match expression`: defines which containers (within the scope) the affinity applies to
  - `weight`: defines how *strong* a pull or a push the affinity causes

*Affinities* are also sometimes referred to as *positive affinities* while
*anti-affinities* are referred to as *negative affinities*. The reason for this is
that the only difference between the two is that affinities have a *positive weight*
while anti-affinities have a *negative weight*.

The *scope* of an affinity defines the *bounding set of containers* the affinity can
apply to. The affinity's *match expression* is evaluated against the containers *in scope*
and it *selects the containers* the affinity really has an effect on. The *weight*
specifies whether the effect is a *pull* or a *push*. *Positive* weights cause a *pull*
while *negative* weights cause a *push*. Additionally, the *weight* specifies *how strong*
the push or the pull is. This is useful in situations where the policy needs to make
some compromises because an optimal placement is not possible. The weight then also
acts as a way to specify preferences or priorities between the various compromises:
the heavier the weight, the stronger the pull or push and the larger the probability
that it will be honored, if this is possible at all.

The scope can be omitted from an affinity, in which case it implies *Pod scope*, in
other words the scope of all containers that belong to the same Pod as the container
for which the affinity is defined.

The weight can also be omitted, in which case it defaults to -1 for anti-affinities
and +1 for affinities.
Weights are currently limited to the range [-1000,1000].

Both the affinity scope and the match expression select containers, therefore they are
syntactically identical: both of them are *expressions*. An expression consists of three parts:

  - key: specifies what *metadata* to pick from a container for evaluation
  - operation (op): specifies what *logical operation* the expression evaluates
  - values: a set of *strings* to evaluate the value of the key against

The supported keys are:

  - for pods:
    - `name`
    - `namespace`
    - `qosclass`
    - `labels/<label-key>`
    - `id`
    - `uid`
  - for containers:
    - `pod/<pod-key>`
    - `name`
    - `namespace`
    - `qosclass`
    - `labels/<label-key>`
    - `tags/<tag-key>`
    - `id`

Essentially an expression defines a logical operation of the form (key op values).
Evaluating this expression takes the value of the key, applies the operation to it
and the given values, and yields a boolean true or false result. Currently the
following operations are supported:

  - `Equals`: equality, true if the *value of key* equals the single item in *values*
  - `NotEqual`: inequality, true if the *value of key* is not equal to the single item in *values*
  - `In`: membership, true if the *value of key* equals any among *values*
  - `NotIn`: negated membership, true if the *value of key* is not equal to any among *values*
  - `Exists`: true if the given *key* exists with any value
  - `NotExists`: true if the given *key* does not exist
  - `AlwaysTrue`: always evaluates to true, can be used to denote node-global scope (all containers)
  - `Matches`: true if the *value of key* matches the globbing pattern in values
  - `MatchesNot`: true if the *value of key* does not match the globbing pattern in values
  - `MatchesAny`: true if the *value of key* matches any of the globbing patterns in values
  - `MatchesNone`: true if the *value of key* does not match any of the globbing patterns in values

The effective affinity between containers C_1 and C_2, A(C_1, C_2), is the sum of the
weights of all pairwise in-scope matching affinities
W(C_1, C_2). To put it another way, evaluating an affinity for a container C_1 is done
by first using the scope (expression) to determine which containers are in the scope
of the affinity. Then, for each in-scope container C_2 for which the match expression
evaluates to true, the weight of the affinity is added to the effective affinity
A(C_1, C_2).

Note that currently (for the topology-aware policy) this evaluation is asymmetric:
A(C_1, C_2) and A(C_2, C_1) can and will be different unless the affinity annotations
are crafted to prevent this (by making them fully symmetric). Moreover, A(C_1, C_2) is
calculated and taken into consideration during resource allocation for C_1, while
A(C_2, C_1) is calculated and taken into account during resource allocation for C_2.
This might be changed in a future version.

Currently affinity expressions lack support for boolean operators (and, or, not).
Sometimes this limitation can be overcome by using joint keys, especially with
matching operators. The joint key syntax allows joining the value of several keys
with a separator into a single value. A joint key can be specified in a simple or
full format:

  - simple: `<colon-separated-subkeys>`, this is equivalent to `:::<colon-separated-subkeys>`
  - full: `<ksep><vsep><ksep-separated-keylist>`

A joint key evaluates to the values of all the `<ksep>`-separated subkeys joined by `<vsep>`.
A non-existent subkey evaluates to the empty string. For instance, the joint key

  `:pod/qosclass:pod/name:name`

evaluates to

  `<qosclass>:<pod name>:<container name>`

For existence operators, a joint key is considered to exist if any of its subkeys exists.

## Examples

Put the container `peter` close to the container `sheep` but far away from the
container `wolf`.

```yaml
metadata:
  annotations:
    cri-resource-manager.intel.com/affinity: |
      peter:
      - match:
          key: name
          operator: Equals
          values:
          - sheep
        weight: 5
    cri-resource-manager.intel.com/anti-affinity: |
      peter:
      - match:
          key: name
          operator: Equals
          values:
          - wolf
        weight: 5
```

## Shorthand Notation

There is an alternative shorthand syntax for what is considered to be the most common
case: defining affinities between containers within the same pod. With this notation
one needs to give just the names of the containers, like in the example below.

```yaml
  annotations:
    cri-resource-manager.intel.com/affinity: |
      container3: [ container1 ]
    cri-resource-manager.intel.com/anti-affinity: |
      container3: [ container2 ]
      container4: [ container2, container3 ]
```

This shorthand notation defines:

  - `container3` having
    - `affinity` (weight 1) to `container1`
    - `anti-affinity` (weight -1) to `container2`
  - `container4` having
    - `anti-affinity` (weight -1) to `container2` and `container3`

The equivalent annotation in full syntax would be:

```yaml
metadata:
  annotations:
    cri-resource-manager.intel.com/affinity: |+
      container3:
      - match:
          key: labels/io.kubernetes.container.name
          operator: In
          values:
          - container1
    cri-resource-manager.intel.com/anti-affinity: |+
      container3:
      - match:
          key: labels/io.kubernetes.container.name
          operator: In
          values:
          - container2
      container4:
      - match:
          key: labels/io.kubernetes.container.name
          operator: In
          values:
          - container2
          - container3
```
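
The mapping from shorthand to full syntax shown above can be sketched in a few lines of Python. This is only an illustration of the correspondence between the two notations, not the parser the resource manager uses. The function name `expand_shorthand` is hypothetical, and the input is assumed to be the already-parsed YAML body of one annotation (container name to list of container names).

```python
def expand_shorthand(shorthand):
    """Expand {container: [names...]} into full-syntax affinity entries.

    Hypothetical helper for illustration only. Scope and weight are left
    out of the result, so the documented defaults apply (Pod scope, and a
    weight of +1 for affinities or -1 for anti-affinities).
    """
    return {
        container: [
            {
                "match": {
                    # Key and operator taken from the full-syntax example above.
                    "key": "labels/io.kubernetes.container.name",
                    "operator": "In",
                    "values": list(names),
                }
            }
        ]
        for container, names in shorthand.items()
    }


# Parsed body of the anti-affinity annotation from the shorthand example.
anti_affinity = expand_shorthand({
    "container3": ["container2"],
    "container4": ["container2", "container3"],
})
```

Each container ends up with a single `In` match over the listed container names, which mirrors the full-syntax annotation above.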
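
The evaluation rules from the Affinity Semantics section (scope selects candidates, the match expression picks among them, weights are summed per container pair) can be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: the names `Container`, `Expression`, `Affinity` and `effective_affinity` are hypothetical, container metadata is modelled as a flat key to value dictionary, a missing key is assumed to evaluate as the empty string, joint keys are not handled, globbing is approximated with Python's `fnmatchcase`, and anti-affinities are represented directly by a negative weight.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Container:
    pod: str    # name of the owning Pod (used for the default Pod scope)
    meta: dict  # assumed flat metadata map, e.g. {"name": "sheep"}


@dataclass
class Expression:
    key: str
    op: str
    values: list = field(default_factory=list)

    def evaluate(self, c: Container) -> bool:
        exists = self.key in c.meta
        value = c.meta.get(self.key, "")  # assumption: missing key -> ""
        v = self.values
        ops = {
            "Equals": lambda: value == v[0],
            "NotEqual": lambda: value != v[0],
            "In": lambda: value in v,
            "NotIn": lambda: value not in v,
            "Exists": lambda: exists,
            "NotExists": lambda: not exists,
            "AlwaysTrue": lambda: True,
            "Matches": lambda: fnmatchcase(value, v[0]),
            "MatchesNot": lambda: not fnmatchcase(value, v[0]),
            "MatchesAny": lambda: any(fnmatchcase(value, p) for p in v),
            "MatchesNone": lambda: not any(fnmatchcase(value, p) for p in v),
        }
        return ops[self.op]()


@dataclass
class Affinity:
    match: Expression
    weight: int                         # positive pulls, negative pushes
    scope: Optional[Expression] = None  # None means Pod scope


def effective_affinity(owner, affinities, candidates):
    """Return {container name: A(owner, container)} for matching containers."""
    totals = {}
    for a in affinities:
        for c in candidates:
            if c is owner:
                continue  # assumption: a container is not matched against itself
            in_scope = c.pod == owner.pod if a.scope is None else a.scope.evaluate(c)
            if in_scope and a.match.evaluate(c):
                name = c.meta.get("name", "")
                totals[name] = totals.get(name, 0) + a.weight
    return totals


peter = Container("farm", {"name": "peter"})
sheep = Container("farm", {"name": "sheep"})
wolf = Container("farm", {"name": "wolf"})
rules = [
    Affinity(Expression("name", "Equals", ["sheep"]), weight=5),
    Affinity(Expression("name", "Equals", ["wolf"]), weight=-5),
]
result = effective_affinity(peter, rules, [peter, sheep, wolf])
print(result)  # {'sheep': 5, 'wolf': -5}
```

Because the result is computed only from the affinities attached to `owner`, calling the function for `sheep` with no rules of its own returns an empty mapping, which illustrates the asymmetry between A(C_1, C_2) and A(C_2, C_1) noted earlier.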