Memtier Policy

Overview

The memtier policy extends the topology-aware policy. It supports the same features and configuration options as the topology-aware policy, such as topology hints and annotations. See the topology-aware policy documentation for a description of how that policy works and how it is configured.

The main goal of the memtier policy is to let workloads choose the kinds of memory they want to use. The topology-aware policy's scoring algorithm for selecting topology nodes is changed so that a workload can belong to both a CPU node and a memory node in the topology tree: the CPU allocation is reserved from the CPU node and the memory controllers are selected from the memory node. Typically the aim is to make the CPU and memory allocations from the same node so that memory locality is as good as possible, but memory may also be allocated from a wider pool of memory controllers if the amount of free memory on a topology node is too low.
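The fallback to a wider pool of memory controllers can be pictured with a small sketch. This is not the policy's actual scoring algorithm. It is a simplified illustration that assumes each topology node tracks a single free-memory figure, and the Node class and pick_memory_node function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node in a hypothetical topology tree (illustrative only)."""
    name: str
    free_memory: int                  # bytes free in the controllers under this node
    parent: Optional["Node"] = None   # None for the root of the tree


def pick_memory_node(cpu_node: Node, request: int) -> Optional[Node]:
    """Return the node closest to cpu_node that has enough free memory.

    The search starts at the CPU node itself (best locality) and widens
    the pool by walking toward the root. Returns None if nothing fits.
    """
    node: Optional[Node] = cpu_node
    while node is not None:
        if node.free_memory >= request:
            return node
        node = node.parent
    return None


# Assumed example tree: root -> socket #0 -> NUMA node #0.
root = Node("root", free_memory=64 << 30)
socket0 = Node("socket #0", free_memory=24 << 30, parent=root)
numa0 = Node("NUMA node #0", free_memory=2 << 30, parent=socket0)

small = pick_memory_node(numa0, 1 << 30)   # fits on the CPU node itself
large = pick_memory_node(numa0, 8 << 30)   # needs the wider socket-level pool
```

In this sketch a 1 GiB request stays on the CPU node, while an 8 GiB request is served from the socket-level pool at the cost of some locality.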

Activation of the Memtier Policy

You can activate the memtier policy by setting the --policy parameter of cri-resmgr to memtier. For example:

cri-resmgr --policy memtier --reserved-resources cpu=750m

Configuration

The memtier policy knows of three kinds of memory: DRAM, PMEM, and HBM. The various memory types are accessed via separate memory controllers.

  • DRAM (dynamic random-access memory) is regular system main memory.

  • PMEM (persistent memory) is large-capacity memory, such as Intel® Optane™ memory.

  • HBM (high-bandwidth memory) is high-speed memory, typically found on some special-purpose computing systems.

To configure a pod to use a certain memory type, use the cri-resource-manager.intel.com/memory-type annotation in the pod spec. For example, to make a container request both PMEM and DRAM memory types, you could use pod metadata such as this:

metadata:
  annotations:
    cri-resource-manager.intel.com/memory-type: |
      container1: dram,pmem

The memtier policy will then aim to allocate resources from a topology node which can satisfy the memory requirements.
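To make the shape of the annotation value concrete, the following sketch parses it into a mapping from container name to a set of memory types. It is an illustration only and not the parser cri-resmgr uses. The function name parse_memory_type_annotation is hypothetical, and the sketch assumes one "container: type,type" entry per line.

```python
KNOWN_TYPES = {"dram", "pmem", "hbm"}


def parse_memory_type_annotation(value: str) -> dict:
    """Map each container name to the set of memory types it requests.

    Assumes one 'container: type,type' entry per line, as in the
    example annotation; anything else raises ValueError.
    """
    result = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines in the block scalar
        container, sep, types = line.partition(":")
        if not sep or not container.strip():
            raise ValueError(f"malformed entry: {line!r}")
        names = {t.strip().lower() for t in types.split(",") if t.strip()}
        unknown = names - KNOWN_TYPES
        if unknown:
            raise ValueError(f"unknown memory types: {sorted(unknown)}")
        result[container.strip()] = names
    return result


parsed = parse_memory_type_annotation("container1: dram,pmem\n")
```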

Cold Start

The memtier policy supports “cold start” functionality. When cold start is enabled and the workload is allocated to a topology node with both DRAM and PMEM memory, the initial memory controller is only the PMEM controller. The DRAM controller is added to the workload only after the cold start timeout has expired. The effect is that large allocated but unused memory areas don’t need to be migrated to PMEM later, because they were allocated there to begin with. Cold start is configured like this in the pod metadata:

metadata:
  annotations:
    cri-resource-manager.intel.com/memory-type: |
      container1: dram,pmem
    cri-resource-manager.intel.com/cold-start: |
      container1:
        duration: 60s

In the above example, container1 would initially be granted only the PMEM memory controller, but after 60 seconds the DRAM controller would be added to the container memset.
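The timeline described above can be sketched as a small function. This is a simplified model for illustration, not the policy's implementation. The name memory_controllers and its parameters are hypothetical, and the sketch only covers the DRAM plus PMEM case that the text describes.

```python
def memory_controllers(requested, elapsed_seconds, cold_start_seconds):
    """Return the memory types in a container's memset at a point in time.

    requested:          memory types from the memory-type annotation
    elapsed_seconds:    time since the container was started
    cold_start_seconds: cold start duration, 0 if cold start is not set
    """
    types = set(requested)
    in_cold_start = 0 <= elapsed_seconds < cold_start_seconds
    if in_cold_start and {"dram", "pmem"} <= types:
        # During cold start only the PMEM controller is granted.
        return {"pmem"}
    return types


early = memory_controllers(["dram", "pmem"], elapsed_seconds=10, cold_start_seconds=60)
later = memory_controllers(["dram", "pmem"], elapsed_seconds=60, cold_start_seconds=60)
```

With the 60s duration from the example, early contains only pmem, and later contains both dram and pmem.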

Dynamic Page Demotion

The memtier policy also supports dynamic page demotion. The idea is to move rarely used pages from DRAM to PMEM for workloads that have been assigned both the DRAM and PMEM memory types. This feature is configured in the memtier policy configuration using three keys: DirtyBitScanPeriod, PageMovePeriod, and PageMoveCount. All three parameters need to be set to non-zero values for dynamic page demotion to be enabled. See this configuration file fragment as an example:

policy:
  Active: memtier
  memtier:
    DirtyBitScanPeriod: 10s
    PageMovePeriod: 2s
    PageMoveCount: 1000

In this setup, every process in every container of every non-system pod that meets the memory type requirements (that is, has both DRAM and PMEM assigned) would have its page ranges scanned for non-accessed pages every ten seconds. The result of the scan would be fed to a page-moving loop, which would attempt to move 1000 pages from DRAM to PMEM every two seconds.
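As a rough sizing aid, the settings above translate into an upper bound on demotion throughput. The sketch below assumes 4 KiB pages, which is an assumption and not something this document states. The function names are hypothetical, and whether PageMoveCount applies per process or overall is not specified here.

```python
def demotion_enabled(scan_period_s, move_period_s, move_count):
    """Demotion is enabled only when all three settings are non-zero."""
    return scan_period_s > 0 and move_period_s > 0 and move_count > 0


def max_demotion_rate(move_period_s, move_count, page_size=4096):
    """Upper bound on bytes moved per second by the page-moving loop.

    page_size defaults to 4096 bytes (assumed; huge pages would differ).
    """
    return move_count / move_period_s * page_size


# Values from the example fragment: 10s scan, 2s move period, 1000 pages.
enabled = demotion_enabled(10, 2, 1000)
rate = max_demotion_rate(move_period_s=2, move_count=1000)
```

Under these assumptions the loop moves at most 500 pages per second, or 2,048,000 bytes per second (roughly 2 MB/s). Actual throughput may be lower if fewer non-accessed pages are found.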

Container Memory Requests and Limits

Due to inaccuracies in how cri-resmgr calculates memory requests for pods in the Burstable QoS class, you should either use Limit to set the amount of memory for containers in Burstable pods or run the resource-annotating webhook as described in the top-level README file.

Implicit Hardware Topology Hints

CRI Resource Manager automatically generates HW Topology Hints for containers before a policy allocates resources. The memtier policy is hint-aware and takes these hints into account. Since hints indicate optimal or preferred HW locality for devices and potentially local volumes used by the container, they can significantly alter how resources are assigned to the container.

Using the ‘topologyhints’ resource manager annotation key, it is possible to opt out of automatic topology hint generation on a per-pod or per-container basis.

Use this annotation to opt out a full pod:

  annotations:
    topologyhints.cri-resource-manager.intel.com/pod: "false"

Use this annotation to opt out container ‘foo’ in the pod:

  annotations:
    topologyhints.cri-resource-manager.intel.com/container.foo: "false"

Currently, topology hint generation is enabled by default, so using the annotation to opt in (setting it to “true”) should have no effect on the placement of a pod's containers. However, this might change in the future.