Rate Limiting

Rate Limiting Overview

Rate Limiting is implemented by monitoring device utilization on a per-Virtual Function (VF), per-service basis and comparing it against the Service Level Agreement (SLA) allocated to that VF and service. This ensures that resources are allocated according to predefined agreements and prevents any single VF from monopolizing device capacity.

Rate Limiting is set up on the host system, allowing administrators to manage and allocate resources effectively. It can be configured for each Physical Function (PF) on the device, providing granular control over resource distribution across different services and virtual functions.

Resources are shared across guests, and the resource utilization of each guest is measured relative to the capacity of the physical function. The feature is supported for the symmetric crypto (SYM), asymmetric crypto (ASYM), and data compression (DC) services.

This document provides instructions for enabling Rate Limiting for both Out-of-Tree (OOT) and In-tree stacks.

Rate Limiting Reference Algorithms and Capacity

Important

Understanding reference algorithms is critical for proper SLA configuration across both In-Tree and Out-of-Tree implementations.

Device Capacity Reference Algorithms

QAT devices report capacity based on these reference algorithms:

  • Symmetric Crypto - AES-128-GCM with 4KB packet size

  • Compression - Dynamic DEFLATE level 1 with 64KB packet size

  • Asymmetric Crypto - RSA-2048 decrypt operations

Note

Migration Note: Customers upgrading from QAT 1.6/1.7 should note that the symmetric crypto reference changed from AES-128-CBC + SHA256-HMAC at 1KB to AES-128-GCM at 4KB.

SKU-Specific Capacity Values

Device capacity and performance characteristics vary significantly between different SKUs (such as MCC and XCC variants). The same reference algorithm may achieve different throughput levels depending on the specific hardware configuration.

Warning

Always consult your device’s performance brief for SKU-specific capacity values and performance characteristics before configuring SLAs.

Out-of-Tree Rate Limiting

Enabling Rate Limiting With Out-of-Tree Driver

To enable the Rate Limiting feature for the Out-of-Tree stack:

  1. Install the driver package on the host with Single-Root Input/Output Virtualization (SR-IOV) enabled.

  2. Set ServicesEnabled to asym, sym, or dc, or to a combination of any two of these services.

  3. Restart the service by running qat_service shutdown followed by qat_service start.
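Steps 2 and 3 can be scripted. The sketch below assumes the Gen4 Out-of-Tree configuration layout, in which each device has a file such as /etc/4xxx_dev0.conf containing a line of the form ServicesEnabled = <value>, with two services separated by a semicolon (for example, sym;dc). The file path, the separator, and the helper names validate_services and set_services_enabled are assumptions for illustration; confirm them against your installed package before use.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 2 and 3 of the Out-of-Tree setup.
# ASSUMPTION: services are separated by ';' (e.g. "sym;dc") and the config
# file contains a line starting with "ServicesEnabled".

# Succeeds if $1 lists one or two distinct services from asym, sym, dc.
validate_services() {
    local IFS=';'
    local -a svcs
    read -ra svcs <<< "$1"
    (( ${#svcs[@]} >= 1 && ${#svcs[@]} <= 2 )) || return 1
    if (( ${#svcs[@]} == 2 )) && [[ "${svcs[0]}" == "${svcs[1]}" ]]; then
        return 1
    fi
    local s
    for s in "${svcs[@]}"; do
        case "$s" in
            asym|sym|dc) ;;
            *) return 1 ;;
        esac
    done
}

# Rewrites the ServicesEnabled line of config file $1 to value $2
# (GNU sed in-place editing).
set_services_enabled() {
    local conf="$1" services="$2"
    if ! validate_services "$services"; then
        echo "invalid ServicesEnabled value: $services" >&2
        return 1
    fi
    if ! grep -q '^ServicesEnabled' "$conf"; then
        echo "no ServicesEnabled line in $conf" >&2
        return 1
    fi
    sed -i "s/^ServicesEnabled.*/ServicesEnabled = ${services}/" "$conf"
}

# Example (as root; the path is an assumption):
#   set_services_enabled /etc/4xxx_dev0.conf "sym;dc"
# Then perform step 3: qat_service shutdown, then qat_service start.
```

The validation runs before the file is touched, so an invalid value such as three services leaves the configuration unchanged.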

Important

For Out-of-Tree (OOT) PKE (asymmetric crypto), the total CIR across all SLAs should equal 1000 for proper rate limiting. For the symmetric crypto and data compression services, the total CIR should equal the total capacity returned by the sla_mgr tool (./sla_mgr caps <pf_addr>).
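A quick arithmetic check before creating SLAs can catch allocations that do not add up. The hypothetical helper below sums a list of CIR values and compares the total with a target: 1000 for OOT PKE, or the total capacity reported by the sla_mgr tool for sym and dc, which you pass in by hand because this sketch does not parse sla_mgr output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_cir_total <target> <cir>...
# Prints the total and succeeds only if the CIR values sum exactly to <target>.
check_cir_total() {
    local target="$1"
    shift
    local total=0 cir
    for cir in "$@"; do
        total=$(( total + cir ))
    done
    echo "total CIR = ${total}, target = ${target}"
    (( total == target ))
}

# Example: four asymmetric (PKE) SLAs that together use the full 1000 units
#   check_cir_total 1000 400 300 200 100
```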

Service Level Agreement (SLA)

Service Level Agreement enforcement allocates a specified amount of capacity for a specified service to a specified VF. The maximum number of SLAs that can be enforced is:

max SLAs enforced = (number of VFs) x (number of services)

where:

  • Number of VFs varies based on device type

  • Number of services = 2 (any two of asymmetric crypto, symmetric crypto, and compression)
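As a worked example of the formula, assume a device that exposes 16 VFs (the actual count depends on the device) with two services enabled; the limit is then 16 x 2 = 32 SLAs. The same arithmetic as a one-line helper for scripts (the name max_slas is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: max_slas <num_vfs> <num_services>
# Upper bound on the number of SLAs that can be enforced.
max_slas() {
    echo $(( $1 * $2 ))
}

# Example, assuming 16 VFs and two enabled services:
#   max_slas 16 2   # prints 32
```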

SLA Units With Out-of-Tree Driver

SLA units are measured as follows:

  • Symmetric Crypto - 1Mbps of throughput (based on AES-128-GCM @ 4KB reference algorithm).

  • Asymmetric Crypto - 1 unit is equal to 0.1 percent of available utilization.

  • Compression - 1Mbps of throughput (based on Dynamic DEFLATE level 1 @ 64KB reference algorithm).
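To make these units concrete, the hypothetical helper below converts a CIR or PIR value into the quantity it represents for each service, following the definitions above: 1 Mbps of reference-algorithm throughput for sym and dc, and 0.1 percent of available utilization for asym. The function name and output wording are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: describe_sla <service> <units>
# service is one of asym, sym, dc; units is a non-negative integer.
describe_sla() {
    local service="$1" units="$2"
    case "$service" in
        asym)
            # 1 unit = 0.1 percent: print whole percent and tenths
            printf '%d.%d%% of available utilization\n' \
                $(( units / 10 )) $(( units % 10 ))
            ;;
        sym)
            echo "${units} Mbps (AES-128-GCM @ 4KB equivalent)"
            ;;
        dc)
            echo "${units} Mbps (Dynamic DEFLATE level 1 @ 64KB equivalent)"
            ;;
        *)
            echo "unknown service: ${service}" >&2
            return 1
            ;;
    esac
}

# Example:
#   describe_sla asym 250   # prints: 25.0% of available utilization
```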

Note

  1. In Gen4 devices, SLA units for Asymmetric Crypto services are measured as a percentage of slice utilization. This metric is more accurate than operations per second or throughput because it directly reflects the hardware resources each user consumes, independent of algorithm processing speed, and so gives a fair representation of resource usage.

  2. Gen4 devices use a hardware-assisted rate limiting approach, whereas legacy devices use a firmware-only rate limiting approach.

  3. For the asymmetric service, SLAs are allocated at a granularity of 1 unit (0.1 percent of device utilization), with RSA2K as the reference.

  4. The table below is a sample mapping for the 5th Gen Intel® Xeon® Scalable Processor - MCC SKU platform that translates SLA units to equivalent ops/sec.

  5. Users can run tests with the required algorithm to determine the mapping for other SKUs with different performance. Gen4 asymmetric performance numbers can differ based on the SKU.

Sample Mapping Table for 5th Gen Intel® Xeon® Scalable Processor - MCC SKU

Unit    RSA2K decrypt with CRT (Ops/sec)    RSA4K decrypt with CRT (Ops/sec)
1       60                                  -
5       300                                 -
10      600                                 60
50      3000                                300
100     6000                                600
300     18000                               1800
500     30000                               3000
750     45000                               4500
1000    60000                               6000
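The table is linear: each SLA unit corresponds to 60 RSA2K decrypt ops/sec and 6 RSA4K decrypt ops/sec on this SKU. The hypothetical helper below uses those two ratios, read from the sample table and valid only for the 5th Gen Intel® Xeon® Scalable Processor MCC SKU, to find the smallest number of units that covers a target operation rate. For other SKUs, replace the ratios with values you measure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: units_for_ops <rsa2k|rsa4k> <target_ops_per_sec>
# Ratios come from the MCC SKU sample mapping table only.
units_for_ops() {
    local alg="$1" ops="$2" per_unit units
    case "$alg" in
        rsa2k) per_unit=60 ;;
        rsa4k) per_unit=6 ;;
        *) echo "unknown algorithm: $alg" >&2; return 1 ;;
    esac
    # Round up so the SLA never falls below the target rate
    units=$(( (ops + per_unit - 1) / per_unit ))
    if (( units > 1000 )); then
        echo "warning: ${units} units exceeds the 1000-unit total" >&2
    fi
    echo "$units"
}

# Example:
#   units_for_ops rsa2k 3000   # prints 50
```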

Migration from QAT 1.x

Customers upgrading from QAT 1.6/1.7 should be aware of reference algorithm changes:

  • Symmetric Crypto: The reference algorithm changed from AES-128-CBC + SHA256-HMAC at 1KB to AES-128-GCM at 4KB. This may require SLA scaling based on your specific SKU and algorithms.

  • Algorithm Impact: The throughput achieved for a given SLA varies significantly when your algorithm or packet size differs from the reference.

  • Decompression Behavior: Decompression operations may achieve approximately 2x the throughput of compression operations for the same SLA setting.

Note

Test your specific algorithms and packet sizes to determine appropriate SLA values for your use case.
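One way to turn a performance-brief comparison into a starting SLA value is to assume that throughput scales proportionally with the ratio between your algorithm and the reference algorithm on the same SKU. Under that assumption, which you should confirm by testing as the note says, the SLA to request is the desired throughput multiplied by (reference throughput / your algorithm's throughput). The helper name and inputs are hypothetical; the throughput figures would come from your device's performance brief.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scale_sla <desired_mbps> <reference_tput> <algorithm_tput>
# ASSUMPTION: an SLA of N reference Mbps yields about
# N * (algorithm_tput / reference_tput) Mbps of your algorithm.
# Both throughputs must be integers in the same unit (e.g. Gbps).
# The result is rounded up to whole SLA units.
scale_sla() {
    local desired="$1" ref="$2" alg="$3"
    if (( alg <= 0 )); then
        echo "algorithm throughput must be positive" >&2
        return 1
    fi
    echo $(( (desired * ref + alg - 1) / alg ))
}

# Example: an algorithm running at half the reference throughput needs
# twice the SLA units for the same target:
#   scale_sla 1000 100 50   # prints 2000
```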

SLA Manager Application

The sla_mgr tool is used to create, update, delete, and list SLAs, and to query SLA capabilities. The SLA Manager executable is available at $ICP_ROOT/build/sla_mgr after the package is built and installed with ./configure and make install.

SLA Commands

Rate Limiting SLA Commands

Operation                  Command

Rate Limiting V1 (Legacy)
  Create SLA               ./sla_mgr create <vf_addr> <rate_in_sla_units> <service>
  Update SLA               ./sla_mgr update <pf_addr> <sla_id> <rate_in_sla_units>

Rate Limiting V2
  Create SLA               ./sla_mgr create <vf_addr> <cir> <pir> <service>
  Update SLA               ./sla_mgr update <pf_addr> <sla_id> <cir> <pir>

For Legacy and Rate Limiting V2
  Delete SLA               ./sla_mgr delete <pf_addr> <sla_id>
  Delete all SLAs          ./sla_mgr delete_all <pf_addr>
  Query SLA capabilities   ./sla_mgr caps <pf_addr>
  Query list of SLAs       ./sla_mgr list <pf_addr>

Options:

  • pf_addr - PCI address of the Physical Function, in domain:bus:device.function (xxxx:xx:xx.x) format.

  • vf_addr - PCI address of the Virtual Function, in domain:bus:device.function (xxxx:xx:xx.x) format.

  • service - Asym (=0), Sym (=1), or DC (=2).

  • rate_in_sla_units - [0-MAX]. MAX is found by querying the capabilities.

  • cir/pir - committed/peak information rate [0-MAX]. MAX is found by querying the capabilities.

  • sla_id - Value returned by the create command.
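A mistyped address or service code is an easy way to create the wrong SLA. The hypothetical wrapper below validates a Rate Limiting V2 create request (address format, a service name mapped to the numeric codes listed above, and a PIR not below the CIR) and prints the resulting ./sla_mgr create command instead of running it, so it can be reviewed first. It assumes sla_mgr takes the numeric service code on the command line; confirm this against your release.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run wrapper: sla_create_cmd <vf_addr> <cir> <pir> <asym|sym|dc>
# Prints a Rate Limiting V2 create command; does not execute it.
sla_create_cmd() {
    local vf_addr="$1" cir="$2" pir="$3" service="$4" code
    # domain:bus:device.function with hex digits, e.g. 0000:6b:00.1
    if [[ ! "$vf_addr" =~ ^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$ ]]; then
        echo "bad address: $vf_addr" >&2
        return 1
    fi
    case "$service" in
        asym) code=0 ;;
        sym)  code=1 ;;
        dc)   code=2 ;;
        *) echo "bad service: $service" >&2; return 1 ;;
    esac
    if [[ ! "$cir" =~ ^[0-9]+$ || ! "$pir" =~ ^[0-9]+$ ]] || (( pir < cir )); then
        echo "cir/pir must be integers with pir >= cir" >&2
        return 1
    fi
    echo "./sla_mgr create ${vf_addr} ${cir} ${pir} ${code}"
}

# Example:
#   sla_create_cmd 0000:6b:00.1 100 200 sym
#   # prints: ./sla_mgr create 0000:6b:00.1 100 200 1
```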

In Legacy mode, SLAs are created and updated using rate_in_sla_units; with Rate Limiting V2, they use cir and pir. These units are equal to:

  • 1 operation per second - for the asymmetric service (Legacy), or 0.1 percent of available utilization (Rate Limiting V2).

  • 1 megabit per second (Mbps) - for the symmetric crypto and compression services.

Note

To use the Legacy Rate Limiting sla_mgr application, configure the package with the --enable-legacy-sla-mgr option.

Best Practices for SLA Management

  • Ensure all VFs are included in SLAs to prevent unregulated resource usage.

  • Regularly monitor performance and adjust CIR and PIR values as needed to maintain optimal throughput.
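To check the first practice, you can compare the VFs a PF has created against the VFs you have assigned SLAs to. On Linux, each SR-IOV PF directory in sysfs contains virtfn0, virtfn1, ... symlinks pointing to its VF devices. The hypothetical helper below lists those VFs and prints any that are missing from a list you supply (for example, transcribed from ./sla_mgr list <pf_addr> output, whose format this sketch does not parse).

```shell
#!/usr/bin/env bash
# Hypothetical helper: vfs_without_sla <pf_sysfs_dir> <vf_addr_with_sla>...
# <pf_sysfs_dir> is e.g. /sys/bus/pci/devices/0000:6b:00.0
# Prints each VF address under the PF that is not in the supplied list.
vfs_without_sla() {
    local pf_dir="$1"
    shift
    local link vf covered found
    for link in "$pf_dir"/virtfn*; do
        [[ -L "$link" ]] || continue   # no VFs: the glob stays unexpanded
        vf=$(basename "$(readlink "$link")")
        found=0
        for covered in "$@"; do
            if [[ "$covered" == "$vf" ]]; then
                found=1
                break
            fi
        done
        (( found )) || echo "$vf"
    done
}

# Example:
#   vfs_without_sla /sys/bus/pci/devices/0000:6b:00.0 0000:6b:00.1 0000:6b:00.2
```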

Out-of-Tree Troubleshooting

Issue: SLA performance doesn’t match expectations

Solutions:

  1. Check if your algorithm matches the reference (AES-128-GCM @ 4KB for symmetric crypto)

  2. For other algorithms, scale the SLA using the ratio between the reference algorithm's throughput and your algorithm's throughput in the performance brief

  3. Verify total SLA allocation doesn’t exceed device capacity

Issue: Decompression achieves higher throughput than expected

Solution: This is expected behavior. Decompression can achieve ~2x compression throughput.

Issue: Asymmetric performance varies significantly between workloads

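Solution: This is expected. Gen4 devices measure asymmetric usage as slice utilization rather than operations per second, so the operations per second achieved for a given SLA depend on the algorithm. In the sample MCC SKU mapping table, for example, 10 units correspond to 600 RSA2K but only 60 RSA4K decrypt operations per second.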

Debugging Commands

Check current SLA configuration:

./sla_mgr list <pf_addr>

Query device capabilities:

./sla_mgr caps <pf_addr>

Verify SLA is active:

./sla_mgr get <pf_addr> <sla_id>

In-tree Rate Limiting

Note

For additional details on Rate Limiting with the In-tree solution, refer to the sysfs-driver-qat_rl ABI documentation (Documentation/ABI/testing/sysfs-driver-qat_rl in the Linux kernel source).

Rate Limiting for the in-tree stack is configured per individual Physical Function (PF) using sysfs calls. Each PF has a directory structure that includes several files used to manage SLAs:

Directory Structure

The rate limiting attributes for each PF are located at:

/sys/bus/pci/devices/<BDF>/qat_rl/

The files included in this directory are:

  • cap_rem: Reports the remaining capability for a particular service/SLA. This is the remaining value that a new SLA can be set to or a current SLA can be increased with.

  • cir: Committed Information Rate (CIR). The guaranteed rate of throughput that a VF can achieve under its SLA. The value is expressed in permille scale, i.e., 1000 refers to the maximum device throughput for a selected service.

  • id: Used to retrieve a particular SLA and operate on it. Valid for update, rm, and get operations.

  • pir: Peak Information Rate (PIR). The maximum rate that can be achieved by that particular SLA. An SLA can reach a value between CIR and PIR when the device is not fully utilized by requests from other users.

  • rp: Configures the ring pairs associated with an SLA. The value is a 64-bit bit mask and is written/displayed in hex.

  • sla_op: Used to perform operations on an SLA, such as add, update, rm, rm_all, and get.

  • srv: Represents the service (sym, asym, dc) associated with an SLA.

Enabling Rate Limiting With In-Tree Driver

To enable the Rate Limiting feature for the In-tree stack:

  1. Identify the device by its Bus-Device-Function (BDF) address, for example 0000:6b:00.0.

  2. Configure the SLA using the sysfs attributes available for qat_4xxx devices.

  3. Make sure the total CIR across all VFs for each service equals 1000 for proper rate limiting.
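These steps can be wrapped in a small function. The hypothetical helper below writes the SLA parameters into a qat_rl directory and then triggers the add operation, using the attribute names listed above in the same order as the example that follows. The directory is a parameter, so the function can target /sys/bus/pci/devices/<BDF>/qat_rl on a real host (as root) or a scratch directory for a dry run. Its validation rules (PIR not below CIR, both at most 1000, rp as a hex mask) follow from the attribute descriptions above rather than from the kernel's own checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rl_add_sla <qat_rl_dir> <srv> <cir> <pir> <rp_hex>
rl_add_sla() {
    local dir="$1" srv="$2" cir="$3" pir="$4" rp="$5"
    case "$srv" in
        sym|asym|dc) ;;
        *) echo "bad service: $srv" >&2; return 1 ;;
    esac
    if [[ ! "$cir" =~ ^[0-9]+$ || ! "$pir" =~ ^[0-9]+$ ]] ||
        (( pir < cir || pir > 1000 )); then
        echo "need integers with cir <= pir <= 1000" >&2
        return 1
    fi
    if [[ ! "$rp" =~ ^0x[0-9a-fA-F]{1,16}$ ]]; then
        echo "rp must be a 64-bit hex mask such as 0xa" >&2
        return 1
    fi
    # Stage the parameters, then commit them with the "add" operation
    echo "$cir" > "$dir/cir" &&
    echo "$pir" > "$dir/pir" &&
    echo "$srv" > "$dir/srv" &&
    echo "$rp"  > "$dir/rp" &&
    echo "add"  > "$dir/sla_op"
}

# Example (as root, real device):
#   rl_add_sla /sys/bus/pci/devices/0000:6b:00.0/qat_rl sym 62 62 0xa
```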

SLA Units With In-Tree Driver

For the In-tree stack, SLA units are measured as follows:

  • All services (sym, asym, dc) - Values are expressed in permille scale, where 1000 refers to the maximum device throughput for the selected service.

Note

The permille-based approach provides consistent units across all services and abstracts away the underlying reference algorithm details. Use the cap_rem attribute to check the remaining capacity for each service.
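Because the In-tree scale is relative, translating a CIR or PIR value into absolute throughput requires the device's maximum throughput for that service, for example from your SKU's performance brief. A minimal sketch, assuming the caller supplies that maximum in Mbps (the helper name permille_to_mbps is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: permille_to_mbps <permille> <service_max_mbps>
# Approximate throughput represented by a CIR/PIR value (rounded down).
permille_to_mbps() {
    echo $(( $1 * $2 / 1000 ))
}

# Example with an assumed 100000 Mbps service maximum:
#   permille_to_mbps 62 100000   # prints 6200
```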

Example Setting of SLAs

This example demonstrates setting up SLAs for all VFs of a specified PF, for the symmetric and asymmetric crypto services. The rp bit mask is shifted left by four bits per VF (vf * 4), so each VF's SLA selects ring pairs from a different group of four.

  1. Remove Existing SLAs: Clear any existing SLAs for the device.

    echo "rm_all" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/sla_op
    
  2. Set SLAs for Symmetric Service: For each VF, set the SLA parameters and add the SLA:

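    for vf in {0..15}; do
        # 1000 permille / 16 VFs = 62.5, rounded down to 62; CIR = PIR means
        # the VF cannot burst above its committed rate
        cir_pir_value=62
        # 0xa (binary 1010) selects ring pairs 1 and 3 of each group of four;
        # shifting by vf * 4 moves the mask to the group used for this VF
        rp_value=$(printf "0x%x" $((0xa << (vf * 4))))
        # Stage the SLA parameters, then commit them with the "add" operation
        echo "$cir_pir_value" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/cir
        echo "$cir_pir_value" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/pir
        echo "sym" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/srv
        echo "$rp_value" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/rp
        echo "add" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/sla_op
        echo "SLA added for BDF: 0000:6b:00.0, VF: $vf, Service: sym, RP: $rp_value, CIR/PIR: $cir_pir_value"
    done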
    
  3. Set SLAs for Asymmetric Service: Similarly, set the SLA parameters for the asymmetric service:

    for vf in {0..15}; do
        # Same 62-permille share per VF as the symmetric service
        cir_pir_value=62
        # 0x5 (binary 0101) selects ring pairs 0 and 2 of each group of four,
        # the pairs not used by the symmetric SLAs above
        rp_value=$(printf "0x%x" $((0x5 << (vf * 4))))
        # Stage the SLA parameters, then commit them with the "add" operation
        echo "$cir_pir_value" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/cir
        echo "$cir_pir_value" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/pir
        echo "asym" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/srv
        echo "$rp_value" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/rp
        echo "add" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/sla_op
        echo "SLA added for BDF: 0000:6b:00.0, VF: $vf, Service: asym, RP: $rp_value, CIR/PIR: $cir_pir_value"
    done
    

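This example illustrates setting up rate limiting for one QAT endpoint, dividing the PF capacity evenly among 16 VFs for both the symmetric and asymmetric services. Because 1000 is not divisible by 16, each VF receives 62 permille, so the 16 SLAs for each service total 992 rather than 1000.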
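The per-VF value in the example is an integer division of the 1000-permille total by the VF count. The hypothetical helper below computes that share together with the leftover that an even integer split cannot assign; for 16 VFs it gives 62 per VF with 8 permille left over, which you could leave unallocated or add to one or more VFs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: even_split <num_vfs> [total]
# Prints "<per_vf> <remainder>" for an even integer split of the permille
# total (default 1000).
even_split() {
    local vfs="$1" total="${2:-1000}"
    if (( vfs <= 0 )); then
        echo "num_vfs must be positive" >&2
        return 1
    fi
    echo "$(( total / vfs )) $(( total % vfs ))"
}

# Example:
#   even_split 16   # prints "62 8"
```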

Performance Validation

Validating SLA Performance

To ensure your SLA configuration meets performance requirements:

  1. Baseline Testing: Test your application without rate limiting enabled to establish maximum performance.

  2. SLA Testing: Configure SLAs and test with rate limiting enabled.

  3. Performance Comparison: Compare results and adjust SLA values if needed.

Expected Performance Ranges

  • Symmetric Crypto: 90-100% of calculated SLA performance

  • Compression: 90-100% of calculated SLA performance

  • Decompression: May achieve higher throughput than compression (expected behavior)

  • Asymmetric: Performance varies by algorithm complexity

Note

Performance measurements should be taken over at least 1-second intervals for accurate rate limiting assessment.
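The comparison step and the 1-second rule can be combined in a small check. The hypothetical helper below takes the bytes your application processed, the measurement window in seconds, and the SLA in Mbps, computes the achieved percentage of the SLA, and reports whether it falls in the 90-100% range quoted above for symmetric crypto and compression. Byte counts and the window length come from your own test harness; nothing here reads QAT counters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sla_pct <bytes> <seconds> <sla_mbps>
# Prints the achieved percentage of the SLA (integer, rounded down) and
# returns 0 if it lies in the 90-100% range, 1 otherwise, 2 on bad input.
sla_pct() {
    local bytes="$1" secs="$2" sla="$3" pct
    if (( secs < 1 || sla <= 0 )); then
        echo "need a window of at least 1 s and a positive SLA" >&2
        return 2
    fi
    # bytes * 8 = bits; divide by seconds and 10^6 for Mbps, then by the SLA
    pct=$(( bytes * 8 * 100 / (secs * 1000000 * sla) ))
    echo "$pct"
    (( pct >= 90 && pct <= 100 ))
}

# Example: 1.25 GB in 1 s against a 10000 Mbps SLA
#   sla_pct 1250000000 1 10000   # prints 100
```

For decompression, a result above 100 is not necessarily a misconfiguration, since decompression can exceed compression throughput for the same SLA.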