Rate Limiting
Rate Limiting Overview
Rate Limiting is implemented by monitoring the utilization of the device on a per-VF, per-service basis and comparing that to the SLA allocated to that VF and service. This ensures that resources are allocated according to predefined agreements, preventing any single VF from monopolizing device capacity.
Rate Limiting is set up on the host system, allowing administrators to manage and allocate resources effectively. It can be configured for each Physical Function (PF) on the device, providing granular control over resource distribution across different services and virtual functions.
Resources are shared across guests, and the resource utilization of each guest is measured relative to the capacity of the physical function. The feature is supported for SYM, ASYM, and DC services.
This document provides instructions for enabling Rate Limiting for both Out-of-Tree (OOT) and In-tree stacks.
Rate Limiting Reference Algorithms and Capacity
Important
Understanding reference algorithms is critical for proper SLA configuration across both In-Tree and Out-of-Tree implementations.
Device Capacity Reference Algorithms
QAT devices report capacity based on these reference algorithms:
Symmetric Crypto - AES-128-GCM with 4KB packet size
Compression - Dynamic DEFLATE level 1 with 64KB packet size
Asymmetric Crypto - RSA-2048 decrypt operations
Note
Migration Note: Customers upgrading from QAT 1.6/1.7 should note that the symmetric crypto reference changed from AES-128-CBC + SHA256-HMAC at 1KB to AES-128-GCM at 4KB.
SKU-Specific Capacity Values
Device capacity and performance characteristics vary significantly between different SKUs (such as MCC and XCC variants). The same reference algorithm may achieve different throughput levels depending on the specific hardware configuration.
Warning
Always consult your device’s performance brief for SKU-specific capacity values and performance characteristics before configuring SLAs.
Out-of-Tree Rate Limiting
Enabling Rate Limiting With Out-of-Tree Driver
To enable the Rate Limiting feature for the Out-of-Tree stack:
Install the driver package on the host with Single-Root Input/Output Virtualization (SR-IOV) enabled.
Set ServicesEnabled to asym, sym, or dc (or any combination of up to two of these services).
Perform qat_service shutdown and qat_service start.
Important
For Out-of-Tree (OOT) PKE, the total CIR for all SLAs should equal 1000 to ensure proper rate limiting. For symmetric crypto and data compression services, the total CIR should equal the total capacity as returned by the sla_mgr tool.
Service Level Agreement (SLA)
Service Level Agreement enforcement allocates a specified amount of capacity for a specified service to a specified VF. The maximum number of SLAs that can be enforced is (number of VFs) x (number of services), where:
Number of VFs varies based on device type
Number of services = 2 (any two of asymmetric, symmetric, or compression)
SLA Units With Out-of-Tree Driver
SLA units are measured as follows:
Symmetric Crypto - 1Mbps of throughput (based on AES-128-GCM @ 4KB reference algorithm).
Asymmetric Crypto - 1 unit is equal to 0.1 percent of available utilization.
Compression - 1Mbps of throughput (based on Dynamic DEFLATE level 1 @ 64KB reference algorithm).
Note
In Gen4 devices, for Asymmetric Crypto services, SLA units are measured in terms of percentage of slice utilization. This metric is more accurate than operations/second or throughput/second because it directly reflects the hardware resources consumed by each user, independent of algorithm processing speed, providing a fair representation of resource usage.
Gen4 devices use a Hardware-assisted Rate limiting approach whereas legacy devices use a firmware-only Rate limiting approach.
For asymmetric service, SLAs shall be allocated at a granularity of 1 unit of device utilization percentage for RSA2K.
Below is a sample mapping table for the 5th Gen Intel® Xeon® Scalable Processor - MCC SKU platform that translates the SLA units to equivalent ops/sec.
Users can run tests with the required algorithm to determine the mapping for other SKUs with different performance. Gen4 asymmetric performance numbers can differ based on the SKU.
| Unit | RSA2K decrypt with CRT (Ops/sec) | RSA4K decrypt with CRT (Ops/sec) |
|---|---|---|
| 1 | 60 | – |
| 5 | 300 | – |
| 10 | 600 | 60 |
| 50 | 3000 | 300 |
| 100 | 6000 | 600 |
| 300 | 18000 | 1800 |
| 500 | 30000 | 3000 |
| 750 | 45000 | 4500 |
| 1000 | 60000 | 6000 |
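The mapping in the table above is linear, so it can be approximated with simple arithmetic. The following is a minimal sketch, assuming the per-unit rates read off this sample MCC SKU table (60 ops/sec per unit for RSA2K, 6 ops/sec per unit for RSA4K). The function and variable names are illustrative only, and other SKUs need their own measured rates. The sketch does not model the table's missing RSA4K entries at 1 and 5 units.

```shell
# Per-unit rates taken from the sample MCC SKU table (assumption: linear scaling).
RSA2K_OPS_PER_UNIT=60   # 1 unit -> 60 RSA2K decrypt (CRT) ops/sec
RSA4K_OPS_PER_UNIT=6    # 10 units -> 60 RSA4K decrypt (CRT) ops/sec

# Approximate ops/sec delivered by a given number of asymmetric SLA units.
asym_units_to_ops() {
    local units="$1"
    local ops_per_unit="$2"
    echo $(( units * ops_per_unit ))
}

# Inverse mapping: smallest number of units that covers a target ops/sec
# (integer division, rounded up).
asym_ops_to_units() {
    local target_ops="$1"
    local ops_per_unit="$2"
    echo $(( (target_ops + ops_per_unit - 1) / ops_per_unit ))
}

asym_units_to_ops 300 "$RSA2K_OPS_PER_UNIT"     # prints 18000
asym_units_to_ops 300 "$RSA4K_OPS_PER_UNIT"     # prints 1800
asym_ops_to_units 10000 "$RSA2K_OPS_PER_UNIT"   # prints 167
```

The rounded-up inverse is a deliberate choice: it slightly over-allocates rather than leaving the target rate unmet.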
Migration from QAT 1.x
Customers upgrading from QAT 1.6/1.7 should be aware of reference algorithm changes:
Symmetric Crypto: The reference algorithm changed from AES-128-CBC + SHA256-HMAC at 1KB to AES-128-GCM at 4KB. This may require SLA scaling based on your specific SKU and algorithms.
Algorithm Impact: Performance varies significantly with algorithm choice and packet size compared to the reference.
Decompression Behavior: Decompression operations may achieve approximately 2x the throughput of compression operations for the same SLA setting.
Note
Test your specific algorithms and packet sizes to determine appropriate SLA values for your use case.
SLA Manager Application
The sla_mgr tool is used to create, update, delete, and list SLAs, and to query SLA capabilities.
The SLA Manager executable is available in $ICP_ROOT/build/sla_mgr after the package is built and installed using the ./configure and make install commands.
SLA Commands
| Operation | Command |
|---|---|
| Rate Limiting V1 (Legacy) | |
| Create SLA | |
| Update SLA | |
| Rate Limiting V2 | |
| Create SLA | |
| Update SLA | |
| For Legacy and Rate Limiting V2 | |
| Delete SLA | |
| Delete all SLAs | |
| Query SLA capabilities | |
| Query list of SLAs | |
Options:
pf_addr - Physical address in domain:bus:device.function (xxxx:xx:xx.x) format.
vf_addr - Virtual address in domain:bus:device.function (xxxx:xx:xx.x) format.
Service - Asym (=0), Sym (=1), or DC (=2).
rate_in_sla_units - [0-MAX]. MAX is found by querying the capabilities.
cir/pir - Committed/peak information rate [0-MAX]. MAX is found by querying the capabilities.
sla_id - Value returned by the create command.
In Legacy mode, SLAs are created and updated using rate_in_sla_units. With Rate Limiting V2, cir/pir are used instead. One unit is equal to:
1 operation per second - for asymmetric service (Legacy) or 0.1 percent of available utilization (Rate Limiting V2).
1 Megabits per second - for symmetric service/compression service.
Note
To use the Legacy Rate Limiting sla_mgr application, configure the package with the --enable-legacy-sla-mgr option.
Best Practices for SLA Management
Ensure all VFs are included in SLAs to prevent unregulated resource usage.
Regularly monitor performance and adjust CIR and PIR values as needed to maintain optimal throughput.
Out-of-Tree Troubleshooting
Issue: SLA performance doesn’t match expectations
Solutions:
Check if your algorithm matches the reference (AES-128-GCM @ 4KB for symmetric crypto)
For different algorithms, scale SLA based on performance brief ratios
Verify total SLA allocation doesn’t exceed device capacity
Issue: Decompression achieves higher throughput than expected
Solution: This is expected behavior. Decompression can achieve ~2x compression throughput.
Issue: Asymmetric performance varies significantly between workloads
Solution: Gen4 devices use slice utilization measurement which provides more accurate resource allocation than operations-per-second metrics.
Debugging Commands
Check current SLA configuration:
./sla_mgr list <pf_addr>
Query device capabilities:
./sla_mgr caps <pf_addr>
Verify SLA is active:
./sla_mgr get <pf_addr> <sla_id>
In-tree Rate Limiting
Note
For additional details on Rate Limiting with the In-tree solution, refer to sysfs-driver-qat_rl documentation.
Rate Limiting for the in-tree stack is configured per individual Physical Function (PF) using sysfs calls. Each PF has a directory structure that includes several files used to manage SLAs:
Directory Structure
The rate limiting attributes for each PF are located at:
/sys/bus/pci/devices/<BDF>/qat_rl/
The files included in this directory are:
cap_rem: Reports the remaining capability for a particular service/SLA. This is the remaining value that a new SLA can be set to or a current SLA can be increased with.
cir: Committed Information Rate (CIR). The guaranteed rate of throughput that a VF can achieve under its SLA. The value is expressed in permille scale, i.e., 1000 refers to the maximum device throughput for a selected service.
id: Used to retrieve a particular SLA and operate on it. Valid for update, rm, and get operations.
pir: Peak Information Rate (PIR). The maximum rate that can be achieved by that particular SLA. An SLA can reach a value between CIR and PIR when the device is not fully utilized by requests from other users.
rp: Configures the ring pairs associated with an SLA. The value is a 64-bit bit mask and is written/displayed in hex.
sla_op: Used to perform operations on an SLA, such as add, update, rm, rm_all, and get.
srv: Represents the service (sym, asym, dc) associated with an SLA.
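As an illustration of how these attributes fit together, the sketch below reads the remaining capacity for a service and prints the parameters of an existing SLA. It assumes the access pattern described in the kernel's sysfs-driver-qat_rl ABI documentation as best understood here: write the service name to cap_rem before reading it, and write the SLA number to id before reading srv, rp, cir, and pir. The function names and the RL_DIR variable are hypothetical helpers, and the BDF is the one used in this document's example. Verify the behavior against your kernel version.

```shell
# Assumption: RL_DIR points at the qat_rl directory of the PF to inspect.
RL_DIR=${RL_DIR:-/sys/bus/pci/devices/0000:6b:00.0/qat_rl}

# Print the remaining capacity for a service (sym, asym, or dc).
# The service is selected by writing its name to cap_rem, then read back.
qat_rl_cap_rem() {
    local srv="$1"
    echo "$srv" > "$RL_DIR/cap_rem" && cat "$RL_DIR/cap_rem"
}

# Print the parameters of an existing SLA, selected through the id attribute.
qat_rl_show() {
    local sla_id="$1"
    local attr
    echo "$sla_id" > "$RL_DIR/id" || return 1
    for attr in srv rp cir pir; do
        printf '%s=%s\n' "$attr" "$(cat "$RL_DIR/$attr")"
    done
}

# Example usage (requires root and a QAT PF with rate limiting support):
#   qat_rl_cap_rem sym
#   qat_rl_show 0
```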
Enabling Rate Limiting With In-Tree Driver
To enable the Rate Limiting feature for the In-tree stack:
Identify the device using the Bus-Device-Function (BDF) format, e.g., <BDF>.
Configure the SLA using the sysfs attributes available for qat_4xxx devices.
Set the CIR values so that the total across all VFs equals 1000 to ensure proper rate limiting.
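An even split of 1000 does not always divide cleanly: with 16 VFs, 16 x 62 = 992, which leaves 8 units unallocated. One possible way to reach exactly 1000 is to give the first few VFs one extra unit. The sketch below only performs the arithmetic. The function name split_cir is hypothetical, and whether an uneven split is acceptable depends on your deployment.

```shell
# Print one CIR value per VF so that the values sum to exactly `total`
# (default 1000, the permille full scale).
split_cir() {
    local num_vfs="$1"
    local total="${2:-1000}"
    local base=$(( total / num_vfs ))
    local extra=$(( total % num_vfs ))
    local vf=0
    while [ "$vf" -lt "$num_vfs" ]; do
        # (vf < extra) evaluates to 1 for the first `extra` VFs, else 0.
        echo $(( base + (vf < extra) ))
        vf=$(( vf + 1 ))
    done
}

split_cir 16   # prints 63 eight times, then 62 eight times (sum 1000)
```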
SLA Units With In-Tree Driver
For the In-tree stack, SLA units are measured as follows:
All services (sym, asym, dc) - Values are expressed in permille scale, where 1000 refers to the maximum device throughput for the selected service.
Note
The permille-based approach provides consistent units across all services and abstracts away underlying reference algorithm details. Use the cap_rem attribute to check remaining capacity for each service.
Example Setting of SLAs
This example demonstrates setting up SLAs for all VFs for a specified PF, focusing on symmetric and asymmetric crypto services. The RP bit mask is shifted left by four bits for each VF, so each SLA selects a different group of ring pairs (mask 0xa for symmetric, 0x5 for asymmetric).
Remove Existing SLAs: Clear any existing SLAs for the device.
echo "rm_all" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/sla_op
Set SLAs for Symmetric Service: For each VF, set the SLA parameters and add the SLA:
for vf in {0..15}; do
    cir_pir_value=62
    rp_value=$(printf "0x%x" $((0xa << (vf * 4))))
    echo $cir_pir_value > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/cir
    echo $cir_pir_value > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/pir
    echo "sym" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/srv
    echo $rp_value > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/rp
    echo "add" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/sla_op
    echo "SLA added for BDF: 0000:6b:00.0, VF: $vf, Service: sym, RP: $rp_value, CIR/PIR: $cir_pir_value"
done
Set SLAs for Asymmetric Service: Similarly, set the SLA parameters for the asymmetric service:
for vf in {0..15}; do
    cir_pir_value=62
    rp_value=$(printf "0x%x" $((0x5 << (vf * 4))))
    echo $cir_pir_value > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/cir
    echo $cir_pir_value > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/pir
    echo "asym" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/srv
    echo $rp_value > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/rp
    echo "add" > /sys/bus/pci/devices/0000:6b:00.0/qat_rl/sla_op
    echo "SLA added for BDF: 0000:6b:00.0, VF: $vf, Service: asym, RP: $rp_value, CIR/PIR: $cir_pir_value"
done
This example illustrates setting up rate limiting for one QAT endpoint, evenly distributing the PF capacity among 16 VFs for both symmetric and asymmetric services.
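Once SLAs exist, the id and sla_op attributes can be used to change or remove a single SLA. The following is a minimal sketch, assuming that an update takes the id, cir, and pir attributes as inputs and that a removal takes only the id. The helper names and the RL_DIR variable are hypothetical. The CIR-not-above-PIR check is a sanity check implied by the CIR and PIR definitions, not a documented driver rule.

```shell
# Assumption: RL_DIR points at the qat_rl directory of the PF to configure.
RL_DIR=${RL_DIR:-/sys/bus/pci/devices/0000:6b:00.0/qat_rl}

# Change the CIR and PIR of an existing SLA, selected by its id.
qat_rl_update() {
    local sla_id="$1"
    local cir="$2"
    local pir="$3"
    if [ "$cir" -gt "$pir" ]; then
        echo "CIR ($cir) must not exceed PIR ($pir)" >&2
        return 1
    fi
    echo "$sla_id" > "$RL_DIR/id" &&
    echo "$cir" > "$RL_DIR/cir" &&
    echo "$pir" > "$RL_DIR/pir" &&
    echo "update" > "$RL_DIR/sla_op"
}

# Remove a single SLA, selected by its id.
qat_rl_remove() {
    local sla_id="$1"
    echo "$sla_id" > "$RL_DIR/id" && echo "rm" > "$RL_DIR/sla_op"
}

# Example usage (requires root and a QAT PF with rate limiting support):
#   qat_rl_update 0 100 200
#   qat_rl_remove 0
```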
Performance Validation
Validating SLA Performance
To ensure your SLA configuration meets performance requirements:
Baseline Testing: Test your application without rate limiting enabled to establish maximum performance.
SLA Testing: Configure SLAs and test with rate limiting enabled.
Performance Comparison: Compare results and adjust SLA values if needed.
Expected Performance Ranges
Symmetric Crypto: 90-100% of calculated SLA performance
Compression: 90-100% of calculated SLA performance
Decompression: May achieve higher throughput than compression (expected behavior)
Asymmetric: Performance varies by algorithm complexity
Note
Performance measurements should be taken over at least 1-second intervals for accurate rate limiting assessment.
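The comparison step can be scripted. The sketch below classifies a measured throughput against the 90-100% range listed above. It assumes both values are integers in the same unit (for example Mbps), and the function names are illustrative. A result above 100% is reported separately, since that can be expected for decompression or when PIR is higher than CIR.

```shell
# Measured throughput as an integer percentage of the SLA target
# (integer division, rounded down).
sla_pct() {
    local measured="$1"
    local target="$2"
    echo $(( measured * 100 / target ))
}

# Classify a measurement against the expected 90-100% range.
sla_check() {
    local pct
    pct=$(sla_pct "$1" "$2")
    if [ "$pct" -lt 90 ]; then
        echo "below range (${pct}%)"
    elif [ "$pct" -gt 100 ]; then
        echo "above target (${pct}%)"
    else
        echo "within range (${pct}%)"
    fi
}

sla_check 950 1000    # prints "within range (95%)"
sla_check 800 1000    # prints "below range (80%)"
```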