Telemetry

The telemetry feature is a tool for viewing the performance and utilization of an acceleration device. Telemetry data can be viewed on a per-device and a per-ring-pair (also known as queue pair) basis.

Note

There are differences between the implementations of Telemetry with the in-tree acceleration driver and the out-of-tree acceleration driver.

Details for each are included in the sections below.

Telemetry Usage

Out-Of-Tree

The telemetry feature is configured and queried using sysfs files in the Linux filesystem.

The telemetry sysfs folder is located at /sys/devices/pciAAAA:BB/AAAA:BB:CC.D/telemetry

Where:

  • AAAA:BB:CC.D is the Domain:BDF of the target Intel® QAT Endpoint.

Example:

ls /sys/devices/pci0000:6b/0000:6b:00.0/telemetry
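The path template above can also be filled in programmatically. The following is a minimal sketch, not part of the driver package: the helper name oot_telemetry_dir is hypothetical, and it assumes the parent directory reuses the domain and bus of the endpoint, as the template suggests.

```python
import re
from pathlib import PurePosixPath

# Domain:Bus:Device.Function, for example "0000:6b:00.0"
_BDF_RE = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def oot_telemetry_dir(domain_bdf: str) -> PurePosixPath:
    """Build the telemetry sysfs path for an endpoint (hypothetical helper)."""
    match = _BDF_RE.match(domain_bdf)
    if match is None:
        raise ValueError(f"not a Domain:BDF string: {domain_bdf!r}")
    domain, bus = match.group(1), match.group(2)
    # Assumption: the parent directory is named pci<Domain>:<Bus>,
    # using the same domain and bus as the endpoint itself.
    return (
        PurePosixPath("/sys/devices")
        / f"pci{domain}:{bus}"
        / domain_bdf
        / "telemetry"
    )
```

On a system where the endpoint sits behind a different root bus, the parent directory name would differ and the path should be checked with ls first.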

In-Tree

Important

Refer to Release Notes - In-Tree for kernel version requirements for enabling this feature.

The telemetry feature is configured and queried using debugfs files in the Linux filesystem.

The telemetry debugfs folder is located at /sys/kernel/debug/qat_4xxx_*/telemetry

Where:

  • The * in qat_4xxx_* is the Domain:BDF of the target Intel® QAT Endpoint.

Example:

sudo ls /sys/kernel/debug/qat_4xxx_0000:76:00.0/telemetry

Note

Update the Domain:BDF above as needed.
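Rather than typing the Domain:BDF by hand, the debugfs tree can be searched for matching directories. This is a minimal sketch under the path layout described above; the helper names find_intree_telemetry_dirs and bdf_of are hypothetical, and reading debugfs normally requires root privileges.

```python
from pathlib import Path


def find_intree_telemetry_dirs(debugfs_root: str = "/sys/kernel/debug") -> list[Path]:
    """Return the telemetry directory of every qat_4xxx_* device found."""
    root = Path(debugfs_root)
    # Each match corresponds to one endpoint; the suffix after "qat_4xxx_"
    # is that endpoint's Domain:BDF.
    return sorted(p for p in root.glob("qat_4xxx_*/telemetry") if p.is_dir())


def bdf_of(telemetry_dir: Path) -> str:
    """Extract the Domain:BDF from .../qat_4xxx_<Domain:BDF>/telemetry."""
    return telemetry_dir.parent.name.removeprefix("qat_4xxx_")
```

The debugfs_root parameter exists only so the sketch can be pointed at a different mount point or a test directory.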

The telemetry feature is controlled by using standard Linux file commands on the control file, as outlined below. The telemetry data is read from the device_data file or an rp_<X>_data file, depending on which data is required.

The telemetry data for device level and ring pair level is updated each second.

Telemetry Control

Device level telemetry is enabled by echoing 1 into the control file and disabled by echoing 0. Reading the control file shows whether the feature is currently enabled or disabled.

Ring pair level telemetry is enabled whenever device level telemetry is enabled. However, the ring pairs to monitor must be selected first, and only 4 ring pairs can be shown at any given time. A ring pair is selected by echoing its number (0-63) into an rp_<X>_data file, where X is A, B, C or D.
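The control operations described above amount to small file writes, so they can be wrapped in a few functions. The following is a minimal sketch; the function names are hypothetical, and telemetry_dir is assumed to be the telemetry folder of one endpoint.

```python
from pathlib import Path

RING_PAIR_SLOTS = ("A", "B", "C", "D")  # only 4 ring pairs can be shown at once


def set_telemetry(telemetry_dir: Path, enabled: bool) -> None:
    """Equivalent of `echo 1 > control` or `echo 0 > control`."""
    (telemetry_dir / "control").write_text("1\n" if enabled else "0\n")


def read_control(telemetry_dir: Path) -> str:
    """Return the raw contents of the control file without surrounding whitespace."""
    return (telemetry_dir / "control").read_text().strip()


def select_ring_pair(telemetry_dir: Path, slot: str, ring_pair: int) -> None:
    """Equivalent of `echo <ring_pair> > rp_<slot>_data`."""
    if slot not in RING_PAIR_SLOTS:
        raise ValueError(f"slot must be one of {RING_PAIR_SLOTS}, got {slot!r}")
    if not 0 <= ring_pair <= 63:
        raise ValueError(f"ring pair must be in 0-63, got {ring_pair}")
    (telemetry_dir / f"rp_{slot}_data").write_text(f"{ring_pair}\n")
```

Validating the slot and ring pair number before writing keeps an out-of-range value from reaching the driver; how the driver itself reports such a value is not covered here.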

Telemetry Commands

Operation               Command
Enable Telemetry        echo 1 > control
Disable Telemetry       echo 0 > control
Query Telemetry data    cat device_data
Select Ring Pairs       echo Num > rp_<X>_data (Num is the ring pair to be selected)
Query Ring Pair data    cat rp_<X>_data
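Because the data is refreshed every second, a query is typically repeated at that interval. The following is a minimal sketch of such a loop; the function name poll_device_data and its parameters are hypothetical, and the sleep function is injectable only to make the sketch easy to test.

```python
import time
from pathlib import Path
from typing import Callable, Iterator


def poll_device_data(
    telemetry_dir: Path,
    samples: int,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield the raw contents of device_data `samples` times (like `cat device_data`)."""
    for i in range(samples):
        if i:
            # Wait between reads; the data is updated once per second.
            sleep(interval_s)
        yield (telemetry_dir / "device_data").read_text()
```

Polling faster than the update interval would only return the same values repeatedly, which is why the default matches the one-second refresh.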

Selecting Ring Pairs

Out-Of-Tree

This section provides guidance on the mapping of ring pairs to the VFs for the PF when using the Out-Of-Tree acceleration driver.

There are 4 Ring Pairs per VF. The Ring Pairs for a PF look like the following:

VF    Ring Pairs
1      0    1    2    3
2      4    5    6    7
3      8    9   10   11
4     12   13   14   15
5     16   17   18   19
6     20   21   22   23
7     24   25   26   27
8     28   29   30   31
9     32   33   34   35
10    36   37   38   39
11    40   41   42   43
12    44   45   46   47
13    48   49   50   51
14    52   53   54   55
15    56   57   58   59
16    60   61   62   63

The ServicesEnabled value defined for the PF controls the mapping of the Ring Pairs:

  • If only one workload is enabled (dc/sym/asym), the first two columns are used for this service.

  • If dc and either sym or asym are enabled, the first two columns are for sym or asym and the second two columns are for dc.

  • If sym and asym are enabled, the first and third columns are for asym and the second and fourth columns are for sym.
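The table and the column rules above can be expressed as a small lookup. The following is a minimal sketch: the function names are hypothetical, services are passed as plain strings ("dc", "sym", "asym") rather than parsed from a configuration file, and None is returned for columns that the rules above do not describe.

```python
from typing import Iterable, Optional

RING_PAIRS_PER_VF = 4
KNOWN_SERVICES = {"dc", "sym", "asym"}


def ring_pairs_for_vf(vf: int) -> list[int]:
    """Ring pairs of a VF, using the 1-based VF numbering of the table."""
    if not 1 <= vf <= 16:
        raise ValueError(f"VF must be in 1-16, got {vf}")
    first = (vf - 1) * RING_PAIRS_PER_VF
    return list(range(first, first + RING_PAIRS_PER_VF))


def service_for_ring_pair(ring_pair: int, services: Iterable[str]) -> Optional[str]:
    """Service handled by a ring pair according to the column rules."""
    if not 0 <= ring_pair <= 63:
        raise ValueError(f"ring pair must be in 0-63, got {ring_pair}")
    enabled = set(services)
    if not enabled or not enabled <= KNOWN_SERVICES:
        raise ValueError(f"unsupported services: {sorted(enabled)}")
    column = ring_pair % RING_PAIRS_PER_VF  # 0-based column in the table
    if len(enabled) == 1:
        # Only the first two columns are described for a single service.
        return next(iter(enabled)) if column < 2 else None
    if enabled == {"sym", "asym"}:
        # First and third columns: asym; second and fourth columns: sym.
        return "asym" if column % 2 == 0 else "sym"
    if enabled in ({"dc", "sym"}, {"dc", "asym"}):
        # First two columns: sym or asym; second two columns: dc.
        return next(iter(enabled - {"dc"})) if column < 2 else "dc"
    raise ValueError(f"combination not covered by the rules: {sorted(enabled)}")
```

For example, with sym and asym enabled, ring pair 5 falls in the second column of VF 2 and is therefore reported as sym.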

Device Level Telemetry Values

Value                      Meaning
sample_cnt                 Message count, counter.
pci_trans_cnt              PCIe Partial Transactions, counter.
max_rd_lat                 Max Read Latency, nanoseconds.
rd_lat_acc_avg             Average Read Latency, nanoseconds.
max_lat                    Max Get To Put latency, nanoseconds.
lat_acc_avg                Average Get To Put latency, nanoseconds.
bw_in                      PCIe write bandwidth, Mbps.
bw_out                     PCIe read bandwidth, Mbps.
at_page_req_lat_acc_avg    Average Page Request Latency, nanoseconds.
at_trans_lat_acc_avg       Average Translation Latency, nanoseconds.
at_max_tlb_used            Maximum uTLB Consumed, counter.
util_cpr<x>                Compression Slice Utilization On Slice X, percentage execution cycles.
util_dcpr<x>               Decompression Slice Utilization On Slice X, percentage execution cycles.
util_xlt<x>                Translator Slice Utilization On Slice X, percentage execution cycles.
util_cph<x>                Cipher Slice Utilization On Slice X, percentage execution cycles.
util_ath<x>                Authentication Slice Utilization On Slice X, percentage execution cycles.
util_ucs<x>                UCS Slice Utilization On Slice X, percentage execution cycles.
util_pke<x>                PKE Slice Utilization On Slice X, percentage execution cycles.
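Once read, the values can be turned into a dictionary for further processing. The following is a minimal sketch that assumes each line of device_data holds a value name followed by an integer, separated by whitespace; that layout is an assumption and should be checked against the actual file contents, since the format may differ between driver versions. The function names are hypothetical.

```python
from typing import Optional


def parse_telemetry(text: str) -> dict[str, int]:
    """Parse `name value` lines into a dict; lines of any other shape are skipped."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # assumed layout is exactly two fields per line
        name, raw = parts
        try:
            values[name] = int(raw)
        except ValueError:
            continue  # non-numeric value; ignored in this sketch
    return values


def busiest_slice(values: dict[str, int]) -> Optional[tuple[str, int]]:
    """Return the util_* entry with the highest utilization, if any exist."""
    utilization = {k: v for k, v in values.items() if k.startswith("util_")}
    if not utilization:
        return None
    name = max(utilization, key=utilization.get)
    return name, utilization[name]
```

Picking the busiest slice is only one possible use of the parsed values; the same dictionary can feed any of the latency or bandwidth fields into a display.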

Ring Pair Level Telemetry Values

Value                      Meaning
sample_cnt                 Message count, counter.
rp_num                     Number of the ring pair returning data.
pci_trans_cnt              PCIe Partial Transactions, counter.
lat_acc_avg                Average Get To Put latency, nanoseconds.
bw_in                      PCIe write bandwidth, Mbps.
bw_out                     PCIe read bandwidth, Mbps.
at_glob_devtlb_hit         Descriptor DevTLB hit rate per ring, counter.
at_glob_devtlb_miss        Descriptor DevTLB miss rate per ring, counter.
tl_at_payld_devtlb_hit     Payload DevTLB hit rate per ring, counter.
tl_at_payld_devtlb_miss    Payload DevTLB miss rate per ring, counter.
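The DevTLB hit and miss values come in pairs, so a combined ratio can be derived from them. The following is a minimal sketch that assumes both values are counts over the same sampling period, which the table above does not state explicitly; the function name devtlb_hit_ratio is hypothetical, and values is assumed to be a mapping of value names to integers.

```python
from typing import Mapping, Optional


def devtlb_hit_ratio(values: Mapping[str, int], prefix: str) -> Optional[float]:
    """Return hits / (hits + misses) for one DevTLB counter pair.

    `prefix` is "at_glob_devtlb" (descriptor) or "tl_at_payld_devtlb" (payload).
    Returns None when no lookups were recorded.
    """
    hits = values[f"{prefix}_hit"]
    misses = values[f"{prefix}_miss"]
    total = hits + misses
    return hits / total if total else None
```

A ratio is returned instead of a percentage so that the caller decides how to format it.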

Monitoring Telemetry - Text Based

The following example Python scripts show how telemetry data can be monitored at the command line. Each script first enables the telemetry service for every QAT endpoint that supports telemetry and is in the up state. It then queries the telemetry data periodically, collecting the data and formatting it for display.

[Figure: device_utilization.png]

Out-Of-Tree

The device utilization script for the Out-Of-Tree driver is included here.

Important

When running the script as a non-root user, ensure adf_ctl is added to the qat group.

sudo chgrp qat /usr/local/bin/adf_ctl

The script can be downloaded from here.

Running the script looks like:

sudo monitor-qat-oot-utilization

In-Tree

The device utilization script for the In-Tree driver can be downloaded from here.

Running the script looks like:

sudo monitor-qat-utilization

Monitoring Telemetry Demo

Here is a demonstration of how to monitor telemetry while running the Intel® QAT sample code.