Checksum Discussion

Intel® QAT computes Cyclic Redundancy Checks (CRCs), checksums, and hash values during compression and decompression operations. For brevity, these algorithms will collectively be referred to as checksums in this paper.

This section introduces the checksums used in CnV, explains the concept of checksum collisions, outlines how each checksum is calculated, and highlights their differences.

Table of Checksums

The following table identifies the checksums used in CnV, based on the compression algorithm.

Compression Algorithm

CnV Checksums

Notes

Reference

(as of 9/19/2024)

Deflate

Adler32 and CRC32

Default polynomial for CRC32

Deflate Specification

https://www.rfc-editor.org/info/rfc1951

LZ4

XXHash32 and CRC32

Default polynomial for CRC32

lz4_Block_format

https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md

LZ4s

XXHash32 and CRC32

Default polynomial for CRC32

LZ4s Compress Block Format

https://intel.github.io/quickassist/PG/services_compression_api.html#lz4s-compressed-data-block-format

Note

When compression or decompression requests are submitted to an Intel® QAT device with the End-to-End CRC feature enabled, input and output programmable CRC64 values are included in the response descriptor and may be verified by the application. These CRC64 values are not included in the CnV calculation.

See section The CRC64 Values Used in End-to-End Mode for details.

Collisions CRC32, Adler32 and XXHash32

Any programmatic operation that reduces a dataset to a smaller value has the potential to result in a collision.

CRC32 Collision Examples

This table shows CRC32 collisions for four different input datasets, each with two different lengths.

Input	Dataset Length	CRC32
0x61, 0x62, 0x63, 0x64, 0x65 (‘a’,’b’,’c’,’d’,’e’)	5 octets	0 x43a52f78
0x00, 0x00, 0x00, 0x00, 0x75, 0x97, 0x13, 0xa3	8 octets	0 x43a52f78
0x00, 0x00, 0x00, 0x01, 0xae, 0xe6, 0x15, 0xe2	8 octets	0 x43a52f78
0x00, 0x00, 0x00, 0x02, 0x18, 0x04, 0x19, 0x60	8 octets	0 x43a52f78

This is referred to as a collision. Empirical evidence shows there is a small non-zero likelihood of checksum collisions when minor changes are made in the dataset. See section The Empirical Study for details.

Anatomy of CRC32 checksum

CRC32 (Cyclic Redundancy Check) is a popular checksum algorithm used to detect errors in data.

CRC32 checksums are calculated using polynomial division. This is implemented by use of the XOR operation and shift values. See Footnote 1 for details.

It is composed by three steps:

Division Operation

The CRC calculation involves binary division, where the data ( a sequence of bits) is divided by the polynomial using XOR operations instead of traditional subtraction.

Here’s a high-level view of how it works:
- Initial Value: Start with an initial CRC value, usually set to 0xFFFFFFFF (all bits set to 1).
- Data Preparation: Append 32 zero bits to the end of the data to accommodate the CRC bits.
- Division Process: For each bit of data, if the leftmost bit of the data (when XORed with the polynomial) is 1, the polynomial is XORed with the data. This step mirrors polynomial long division but is implemented using binary operations.
XOR Operations

XOR is the primary operation in CRC calculations. When you encounter a non-zero bit in the polynomial, you perform an XOR operation between the data and the polynomial at the relevant bit positions. XOR is used because it has properties that align with the algebraic structure required for CRC calculations.
Final XOR

After processing all the bits, the resulting CRC value is XORed with 0xFFFFFFFF (again, all bits set). This step provides the final CRC32 checksum.

Example Calculation

Here’s a simplified example to illustrate:

Data: Let’s say our data to be checked is 110101.
Polynomial: The CRC polynomial for CRC32 is represented as 10000010011000001000100000000001 in binary.
Initial CRC Value: 0xFFFFFFFF.
Division: Append 32 zeros to the end of the data. Perform the XOR operations as you shift the data to the left, aligning it with the polynomial, and reduce the polynomial if necessary.
Final XOR: XOR the final result with 0xFFFFFFFF.

The result from this final XOR operation is the CRC32 checksum, which is a 32-bit value.

Summary

The CRC32 algorithm combines binary polynomial division and XOR operations to compute a 32-bit checksum for a given data block. This checksum helps detect errors by ensuring that data has not been altered.

Collision Rate CRC32

Based on the dataset used in an empirical study of collisions (See The Empirical Study section for details.), CRC32 collisions can be expected to occur at a rate of approximately 0.2 per billion tests. This indicates that, relative to a completely random dataset, a modest reduction in CRC32 collisions is observed due to the nature of the dataset.

Anatomy of Adler32 checksum

The Adler32 algorithm uses modulo addition of individual octets of the datasets as described below:

For a given dataset DS that requires an Adler32 checksum, two sixteen-bit values A and B are created. Concatenated together, these two values comprise the Adler32 checksum.

At the start of the calculation phase, A is initialized to 1 and B is initialized to 0. The sums calculated modulo 65521, which is the largest prime number smaller than 2E16 (65,536).

The A value is the sum of all the values in the dataset + 1. The B value is the sum of all of the A values for each step.

This can be represented as:
A = (1 + DS1 + DS2 + DS3 + … + DSn )(mod 65521)
B = (1 + DS1) + ( 1+ DS1 + DS2 ) + …+(1+DS1 + DS2 + … + DSn)(mod 65521)
Adler32 of DS = B || A

Collision Rate Adler32

Based on the dataset used in an empirical study of collisions (See section The Empirical Study for details.), Adler32 collisions can be expected to occur at a rate of approximately 175 collisions per billion tests. It is concluded the dataset chosen has significantly higher levels of checksum collision for Alder32 checksums compared to random data, especially in small datasets

Anatomy of XXHash32

XXHash32 creates the hash with a series of rotates, addition, multiplication and XOR operations. See Footnote 2 for details.

XXHash32 sets up four accumulators, which are based on an optional seed and two prime numbers.

u32 acc1 = seed + PRIME32_1 + PRIME32_2;
u32 acc2 = seed + PRIME32_2;
u32 acc3 = seed + 0;
u32 acc4 = seed - PRIME32_1;

The dataset is processed sixteen bytes at a time—four bytes are mixed into each of the four accumulators, as shown below. Note that the “<<<” symbol below represents a left rotate operation.

accN = accN + (laneN * PRIME32_2);
accN = accN <<< 13;
accN = accN * PRIME32_1;

After the completion of the sixteen byte sections, the accumulators are converged:

acc = (acc1 <<< 1) + (acc2 <<< 7) + (acc3 <<< 12) + (acc4 <<< 18);
acc = acc + (u32)inputLength;

Any remaining bytes (up to 15) are mixed is shown :

while (remainingLength >= 4) {
   lane = read_32bit_little_endian(input_ptr);
   acc = acc + lane * PRIME32_3;
   acc = (acc <<< 17) * PRIME32_4;
   input_ptr += 4; remainingLength -= 4;
}

while (remainingLength >= 1) {
   lane = read_byte(input_ptr);
   acc = acc + lane * PRIME32_5;
   acc = (acc <<< 11) * PRIME32_1;
   input_ptr += 1; remainingLength -= 1;
}

And a final mix is performed:

acc = acc xor (acc >> 15);
acc = acc * PRIME32_2;
acc = acc xor (acc >> 13);
acc = acc * PRIME32_3;
acc = acc xor (acc >> 16);

Collision Rate XXHash32

Based on the dataset used in an empirical study of collisions (See section The Empirical Study for details.), XXHash32 collisions are expected to occur at a rate of approximately 3.4 errors per billion tests. This suggests a modest reduction when compared to CRC32 collisions, which can be attributed to the nature of the dataset.