Parser

The parser converts a series of bytes into a series of unsigned integers. It is selected by a value of the qpl_parser enumeration, set in the field qpl_job.parser.

The default value qpl_parser.qpl_p_le_packed_array views the input buffer as a little-endian packed array of N-bit integers, where N is given by src1_bit_width. For example, if N=3, then the first element will be bits 2:0 in the first byte, the second element will be bits 5:3, etc.
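
To make the little-endian layout concrete, here is a minimal Python sketch of the unpacking rule. It is an illustration only, not Intel QPL code: the function name unpack_le is hypothetical, and the sketch assumes that an element which does not fit in the current byte continues into the low bits of the next byte.

```python
def unpack_le(data: bytes, bit_width: int, count: int) -> list[int]:
    """Unpack `count` values of `bit_width` bits from a little-endian packed array.

    Hypothetical helper for illustration; assumes elements may span byte boundaries.
    """
    if count * bit_width > len(data) * 8:
        raise ValueError("buffer too short for the requested number of elements")
    # Viewing the buffer as one little-endian integer places bit 0 of byte 0
    # at position 0, so element i occupies bits [i*N + N-1 : i*N].
    stream = int.from_bytes(data, "little")
    mask = (1 << bit_width) - 1
    return [(stream >> (i * bit_width)) & mask for i in range(count)]
```

For example, with N=3 the byte 0xD1 (binary 11010001) holds 1 in bits 2:0 and 2 in bits 5:3, so unpack_le(bytes([0xD1]), 3, 2) returns [1, 2].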

If the parser is qpl_parser.qpl_p_be_packed_array, the buffer is viewed as a big-endian packed array. For example, with N=3, the first element will be bits 7:5 of the first byte, the second element will be bits 4:2, etc.
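
The big-endian layout can be sketched the same way. Again, this is an illustration rather than Intel QPL code: unpack_be is a hypothetical name, and the sketch assumes that an element which does not fit in the current byte continues into the high bits of the next byte.

```python
def unpack_be(data: bytes, bit_width: int, count: int) -> list[int]:
    """Unpack `count` values of `bit_width` bits from a big-endian packed array.

    Hypothetical helper for illustration; assumes elements may span byte boundaries.
    """
    total_bits = len(data) * 8
    if count * bit_width > total_bits:
        raise ValueError("buffer too short for the requested number of elements")
    # Viewing the buffer as one big-endian integer places bit 7 of byte 0
    # at the top, so element i ends (i + 1) * N bits below the top.
    stream = int.from_bytes(data, "big")
    mask = (1 << bit_width) - 1
    return [
        (stream >> (total_bits - (i + 1) * bit_width)) & mask
        for i in range(count)
    ]
```

For example, with N=3 the byte 0x2B (binary 00101011) holds 1 in bits 7:5 and 2 in bits 4:2, so unpack_be(bytes([0x2B]), 3, 2) returns [1, 2].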

If the parser is specified as qpl_parser.qpl_p_parquet_rle, the buffer is viewed as being in Parquet RLE format. In this case, the bit width is given in the data stream, so qpl_job.src1_bit_width must be set to 0.

Source-2 can only be parsed as a packed array. By default, the source-2 data is viewed as a little-endian packed array. If the QPL_FLAG_SRC2_BE flag is specified, it is viewed as a big-endian packed array.

Parquet RLE Format

The input is in the Parquet RLE format. The first byte of the data stream gives the bit width and is followed by the encoded data. The bit width cannot exceed 32 bits.

The format is:

parquet-rle: <bit-width> <encoded-data>
bit-width := bit-width of data stored as one byte
encoded-data := <run>*
run := <bit-packed-run> | <rle-run>
bit-packed-run := <bit-packed-header> <bit-packed-values>
bit-packed-header := varint-encode(<bit-pack-count> << 1 | 1)
   // we always bit-pack a multiple of 8 values at a time, so we only store the number of values / 8
bit-pack-count := (number of values in this run) / 8
bit-packed-values := data stored as a packed array of bit-width values
rle-run := <rle-header> <repeated-value>
rle-header := varint-encode( (number of times repeated) << 1)
repeated-value := value that is repeated, using a fixed-width of round-up-to-next-byte(bit-width)

varint := each byte contributes its low 7 bits (byte & 0x7F). If
(byte & 0x80) != 0, read the next byte; stop when 4 bytes have been read or
(byte & 0x80) == 0. The 7-bit groups are then concatenated in order, lowest
group first, so bit 0 of the second byte becomes bit 7 of the resulting
unsigned integer.
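
The varint rule above can be sketched in Python as follows. This is a minimal illustration, not Intel QPL code: decode_varint and its parameters are hypothetical names. The description says reading stops after 4 bytes even if the continuation bit is still set; how the library handles that case is not stated, so the sketch simply returns the bits gathered so far.

```python
def decode_varint(data: bytes, pos: int = 0, max_bytes: int = 4) -> tuple[int, int]:
    """Decode one varint starting at `pos`; return (value, next_position).

    Hypothetical helper: reads at most `max_bytes` bytes, as in the rule above.
    """
    value = 0
    for i in range(max_bytes):
        if pos + i >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos + i]
        # Low 7 bits are payload; group i lands at bit position 7 * i.
        value |= (byte & 0x7F) << (7 * i)
        if (byte & 0x80) == 0:
            return value, pos + i + 1
    # Byte limit reached with the continuation bit still set (assumed behavior).
    return value, pos + max_bytes
```

For example, decode_varint(bytes([0x80, 0x01])) returns (128, 2): the first byte contributes seven zero bits, and bit 0 of the second byte lands at bit 7 of the result.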

Attention

A standard varint encoding of a 32-bit value can consist of up to 5 bytes. In the Intel® Query Processing Library (Intel® QPL), a varint is limited to 4 bytes, so a run header can carry at most 28 bits (4 × 7).
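
Putting the grammar together, the following Python sketch decodes a complete stream in this format. It is an illustration under stated assumptions, not the library's implementation: all function names are hypothetical, and the sketch assumes the usual Parquet conventions that bit-packed values are packed little-endian (least significant bit first) and that the repeated value of an RLE run is stored little-endian.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a varint of at most 4 bytes; return (value, next_position)."""
    value = 0
    for i in range(4):
        if pos + i >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if (byte & 0x80) == 0:
            return value, pos + i + 1
    return value, pos + 4  # 4-byte limit reached (assumed behavior)


def decode_parquet_rle(stream: bytes) -> list[int]:
    """Decode <bit-width> <run>* into a list of unsigned integers."""
    if not stream:
        raise ValueError("empty stream")
    bit_width = stream[0]  # the first byte gives the bit width
    if bit_width > 32:
        raise ValueError("bit width cannot exceed 32")
    value_bytes = (bit_width + 7) // 8  # round-up-to-next-byte(bit-width)
    mask = (1 << bit_width) - 1
    out: list[int] = []
    pos = 1
    while pos < len(stream):
        header, pos = read_varint(stream, pos)
        if header & 1:
            # Bit-packed run: the header stores (number of values) / 8.
            groups = header >> 1
            n_bytes = groups * bit_width  # 8 values * bit_width bits / 8
            chunk = stream[pos:pos + n_bytes]
            if len(chunk) < n_bytes:
                raise ValueError("truncated bit-packed run")
            packed = int.from_bytes(chunk, "little")  # assumption: LSB-first
            out.extend((packed >> (i * bit_width)) & mask
                       for i in range(groups * 8))
            pos += n_bytes
        else:
            # RLE run: the header stores the repeat count.
            repeat = header >> 1
            raw = stream[pos:pos + value_bytes]
            if len(raw) < value_bytes:
                raise ValueError("truncated RLE run")
            out.extend([int.from_bytes(raw, "little")] * repeat)
            pos += value_bytes
    return out
```

For example, the stream 03 08 05 03 88 C6 FA has bit width 3, an RLE run (header 0x08, repeat count 4, value 5), and a bit-packed run (header 0x03, one group of 8 values), so decode_parquet_rle(bytes([3, 0x08, 0x05, 0x03, 0x88, 0xC6, 0xFA])) returns [5, 5, 5, 5, 0, 1, 2, 3, 4, 5, 6, 7].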