Deflate with Indexing

Deflate with Indexing#

The key to this feature is the concept of “mini-blocks”. These are fixed-size regions of the uncompressed data that can be decompressed independently. The size of the mini-blocks can be set at powers of 2 from 512 through 32768. This is also called the “index-size”.

A compressed Deflate stream is composed of a series of Deflate blocks. The indexing feature requires that all blocks except for the last one to be a multiple of the mini-block size. For example, one could have 1 KB mini-blocks and 32 KB blocks, or 2 KB mini-blocks and put the entire input into a single Deflate block.

The significance of the mini-blocks is the following:

  • no references from a given mini-block to any previous one

  • no matches that span mini-block boundaries

This means that each mini-block can be decompressed independently of any other mini-block.

The size of the mini-block becomes the granularity at which “random access” can be done. To get one particular uncompressed byte, the user needs to decompress the entire mini-block containing that byte (the user could decompress that mini-block up to and including the needed data, but then the user would lose the CRC check). So, on one hand, the smaller the mini-block, the less work is needed to access any particular byte. On the other hand, the smaller the mini-blocks become, the worse the compression ratio becomes. So an application needs to make a reasonable trade-off when choosing the size of the mini-blocks.

Warning

The implementation of indexing is in progress.