# Calibration Algorithms in Quantization

1. [Introduction](#introduction)
2. [Calibration Algorithms](#calibration-algorithms)
3. [Support Matrix](#support-matrix)

## Introduction

Quantization reduces the memory and computational requirements of a model. Uniform quantization maps an input value $x ∈ [β, α]$ into the integer range $[−2^{b−1}, 2^{b−1} − 1]$, where $[β, α]$ is the range of real values chosen for quantization and $b$ is the bit-width of the signed integer representation. Calibration is the process of determining $α$ and $β$ for model weights and activations. Refer to this [link](https://github.com/intel/neural-compressor/blob/master/docs/source/quantization.html#quantization-fundamentals) for more quantization fundamentals.

## Calibration Algorithms

Currently, Intel® Neural Compressor supports three popular calibration algorithms:

- MinMax: This method takes the maximum and minimum of the input values as $α$ and $β$ [^1]. It preserves the entire range and is the simplest approach.
- Entropy: This method minimizes the KL divergence between the full-precision and quantized data to reduce information loss [^2]. Its primary focus is on preserving essential information.
- Percentile: This method considers only a specific percentage of values when calculating the range, ignoring the remainder, which may contain outliers [^3]. It improves resolution by excluding extreme values while still retaining noteworthy data.

## Support Matrix

| Framework | Supported calibration algorithm (weight) | Supported calibration algorithm (activation) |
|---|---|---|
| PyTorch | minmax | minmax, kl |
| TensorFlow | minmax | minmax, kl |
| MXNet | minmax | minmax, kl |
| ONNX Runtime | minmax | minmax, kl, percentile |
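
To make the difference between the range-based algorithms concrete, the following is a minimal pure-Python sketch of MinMax and Percentile calibration, followed by a uniform mapping of $[β, α]$ onto signed integers. It is not Intel® Neural Compressor's implementation: the function names `minmax_range`, `percentile_range`, and `quantize` are hypothetical, the affine scale and zero-point scheme is one common convention assumed here, and the even split of discarded values between both tails is also an assumption. Entropy (KL) calibration is left out because a faithful threshold search is considerably longer.

```python
import math


def minmax_range(values):
    """MinMax: the observed minimum and maximum become (beta, alpha)."""
    return min(values), max(values)


def percentile_range(values, percentile):
    """Percentile: keep the central `percentile` percent of the sorted values.

    Assumption: the discarded fraction is split evenly between both tails.
    """
    if not 0.0 < percentile <= 100.0:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(values)
    tail = (100.0 - percentile) / 200.0  # fraction dropped on each side
    lo = int(math.floor(tail * (len(ordered) - 1)))
    hi = len(ordered) - 1 - lo
    return ordered[lo], ordered[hi]


def quantize(x, beta, alpha, bits=8):
    """Map a real x to a signed `bits`-bit integer, clipping outside [beta, alpha].

    Assumption: affine quantization with a real scale and an integer zero point.
    """
    if alpha <= beta:
        raise ValueError("alpha must be greater than beta")
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = (alpha - beta) / (qmax - qmin)   # real width of one integer step
    zero_point = round(qmin - beta / scale)  # offset so that beta lands on qmin
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))           # clip into the integer range


# Illustrative data: 201 evenly spaced values plus one large outlier.
samples = [float(v) for v in range(-100, 101)] + [10000.0]
for name, (beta, alpha) in [
    ("minmax", minmax_range(samples)),
    ("percentile", percentile_range(samples, 99.0)),
]:
    codes = {quantize(v, beta, alpha) for v in samples}
    print(f"{name}: beta={beta}, alpha={alpha}, distinct codes={len(codes)}")
```

On this data the MinMax range is stretched by the single outlier, so the bulk of the values collapses onto a handful of integer codes. The Percentile range clips the outlier and spreads the remaining values over far more codes, which is the resolution gain described under Calibration Algorithms.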