neural_compressor.algorithm.weight_correction

Build the WeightCorrection algorithm class.

Module Contents

Classes

WeightCorrection

WeightCorrection algorithm class.

class neural_compressor.algorithm.weight_correction.WeightCorrection(eps=1e-05, channel_axis=1)

WeightCorrection algorithm class.

Corrects the INT8 weight distribution so that it stays close to the FP32 weight distribution:

    r * (W_int8 + u) -> W_fp32

Here r is the variance ratio between FP32 and INT8, and u is the channel-wise difference between FP32 and INT8. This is equivalent to minimizing:

    round(scale_c * (W_fp32 + shift)) / scale - r * (round(scale * W_fp32) + scale * u) / scale

Only the first rounding term, round(scale_c * (W_fp32 + shift)), can be changed. An empirical solution is to set:

    scale_c = r * scale
    shift = u

This corrects the weight without changing the min/max value.
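The channel-wise statistics matching described above can be illustrated with a small NumPy sketch. This is not the library's implementation: `fake_quantize` and `correct_weight` are hypothetical helper names, the symmetric per-channel quantizer is an assumption, and r is taken here as the ratio of per-channel standard deviations. The sketch applies the correction directly to the dequantized weight instead of folding it into the quantization scale and shift. Only the `eps` and `channel_axis` defaults are borrowed from the class signature.

```python
import numpy as np


def fake_quantize(w, channel_axis=1, num_bits=8):
    """Symmetric per-channel quantize-dequantize (hypothetical helper)."""
    axes = tuple(i for i in range(w.ndim) if i != channel_axis)
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.max(np.abs(w), axis=axes, keepdims=True)
    scale = qmax / np.maximum(max_abs, 1e-12)  # guard against all-zero channels
    return np.round(w * scale) / scale


def correct_weight(w_fp32, w_q, eps=1e-5, channel_axis=1):
    """Match per-channel mean and spread of w_q to w_fp32 (illustrative only)."""
    axes = tuple(i for i in range(w_fp32.ndim) if i != channel_axis)
    mean_fp32 = w_fp32.mean(axis=axes, keepdims=True)
    mean_q = w_q.mean(axis=axes, keepdims=True)
    std_fp32 = w_fp32.std(axis=axes, keepdims=True)
    std_q = w_q.std(axis=axes, keepdims=True)
    # Assumption: r is the per-channel std ratio; eps avoids division by zero.
    r = std_fp32 / (std_q + eps)
    # Same form as r * (W_q + u) with u chosen so the channel means agree.
    return r * (w_q - mean_q) + mean_fp32


rng = np.random.default_rng(0)
w = rng.normal(size=(16, 4))           # 4 channels along axis 1
w_q = fake_quantize(w, num_bits=4)     # coarse grid so the drift is visible
corrected = correct_weight(w, w_q)
print("mean gap before:", np.abs(w_q.mean(axis=0) - w.mean(axis=0)).max())
print("mean gap after: ", np.abs(corrected.mean(axis=0) - w.mean(axis=0)).max())
```

After the correction, each channel of `corrected` has the same mean as the FP32 weight and a standard deviation that differs only by the small `eps` term.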