neural_compressor.experimental.quantization

Neural Compressor Quantization API.

Module Contents

Classes

Quantization

This class provides an easy-to-use API for quantization.

class neural_compressor.experimental.quantization.Quantization(conf_fname_or_obj=None)

This class provides an easy-to-use API for quantization.

It automatically searches for the optimal quantization recipe for low-precision model inference, meeting tuning objectives such as inference performance within the given accuracy-loss constraints. The tuner abstracts away the differences between the quantization APIs of various DL frameworks and offers a unified API for automatic quantization that works on frameworks including TensorFlow, PyTorch and MXNet. Because DL use cases vary in their accuracy metrics (Top-1, mAP, ROC, etc.), loss criteria (<1%, <0.1%, etc.) and tuning objectives (performance, memory footprint, etc.), the tuner provides a flexible YAML configuration interface for specifying these parameters.
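
The idea of choosing the best-performing recipe that still satisfies an accuracy-loss constraint can be illustrated with a small, library-independent sketch. This is not how Neural Compressor implements tuning; the Candidate class, the pick_recipe function, the recipe names and all numbers below are hypothetical and exist only to show the selection rule.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """A hypothetical quantization recipe with made-up measurements."""
    name: str
    accuracy: float    # e.g. Top-1 accuracy in [0, 1]
    latency_ms: float  # inference latency, lower is better


def pick_recipe(baseline_accuracy: float,
                candidates: Iterable[Candidate],
                max_relative_loss: float = 0.01) -> Optional[Candidate]:
    """Return the fastest candidate whose relative accuracy loss is allowed.

    Returns None when no candidate meets the accuracy criterion.
    """
    best = None
    for cand in candidates:
        relative_loss = (baseline_accuracy - cand.accuracy) / baseline_accuracy
        if relative_loss > max_relative_loss:
            continue  # violates the accuracy-loss constraint
        if best is None or cand.latency_ms < best.latency_ms:
            best = cand  # better tuning objective (here: latency)
    return best


# Illustrative numbers only, not benchmark results.
candidates = [
    Candidate("int8_all_ops", accuracy=0.700, latency_ms=4.0),
    Candidate("int8_skip_first_conv", accuracy=0.755, latency_ms=5.5),
    Candidate("mostly_fp32", accuracy=0.759, latency_ms=9.0),
]
chosen = pick_recipe(0.760, candidates, max_relative_loss=0.01)
print(chosen.name if chosen else "no recipe meets the criterion")
```

With a 1% relative loss criterion, the fastest recipe is rejected for losing too much accuracy, and the next-fastest one that stays within the limit is selected. Tightening the criterion far enough leaves no valid recipe, which is why the function may return None.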

Parameters:

conf_fname_or_obj (string or obj) – The path to the YAML configuration file, or a QuantConf object, containing the accuracy goal, the tuning objective, and the preferred calibration and quantization tuning space. Defaults to None.
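
A minimal sketch of passing a YAML path to the constructor follows. The YAML keys shown (model, tuning, accuracy_criterion, exit_policy) are an assumption about the configuration schema and are not stated on this page; the helper names write_conf and make_quantizer are hypothetical. Only the import path and the conf_fname_or_obj argument come from the signature above.

```python
import os
import tempfile

# Assumed configuration layout; check the schema shipped with your
# installed version before relying on these keys.
CONF_TEXT = """\
model:
  name: demo_model
  framework: tensorflow
tuning:
  accuracy_criterion:
    relative: 0.01
  exit_policy:
    timeout: 0
"""


def write_conf(directory):
    """Write the assumed YAML configuration and return its path."""
    path = os.path.join(directory, "conf.yaml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(CONF_TEXT)
    return path


def make_quantizer(conf_path):
    """Build a Quantization object from a YAML path.

    Returns None when neural_compressor (or its experimental module)
    is not installed, so the sketch stays importable everywhere.
    """
    try:
        from neural_compressor.experimental.quantization import Quantization
    except ImportError:
        return None
    return Quantization(conf_path)


conf_path = write_conf(tempfile.mkdtemp())
print("configuration written to", conf_path)
```

Calling make_quantizer(conf_path) would then construct the object. Whether this particular configuration is complete enough for a real tuning run depends on the model and framework, so treat it as a starting point rather than a working recipe.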