TensorFlow Quantization
Introduction
neural_compressor.tensorflow supports quantizing both TensorFlow and Keras models, with or without accuracy-aware tuning.
For detailed quantization fundamentals, please refer to the Quantization document.
Get Started
Without Accuracy Aware Tuning
This means users can leverage Intel(R) Neural Compressor to directly generate a fully quantized model without accuracy-aware tuning. It is the user's responsibility to ensure that the accuracy of the quantized model meets expectations.
# main.py

# Original code
import tensorflow as tf

model = tf.keras.applications.resnet50.ResNet50(weights="imagenet")
val_dataset = ...
val_dataloader = MyDataloader(dataset=val_dataset)

# Quantization code
from neural_compressor.tensorflow import quantize_model, StaticQuantConfig

quant_config = StaticQuantConfig()
qmodel = quantize_model(
    model=model,
    quant_config=quant_config,
    calib_dataloader=val_dataloader,
)
qmodel.save("./output")
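MyDataloader above is user-defined and left as a placeholder. A minimal sketch, assuming val_dataset is a tf.data.Dataset of (image, label) pairs and that the calibration dataloader only needs to be an iterable yielding (input, label) batches with a batch_size attribute:

import tensorflow as tf

class MyDataloader:
    # Illustrative sketch of a calibration dataloader, not part of the library API.
    def __init__(self, dataset, batch_size=32):
        self.dataset = dataset        # assumed: tf.data.Dataset of (image, label) pairs
        self.batch_size = batch_size  # assumed: read by the calibration loop

    def __iter__(self):
        # Yield (input, label) batches; calibration only consumes the inputs.
        for inputs, labels in self.dataset.batch(self.batch_size):
            yield inputs, labels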
With Accuracy Aware Tuning
This means users can leverage the advanced feature of Intel(R) Neural Compressor to tune out the best quantized model that combines good accuracy with good performance. Users should provide eval_fn and eval_args.
# main.py

# Original code
import tensorflow as tf

model = tf.keras.applications.resnet50.ResNet50(weights="imagenet")
val_dataset = ...
val_dataloader = MyDataloader(dataset=val_dataset)

def eval_acc_fn(model) -> float:
    ...
    return acc

# Quantization code
from neural_compressor.common.base_tuning import TuningConfig
from neural_compressor.tensorflow import autotune, StaticQuantConfig

# it's also supported to define custom_tune_config as:
# TuningConfig(StaticQuantConfig(weight_sym=[True, False], act_sym=[True, False]))
custom_tune_config = TuningConfig(
    config_set=[
        StaticQuantConfig(weight_sym=True, act_sym=True),
        StaticQuantConfig(weight_sym=False, act_sym=False),
    ]
)
best_model = autotune(
    model=model,
    tune_config=custom_tune_config,
    eval_fn=eval_acc_fn,
    calib_dataloader=val_dataloader,
)
best_model.save("./output")
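The body of eval_acc_fn is elided above. A minimal top-1 accuracy sketch, assuming the candidate model is callable like a Keras model and val_dataloader yields (input, label) batches as in the earlier snippet:

def eval_acc_fn(model) -> float:
    # Top-1 accuracy over the validation set (illustrative sketch).
    correct, total = 0, 0
    for inputs, labels in val_dataloader:
        logits = model(inputs, training=False)
        preds = tf.argmax(logits, axis=-1)
        correct += int(tf.reduce_sum(tf.cast(preds == tf.cast(labels, preds.dtype), tf.int32)))
        total += int(tf.shape(labels)[0])
    return correct / total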
Specify Quantization Rules
Intel(R) Neural Compressor supports specifying quantization rules by operator name or operator type. Users can set the local key in a config dict or use the set_local method of the config class to achieve this.
Example of setting local from a dict:
from neural_compressor.tensorflow import StaticQuantConfig

quant_config = {
    "static_quant": {
        "global": {
            "weight_dtype": "int8",
            "weight_sym": True,
            "weight_granularity": "per_tensor",
            "act_dtype": "int8",
            "act_sym": True,
            "act_granularity": "per_tensor",
        },
        "local": {
            "conv1": {
                "weight_dtype": "fp32",
                "act_dtype": "fp32",
            }
        },
    }
}
config = StaticQuantConfig.from_dict(quant_config)
Example of using set_local:
quant_config = StaticQuantConfig()
conv2d_config = StaticQuantConfig(
    weight_dtype="fp32",
    act_dtype="fp32",
)
quant_config.set_local("conv1", conv2d_config)
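Either form of the config can then be passed to quantize_model, reusing model and val_dataloader from the earlier snippets, so that conv1 stays in fp32 while all other operators follow the global int8 settings:

from neural_compressor.tensorflow import quantize_model

# "conv1" falls back to fp32; every other operator uses the global settings.
qmodel = quantize_model(
    model=model,
    quant_config=quant_config,
    calib_dataloader=val_dataloader,
)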
Examples
Users can also refer to the examples for how to quantize a TensorFlow model with neural_compressor.tensorflow.