TensorFlow Quantization
===============

1. [Introduction](#introduction)
2. [Get Started](#get-started)  
   2.1 [Without Accuracy Aware Tuning](#without-accuracy-aware-tuning)   
   2.2 [With Accuracy Aware Tuning](#with-accuracy-aware-tuning)   
   2.3 [Specify Quantization Rules](#specify-quantization-rules) 
3. [Examples](#examples) 

## Introduction

`neural_compressor.tensorflow` supports quantizing both TensorFlow and Keras models, with or without accuracy-aware tuning.

For the detailed quantization fundamentals, please refer to the document for [Quantization](quantization.html).


## Get Started


### Without Accuracy Aware Tuning


In this mode, users leverage Intel(R) Neural Compressor to directly generate a fully quantized model without accuracy-aware tuning. It is the user's responsibility to ensure that the accuracy of the quantized model meets expectations.

``` python
# main.py

import tensorflow as tf

# Original code
model = tf.keras.applications.resnet50.ResNet50(weights="imagenet")
val_dataset = ...
val_dataloader = MyDataloader(dataset=val_dataset)

# Quantization code
from neural_compressor.tensorflow import quantize_model, StaticQuantConfig

quant_config = StaticQuantConfig()
qmodel = quantize_model(
    model=model,
    quant_config=quant_config,
    calib_dataloader=val_dataloader,
)
qmodel.save("./output")
```
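`MyDataloader` in the snippet above is user-defined. Below is a minimal sketch, assuming the calibration dataloader only needs to be an iterable that yields batched `(inputs, labels)` pairs and exposes a `batch_size` attribute, and that `val_dataset` is a `tf.data.Dataset`; adapt it to your own dataset and preprocessing.

```python
# Minimal sketch of a user-defined calibration dataloader. Assumption: an
# iterable of (inputs, labels) batches with a batch_size attribute is enough.
class MyDataloader:
    def __init__(self, dataset, batch_size=32):
        self.dataset = dataset  # assumed to be a tf.data.Dataset of (inputs, labels)
        self.batch_size = batch_size

    def __iter__(self):
        # Yield batched (inputs, labels) pairs used for calibration.
        for inputs, labels in self.dataset.batch(self.batch_size):
            yield inputs, labels
```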

### With Accuracy Aware Tuning

In this mode, users leverage the advanced tuning feature of Intel(R) Neural Compressor to find the quantized model with the best trade-off between accuracy and performance. The user should provide an `eval_fn` and, if it takes extra arguments, `eval_args`.

``` python
# main.py

import tensorflow as tf

# Original code
model = tf.keras.applications.resnet50.ResNet50(weights="imagenet")
val_dataset = ...
val_dataloader = MyDataloader(dataset=val_dataset)


def eval_acc_fn(model) -> float:
    ...
    return acc


# Quantization code
from neural_compressor.common.base_tuning import TuningConfig
from neural_compressor.tensorflow import StaticQuantConfig, autotune

# custom_tune_config can also be defined compactly as:
# TuningConfig(StaticQuantConfig(weight_sym=[True, False], act_sym=[True, False]))
custom_tune_config = TuningConfig(
    config_set=[
        StaticQuantConfig(weight_sym=True, act_sym=True),
        StaticQuantConfig(weight_sym=False, act_sym=False),
    ]
)
best_model = autotune(
    model=model,
    tune_config=custom_tune_config,
    eval_fn=eval_acc_fn,
    calib_dataloader=val_dataloader,
)
best_model.save("./output")
```
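`eval_acc_fn` is also user-defined. The sketch below is one possible implementation, assuming the candidate model is callable on a batch of images and returns class logits, and that `val_dataloader` yields `(images, labels)` pairs; replace it with your own metric.

```python
import numpy as np


# Sketch of an accuracy evaluation function. Assumption: the candidate model is
# callable on batched images and returns per-class logits or probabilities.
def eval_acc_fn(model) -> float:
    correct, total = 0, 0
    for images, labels in val_dataloader:
        preds = np.argmax(model(images), axis=-1)
        correct += int(np.sum(preds == np.asarray(labels).reshape(-1)))
        total += len(labels)
    return correct / total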

### Specify Quantization Rules
Intel(R) Neural Compressor supports specifying quantization rules by operator name or operator type. Users can set `local` in a dict or call the `set_local` method of a config class to achieve this.

1. Example of setting `local` from a dict
```python
from neural_compressor.tensorflow import StaticQuantConfig

quant_config = {
    "static_quant": {
        "global": {
            "weight_dtype": "int8",
            "weight_sym": True,
            "weight_granularity": "per_tensor",
            "act_dtype": "int8",
            "act_sym": True,
            "act_granularity": "per_tensor",
        },
        "local": {
            "conv1": {
                "weight_dtype": "fp32",
                "act_dtype": "fp32",
            }
        },
    }
}
config = StaticQuantConfig.from_dict(quant_config)
```
2. Example of using `set_local`
```python
quant_config = StaticQuantConfig()
conv2d_config = StaticQuantConfig(
    weight_dtype="fp32",
    act_dtype="fp32",
)
quant_config.set_local("conv1", conv2d_config)
```
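
Rules can also target an operator type rather than a single operator name. The snippet below is a hedged example that assumes the TensorFlow operator type string `"Conv2D"` is accepted by `set_local`; under that assumption it would keep every Conv2D operator in fp32.

```python
# Hedged example: assuming the backend recognizes the operator type "Conv2D",
# this rule keeps all Conv2D operators in fp32 instead of quantizing them.
conv_type_config = StaticQuantConfig(
    weight_dtype="fp32",
    act_dtype="fp32",
)
quant_config.set_local("Conv2D", conv_type_config)
```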

## Examples

Users can also refer to [examples](https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/tensorflow) on how to quantize a TensorFlow model with `neural_compressor.tensorflow`.