Diagnosis
Diagnosis Introduction
The diagnosis feature provides methods to debug accuracy loss during quantization and to profile the performance gap during benchmarking. There are two ways to diagnose a model with Intel® Neural Compressor: the first is the non-GUI mode described below, and the second is the GUI mode provided by the Neural Insights component.
The workflow is as follows: first, configure the script for diagnosis, then run it and check the diagnosis information in the terminal. If the result is not satisfactory, repeat the steps as needed.
Supported Feature Matrix
Types | Diagnosis data | Framework | Backend |
---|---|---|---|
Post-Training Static Quantization (PTQ) | weights and activations | TensorFlow | TensorFlow/Intel TensorFlow |
Post-Training Static Quantization (PTQ) | weights and activations | ONNX Runtime | QLinearops/QDQ |
Benchmark Profiling | OP execute duration | TensorFlow | TensorFlow/Intel TensorFlow |
Benchmark Profiling | OP execute duration | ONNX Runtime | QLinearops/QDQ |
Get Started
Install Intel® Neural Compressor
First, install Intel® Neural Compressor from source:

```shell
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
python setup.py install
```
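If a build from source is not required, the released package can also be installed from PyPI:

```shell
pip install neural-compressor
```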
Modify script
Modify the quantization/benchmark script to run diagnosis by setting the `diagnosis` argument to `True` in `PostTrainingQuantConfig`/`BenchmarkConfig`, as shown below.
Quantization diagnosis

```python
from neural_compressor import PostTrainingQuantConfig

config = PostTrainingQuantConfig(diagnosis=True, ...)
```
Benchmark diagnosis

```python
from neural_compressor.config import BenchmarkConfig

config = BenchmarkConfig(diagnosis=True, ...)
```
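For context, a complete benchmark invocation with diagnosis enabled might look like the sketch below. This assumes the 2.x `neural_compressor.benchmark.fit` API; `model_path` and `dataloader` are placeholders you would supply yourself.

```python
from neural_compressor.benchmark import fit
from neural_compressor.config import BenchmarkConfig

# Enable diagnosis alongside the usual benchmark settings.
config = BenchmarkConfig(diagnosis=True, warmup=10, iteration=100)

# model_path and dataloader are assumed to be defined elsewhere.
fit(model=model_path, conf=config, b_dataloader=dataloader)
```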
Example
Below is an explanation of how to run diagnosis for the ONNX ResNet50 model.
Prepare dataset
Download the ILSVRC2012 validation ImageNet dataset.

Download the label file:

```shell
wget http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz
tar -xvzf caffe_ilsvrc12.tar.gz val.txt
```
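The extracted val.txt maps each validation image to an integer class label, one pair per line. If you want to inspect it, a minimal sketch such as the following (the relative file path is an assumption) parses it into a dictionary:

```python
# Parse val.txt lines of the form "ILSVRC2012_val_00000001.JPEG 65"
# into a {filename: label} mapping.
labels = {}
with open("val.txt") as f:
    for line in f:
        filename, label = line.split()
        labels[filename] = int(label)
print(len(labels))  # the ILSVRC2012 validation set has 50000 images
```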
Run quantization script
Then execute the script with the quantization API, passing the `--diagnose` flag.

```shell
python examples/onnxrt/image_recognition/resnet50_torchvision/quantization/ptq_static/main.py \
  --model_path=/path/to/resnet50_v1.onnx \
  --dataset_location=/path/to/ImageNet/ \
  --label_path=/path/to/val.txt \
  --tune \
  --diagnose
```
Run benchmark script
To run profiling, execute the script with the parameters shown in the command below.

```shell
python examples/onnxrt/image_recognition/resnet50_torchvision/quantization/ptq_static/main.py \
  --model_path=/path/to/resnet50_v1.onnx \
  --dataset_location=/path/to/ImageNet/ \
  --label_path=/path/to/val.txt \
  --mode=performance \
  --benchmark \
  --diagnose
```
See quantization data
After the script finishes, you will see the results in your terminal. The activations summary is a table with the OP name, MSE (mean squared error), and activation minimum and maximum, sorted by MSE.

The weights summary table contains statistics such as minimum, maximum, mean, standard deviation, and variance for the input model. This table is also sorted by MSE.
How to do diagnosis
Neural Compressor diagnosis mode provides weights and activations data that includes several useful metrics for diagnosing potential accuracy loss.
Parameter description
Data is presented in the terminal in the form of a table, where each row describes a single OP in the model. The following measures are reported:
MSE - Mean Squared Error - a metric that measures the difference between the input and the optimized model's weights for a specific OP.

$$ MSE = \frac{1}{n}\sum_{i=1}^{n}(x_i-y_i)^2 $$
Input model min - minimum value of the input OP tensor data
$$ \min{\vec{x}} $$
Input model max - maximum value of the input OP tensor data
$$ \max{\vec{x}} $$
Input model mean - mean value of the input OP tensor data
$$ \mu =\frac{1}{n} \sum_{i=1}^{n} x_{i} $$
Input model standard deviation - standard deviation of the input OP tensor data

$$ \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2} $$
Input model variance - variance of the input OP tensor data
$$ Var = \sigma^2 $$
where $x_i$ is the input OP tensor data, $y_i$ is the optimized OP tensor data, $\mu$ is the input model mean, and $\sigma$ is the input model standard deviation.
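As a concrete illustration, these metrics can be reproduced with NumPy. The following is a minimal sketch over two example tensors, not the tool's internal implementation:

```python
import numpy as np

x = np.random.randn(1000)             # stand-in for the input OP tensor data
y = x + 0.01 * np.random.randn(1000)  # stand-in for the optimized OP tensor data

mse = np.mean((x - y) ** 2)  # MSE between input and optimized tensors
stats = {
    "min": x.min(),
    "max": x.max(),
    "mean": x.mean(),
    "std": x.std(),  # square root of the mean squared deviation from the mean
    "var": x.var(),  # standard deviation squared
}
print(mse, stats)
```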
Diagnosis suggestions
Check the nodes in MSE order. A high MSE usually indicates a higher possibility that accuracy loss happened during quantization, so falling back those OPs may recover some accuracy.

Check the min-max data range. A dispersed data range usually means higher accuracy loss, so we can also try to fall back those OPs.

Check the other data for outliers, fall back the corresponding OPs, and test the quantization accuracy.

Note: these debug rules cannot always be trusted; they are only a reference, and sometimes an accuracy regression is hard to explain.
Fallback setting example
```python
from neural_compressor import quantization, PostTrainingQuantConfig

# Fall back the activation of the problematic OP to fp32.
op_name_dict = {"v0/cg/conv0/conv2d/Conv2D": {"activation": {"dtype": ["fp32"]}}}

config = PostTrainingQuantConfig(
    diagnosis=True,
    op_name_dict=op_name_dict,
)
# model, dataloader, and eval are assumed to be defined earlier in the script.
q_model = quantization.fit(
    model,
    config,
    calib_dataloader=dataloader,
    eval_func=eval,
)
```
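In this example, setting the activation dtype of that convolution to `fp32` keeps it unquantized during tuning; this is how an OP flagged by a high MSE in the diagnosis tables can be fallen back to recover accuracy.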
See profiling data
In the profiling section there is a table with nodes sorted by total execution time, which makes it possible to check which operations take the most time.