neural_compressor.experimental.export.torch2onnx

Helper functions to export models from PyTorch/TensorFlow to ONNX.

Module Contents

Functions

update_weight_bias(int8_model, fp32_onnx_path)

Update weight and bias of the FP32 ONNX model with the QAT INT8 PyTorch model.

set_data_type(dtype)

Set data type of activation and weight with string dtype.

get_node_mapping(fp32_model, fp32_onnx_path)

Get PyTorch module and ONNX node mapping.

get_quantizable_onnx_ops(int8_model, module_node_mapping)

Get quantizable ONNX ops.

build_scale_mapping(fp32_onnx_path, ...)

Build scale mapping.

set_scale_info(int8_onnx_model, scale_zp_dict, ...)

Set scale to ONNX model.

recalculate_bias(int8_onnx_model, scale_zp_dict, ...)

Recalculate bias.

remove_nodes_by_name(int8_onnx_model, node_names)

Remove nodes from model by names.

sub_graph_with_int32_bias(int8_onnx_model, node, ...)

Generate a sub graph with int32 bias.

qdq_fp32_bias(int8_onnx_model, quant_format)

Execute post-processing on the INT8 ONNX model with recipe 'QDQ_OP_FP32_BIAS'.

qdq_int32_bias(int8_onnx_model, quantize_nodes, ...)

Execute post-processing on the INT8 ONNX model with recipe 'QDQ_OP_INT32_BIAS'.

qdq_fp32_bias_qdq(int8_onnx_model, quantize_nodes, ...)

Execute post-processing on the INT8 ONNX model with recipe 'QDQ_OP_FP32_BIAS_QDQ'.

torch_to_fp32_onnx(fp32_model, save_path, example_inputs)

Export FP32 PyTorch model into FP32 ONNX model.

torch_to_int8_onnx(fp32_model, int8_model, q_config, ...)

Export INT8 PyTorch model into INT8 ONNX model.

neural_compressor.experimental.export.torch2onnx.update_weight_bias(int8_model, fp32_onnx_path)

Update weight and bias of the FP32 ONNX model with the QAT INT8 PyTorch model.

Parameters:
  • int8_model (torch.nn.Module) – int8 model.

  • fp32_onnx_path (str) – path to fp32 onnx model.
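
A minimal usage sketch, assuming int8_model is a QAT INT8 torch.nn.Module produced by an earlier Neural Compressor run and that "fp32_model.onnx" is an illustrative path to a previously exported FP32 ONNX copy of the same architecture:

    from neural_compressor.experimental.export.torch2onnx import update_weight_bias

    # Update the weights and biases stored in the FP32 ONNX file from the QAT
    # INT8 PyTorch model; int8_model and the path are assumptions.
    update_weight_bias(int8_model, "fp32_model.onnx")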

neural_compressor.experimental.export.torch2onnx.set_data_type(dtype)

Set data type of activation and weight with string dtype.

Parameters:

dtype (str) – data type description.

Returns:

activation type and weight type.

Return type:

(activation_type, weight_type)
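
A quick usage sketch; "U8S8" follows the default dtype string of torch_to_int8_onnx and is assumed to select unsigned 8-bit activations and signed 8-bit weights:

    from neural_compressor.experimental.export.torch2onnx import set_data_type

    # Resolve the dtype string into the activation and weight quantization
    # types used by the rest of the export flow.
    activation_type, weight_type = set_data_type("U8S8")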

neural_compressor.experimental.export.torch2onnx.get_node_mapping(fp32_model, fp32_onnx_path)

Get PyTorch module and ONNX node mapping.

Parameters:
  • fp32_model (torch.nn.Module) – fp32 PyTorch model.

  • fp32_onnx_path (str) – path to fp32 onnx model.

Returns:

op mapping from PyTorch to ONNX.

Return type:

module_node_mapping
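
A hedged sketch of obtaining the mapping right after an FP32 export; fp32_model, example_inputs, and the file name are assumptions standing in for objects created earlier:

    from neural_compressor.experimental.export.torch2onnx import (
        get_node_mapping,
        torch_to_fp32_onnx,
    )

    # Export the FP32 model first, then map each PyTorch module to the ONNX
    # node it produced; the model object and the path are assumptions.
    torch_to_fp32_onnx(fp32_model, "fp32_model.onnx", example_inputs)
    module_node_mapping = get_node_mapping(fp32_model, "fp32_model.onnx")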

neural_compressor.experimental.export.torch2onnx.get_quantizable_onnx_ops(int8_model, module_node_mapping)

Get quantizable ONNX ops.

Parameters:
  • int8_model (torch.nn.Module) – PyTorch int8 model.

  • module_node_mapping (dict) – op mapping from PyTorch to ONNX.

Returns:

all ONNX nodes that should be quantized.

Return type:

quantize_nodes
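
A short usage sketch; int8_model and module_node_mapping are assumed to come from the quantization step and get_node_mapping respectively:

    from neural_compressor.experimental.export.torch2onnx import get_quantizable_onnx_ops

    # Collect the ONNX nodes that correspond to quantized PyTorch modules.
    quantize_nodes = get_quantizable_onnx_ops(int8_model, module_node_mapping)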

neural_compressor.experimental.export.torch2onnx.build_scale_mapping(fp32_onnx_path, module_node_mapping, int8_scale_info)

Build scale mapping.

Parameters:
  • fp32_onnx_path (str) – path to fp32 onnx model.

  • module_node_mapping (dict) – op mapping from PyTorch to ONNX.

  • int8_scale_info (dict) – int8 scale information.

Returns:

scale and zero_point dict.

Return type:

scale_zp_dict
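
A hedged sketch of building the mapping; module_node_mapping comes from get_node_mapping, and int8_scale_info is assumed to hold the per-op scale/zero_point values collected from the INT8 PyTorch model:

    from neural_compressor.experimental.export.torch2onnx import build_scale_mapping

    # Map each quantized ONNX tensor to the scale/zero_point recorded for the
    # corresponding PyTorch module; the ONNX path is illustrative.
    scale_zp_dict = build_scale_mapping("fp32_model.onnx", module_node_mapping, int8_scale_info)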

neural_compressor.experimental.export.torch2onnx.set_scale_info(int8_onnx_model, scale_zp_dict, activation_type)

Set scale to ONNX model.

Parameters:
  • int8_onnx_model (ModelProto) – onnx int8 model to process.

  • scale_zp_dict (dict) – scale zero_point dict.

  • activation_type – activation type.

Returns:

int8 onnx model object.

Return type:

int8_onnx_model
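
A brief usage sketch; int8_onnx_model, scale_zp_dict, and activation_type are assumed outputs of the earlier export, build_scale_mapping, and set_data_type steps:

    from neural_compressor.experimental.export.torch2onnx import set_scale_info

    # Write the collected scale/zero_point values into the INT8 ONNX model.
    int8_onnx_model = set_scale_info(int8_onnx_model, scale_zp_dict, activation_type)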

neural_compressor.experimental.export.torch2onnx.recalculate_bias(int8_onnx_model, scale_zp_dict, quantize_nodes, quant_format)

Recalculate bias.

Parameters:
  • int8_onnx_model (ModelProto) – onnx int8 model to process.

  • scale_zp_dict (dict) – scale zero_point dict.

  • quantize_nodes (list) – quantize nodes list.

  • quant_format (QuantFormat) – quantization format.

Returns:

processed onnx int8 model.

Return type:

int8_onnx_model

neural_compressor.experimental.export.torch2onnx.remove_nodes_by_name(int8_onnx_model, node_names)

Remove nodes from model by names.

Parameters:
  • int8_onnx_model (ModelProto) – onnx int8 model to process.

  • node_names (list) – names of nodes to remove.

Returns:

processed onnx int8 model.

Return type:

int8_onnx_model
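
A minimal sketch of what removal by name amounts to on an ONNX ModelProto, assuming nodes are matched purely by their name attribute; the library helper may differ in details:

    import onnx

    def remove_nodes_by_name_sketch(int8_onnx_model: onnx.ModelProto, node_names) -> onnx.ModelProto:
        """Drop every graph node whose name appears in node_names."""
        names = set(node_names)
        for node in [n for n in int8_onnx_model.graph.node if n.name in names]:
            int8_onnx_model.graph.node.remove(node)
        return int8_onnx_model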

neural_compressor.experimental.export.torch2onnx.sub_graph_with_int32_bias(int8_onnx_model, node, a_info, b_info, bias_name, output_name)

Generate a sub graph with int32 bias.

Parameters:
  • int8_onnx_model (ModelProto) – onnx int8 model to process.

  • node (NodeProto) – MatMul node belonging to nn.quantized.Linear module.

  • a_info (list) – info of input a for nn.quantized.Linear module.

  • b_info (list) – info of input b for nn.quantized.Linear module.

  • bias_name (str) – name of bias.

  • output_name (str) – output name of the sub graph.

Returns:

processed onnx int8 model.

Return type:

int8_onnx_model

neural_compressor.experimental.export.torch2onnx.qdq_fp32_bias(int8_onnx_model, quant_format)

Execute post-processing on the INT8 ONNX model with recipe ‘QDQ_OP_FP32_BIAS’.

Insert QDQ before quantizable ops and use fp32 bias.

Parameters:
  • int8_onnx_model (ModelProto) – onnx int8 model to process.

  • quant_format (QuantFormat) – quantization format.

Returns:

processed onnx int8 model.

Return type:

int8_onnx_model

neural_compressor.experimental.export.torch2onnx.qdq_int32_bias(int8_onnx_model, quantize_nodes, quant_format)

Execute post-processing on the INT8 ONNX model with recipe ‘QDQ_OP_INT32_BIAS’.

Insert QDQ before quantizable ops and use int32 bias.

Parameters:
  • int8_onnx_model (ModelProto) – onnx int8 model to process.

  • quantize_nodes (list) – quantize nodes list.

  • quant_format (QuantFormat) – quantization format.

Returns:

processed onnx int8 model.

Return type:

int8_onnx_model

neural_compressor.experimental.export.torch2onnx.qdq_fp32_bias_qdq(int8_onnx_model, quantize_nodes, quant_format)

Execute post-processing on the INT8 ONNX model with recipe ‘QDQ_OP_FP32_BIAS_QDQ’.

Insert QDQ before and after quantizable ops and use fp32 bias.

Parameters:
  • int8_onnx_model (ModelProto) – onnx int8 model to process.

  • quantize_nodes (list) – quantize nodes list.

  • quant_format (QuantFormat) – quantization format.

Returns:

processed onnx int8 model.

Return type:

int8_onnx_model

neural_compressor.experimental.export.torch2onnx.torch_to_fp32_onnx(fp32_model, save_path, example_inputs, opset_version=14, dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}, input_names=None, output_names=None, do_constant_folding=True, verbose=True)

Export FP32 PyTorch model into FP32 ONNX model.

Parameters:
  • fp32_model (torch.nn.Module) – fp32 model.

  • save_path (str) – save path of ONNX model.

  • example_inputs (dict|list|tuple|torch.Tensor) – used to trace torch model.

  • opset_version (int, optional) – opset version. Defaults to 14.

  • dynamic_axes (dict, optional) – dynamic axes. Defaults to {"input": {0: "batch_size"}, "output": {0: "batch_size"}}.

  • input_names (list, optional) – input names. Defaults to None.

  • output_names (list, optional) – output names. Defaults to None.

  • do_constant_folding (bool, optional) – do constant folding or not. Defaults to True.

  • verbose (bool, optional) – whether to dump verbose information during export. Defaults to True.
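
A minimal end-to-end sketch, assuming a toy model; the tensor shapes and save path are illustrative:

    import torch

    from neural_compressor.experimental.export.torch2onnx import torch_to_fp32_onnx

    # A toy FP32 model and an example input used only to trace the graph.
    fp32_model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU())
    example_inputs = torch.randn(1, 8)

    torch_to_fp32_onnx(
        fp32_model,
        "fp32_model.onnx",      # illustrative save path
        example_inputs,
        opset_version=14,
        input_names=["input"],
        output_names=["output"],
    )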

neural_compressor.experimental.export.torch2onnx.torch_to_int8_onnx(fp32_model, int8_model, q_config, save_path, example_inputs, opset_version: int = 14, dynamic_axes: dict = {'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}, input_names=None, output_names=None, quant_format: str = 'QDQ', dtype: str = 'U8S8', recipe: str = 'QDQ_OP_FP32_BIAS')

Export INT8 PyTorch model into INT8 ONNX model.

Parameters:
  • fp32_model (torch.nn.Module) – fp32 model.

  • int8_model (torch.nn.Module) – int8 model.

  • q_config (dict) – quantization configuration.

  • save_path (str) – save path of ONNX model.

  • example_inputs (dict|list|tuple|torch.Tensor) – used to trace torch model.

  • opset_version (int, optional) – opset version. Defaults to 14.

  • dynamic_axes (dict, optional) – dynamic axes. Defaults to {"input": {0: "batch_size"}, "output": {0: "batch_size"}}.

  • input_names (list, optional) – input names. Defaults to None.

  • output_names (list, optional) – output names. Defaults to None.

  • quant_format (str, optional) – quantization format of ONNX model. Defaults to ‘QDQ’.

  • dtype (str, optional) – data types of activation and weight of ONNX model. Defaults to ‘U8S8’.

  • recipe (str, optional) – recipe for processing nn.quantized.Linear modules. ‘QDQ_OP_FP32_BIAS’: insert QDQ before quantizable ops and use fp32 bias. ‘QDQ_OP_INT32_BIAS’: insert QDQ before quantizable ops and use int32 bias. ‘QDQ_OP_FP32_BIAS_QDQ’: insert QDQ before and after quantizable ops and use fp32 bias. Defaults to ‘QDQ_OP_FP32_BIAS’.
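
A hedged sketch of the INT8 export call; fp32_model, int8_model, q_config, and example_inputs are assumed to come from an earlier Neural Compressor quantization run and are not constructed here, and the save path is illustrative:

    from neural_compressor.experimental.export.torch2onnx import torch_to_int8_onnx

    torch_to_int8_onnx(
        fp32_model,              # original FP32 torch.nn.Module (assumption)
        int8_model,              # quantized INT8 torch.nn.Module (assumption)
        q_config,                # quantization configuration dict (assumption)
        "int8_model.onnx",       # illustrative save path
        example_inputs,          # inputs used to trace the model (assumption)
        quant_format="QDQ",
        dtype="U8S8",
        recipe="QDQ_OP_FP32_BIAS",
    )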