neural_compressor.utils.export.torch2onnx

Helper functions to export models from PyTorch to ONNX.

Functions

get_node_mapping(fp32_model, fp32_onnx_path)

Get PyTorch module and ONNX node mapping.

get_quantizable_onnx_ops(int8_model, module_node_mapping)

Get quantizable ONNX ops.

dynamic_quant_export(pt_fp32_model, pt_int8_model, ...)

Export dynamic quantized model.

static_quant_export(pt_int8_model, save_path, ...)

Export static quantized model.

torch_to_fp32_onnx(pt_fp32_model, save_path, ...[, ...])

Export FP32 PyTorch model into FP32 ONNX model.

torch_to_int8_onnx(pt_fp32_model, pt_int8_model, ...)

Export INT8 PyTorch model into INT8 ONNX model.

Module Contents

neural_compressor.utils.export.torch2onnx.get_node_mapping(fp32_model, fp32_onnx_path)

Get PyTorch module and ONNX node mapping.

Parameters:
  • fp32_model (torch.nn.Module) – PyTorch FP32 model.

  • fp32_onnx_path (str) – path to the FP32 ONNX model.

Returns:

op mapping from PyTorch to ONNX.

Return type:

module_node_mapping (dict)
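
A minimal usage sketch (the toy model and file name are illustrative, not part of the API): export the FP32 model to ONNX first so that the ONNX node names exist on disk, then build the mapping.

    import torch
    from neural_compressor.utils.export.torch2onnx import get_node_mapping

    # Toy FP32 model; any traceable torch.nn.Module works the same way.
    fp32_model = torch.nn.Sequential(torch.nn.Linear(8, 4))
    torch.onnx.export(fp32_model, torch.randn(1, 8), "fp32_model.onnx")

    # Maps PyTorch module names to their corresponding ONNX nodes.
    module_node_mapping = get_node_mapping(fp32_model, "fp32_model.onnx")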

neural_compressor.utils.export.torch2onnx.get_quantizable_onnx_ops(int8_model, module_node_mapping)

Get quantizable ONNX ops.

Parameters:
  • int8_model (torch.nn.Module) – PyTorch INT8 model.

  • module_node_mapping (dict) – op mapping from PyTorch to ONNX.

Returns:

All ONNX nodes that should be quantized.

Return type:

quantize_nodes (list)
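
Continuing the sketch above, an INT8 model plus the mapping yields the node list. Stock torch.quantization.quantize_dynamic stands in for a Neural Compressor INT8 model purely for illustration; in practice the INT8 model would come from a Neural Compressor quantization run.

    import torch
    from neural_compressor.utils.export.torch2onnx import (
        get_node_mapping,
        get_quantizable_onnx_ops,
    )

    fp32_model = torch.nn.Sequential(torch.nn.Linear(8, 4))
    torch.onnx.export(fp32_model, torch.randn(1, 8), "fp32_model.onnx")
    module_node_mapping = get_node_mapping(fp32_model, "fp32_model.onnx")

    # Illustrative INT8 stand-in; normally produced by Neural Compressor.
    int8_model = torch.quantization.quantize_dynamic(
        fp32_model, {torch.nn.Linear}, dtype=torch.qint8
    )

    # ONNX nodes whose PyTorch counterparts were quantized.
    quantize_nodes = get_quantizable_onnx_ops(int8_model, module_node_mapping)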

neural_compressor.utils.export.torch2onnx.dynamic_quant_export(pt_fp32_model, pt_int8_model, save_path, example_inputs, q_config, opset_version, dynamic_axes, input_names, output_names, weight_type)

Export dynamic quantized model.

Parameters:
  • pt_fp32_model (torch.nn.Module) – PyTorch FP32 model.

  • pt_int8_model (torch.nn.Module) – PyTorch INT8 model.

  • save_path (str) – save path of the ONNX model.

  • example_inputs (dict|list|tuple|torch.Tensor) – example inputs used to trace the model.

  • q_config (dict) – quantization configuration.

  • opset_version (int, optional) – ONNX opset version. Defaults to 14.

  • dynamic_axes (dict, optional) – dynamic axes. Defaults to {"input": {0: "batch_size"}, "output": {0: "batch_size"}}.

  • input_names (list, optional) – input names. Defaults to None.

  • output_names (list, optional) – output names. Defaults to None.

  • weight_type (str, optional) – data type of the ONNX model's weights (only needed when exporting a dynamically quantized model). Defaults to 'S8'.
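
A call sketch using the documented signature. fp32_model, int8_model, and q_config are assumed to come from a prior Neural Compressor dynamic-quantization run (q_config's layout is not reproduced here), and the file name is illustrative.

    import torch
    from neural_compressor.utils.export.torch2onnx import dynamic_quant_export

    # fp32_model, int8_model, and q_config are assumed outputs of a prior
    # Neural Compressor dynamic quantization run.
    dynamic_quant_export(
        pt_fp32_model=fp32_model,
        pt_int8_model=int8_model,
        save_path="model_int8_dynamic.onnx",
        example_inputs=torch.randn(1, 8),
        q_config=q_config,
        opset_version=14,
        dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
        input_names=["input"],
        output_names=["output"],
        weight_type="S8",
    )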

neural_compressor.utils.export.torch2onnx.static_quant_export(pt_int8_model, save_path, example_inputs, q_config, opset_version, dynamic_axes, input_names, output_names, quant_format)

Export static quantized model.

Parameters:
  • pt_int8_model (torch.nn.Module) – PyTorch INT8 model.

  • save_path (str) – save path of the ONNX model.

  • example_inputs (dict|list|tuple|torch.Tensor) – example inputs used to trace the model.

  • q_config (dict) – quantization configuration.

  • opset_version (int, optional) – ONNX opset version. Defaults to 14.

  • dynamic_axes (dict, optional) – dynamic axes. Defaults to {"input": {0: "batch_size"}, "output": {0: "batch_size"}}.

  • input_names (list, optional) – input names. Defaults to None.

  • output_names (list, optional) – output names. Defaults to None.

  • quant_format (str, optional) – quantization format of the ONNX model. Defaults to 'QDQ'.
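
A matching sketch for the static path, under the same assumptions: int8_model and q_config are taken from a prior Neural Compressor static (post-training) quantization run; only the export call is shown.

    import torch
    from neural_compressor.utils.export.torch2onnx import static_quant_export

    # int8_model and q_config are assumed outputs of a prior Neural
    # Compressor static quantization run.
    static_quant_export(
        pt_int8_model=int8_model,
        save_path="model_int8_static.onnx",
        example_inputs=torch.randn(1, 8),
        q_config=q_config,
        opset_version=14,
        dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
        input_names=["input"],
        output_names=["output"],
        quant_format="QDQ",
    )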

neural_compressor.utils.export.torch2onnx.torch_to_fp32_onnx(pt_fp32_model, save_path, example_inputs, opset_version=14, dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}, input_names=None, output_names=None, do_constant_folding=True, verbose=True)

Export FP32 PyTorch model into FP32 ONNX model.

Parameters:
  • pt_fp32_model (torch.nn.Module) – PyTorch FP32 model.

  • save_path (str) – save path of the ONNX model.

  • example_inputs (dict|list|tuple|torch.Tensor) – example inputs used to trace the model.

  • opset_version (int, optional) – ONNX opset version. Defaults to 14.

  • dynamic_axes (dict, optional) – dynamic axes. Defaults to {"input": {0: "batch_size"}, "output": {0: "batch_size"}}.

  • input_names (list, optional) – input names. Defaults to None.

  • output_names (list, optional) – output names. Defaults to None.

  • do_constant_folding (bool, optional) – whether to apply constant folding during export. Defaults to True.

  • verbose (bool, optional) – whether to print verbose export information. Defaults to True.
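
A self-contained sketch of the FP32 export; the keyword arguments mirror the defaults shown in the signature above, and the toy model and file name are illustrative.

    import torch
    from neural_compressor.utils.export.torch2onnx import torch_to_fp32_onnx

    model = torch.nn.Sequential(torch.nn.Linear(8, 4))

    torch_to_fp32_onnx(
        pt_fp32_model=model,
        save_path="model_fp32.onnx",
        example_inputs=torch.randn(1, 8),
        opset_version=14,
        input_names=["input"],
        output_names=["output"],
        do_constant_folding=True,
    )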

neural_compressor.utils.export.torch2onnx.torch_to_int8_onnx(pt_fp32_model, pt_int8_model, save_path, example_inputs, q_config, opset_version=14, dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}, input_names=None, output_names=None, quant_format: str = 'QDQ', weight_type: str = 'S8', verbose=True)

Export INT8 PyTorch model into INT8 ONNX model.

Parameters:
  • pt_fp32_model (torch.nn.Module) – PyTorch FP32 model.

  • pt_int8_model (torch.nn.Module) – PyTorch INT8 model.

  • save_path (str) – save path of the ONNX model.

  • example_inputs (dict|list|tuple|torch.Tensor) – example inputs used to trace the model.

  • q_config (dict) – quantization configuration.

  • opset_version (int, optional) – ONNX opset version. Defaults to 14.

  • dynamic_axes (dict, optional) – dynamic axes. Defaults to {"input": {0: "batch_size"}, "output": {0: "batch_size"}}.

  • input_names (list, optional) – input names. Defaults to None.

  • output_names (list, optional) – output names. Defaults to None.

  • quant_format (str, optional) – quantization format of the ONNX model. Defaults to 'QDQ'.

  • weight_type (str, optional) – data type of the ONNX model's weights (only needed when exporting a dynamically quantized model). Defaults to 'S8'.

  • verbose (bool, optional) – whether to print verbose export information. Defaults to True.
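
A call sketch for the INT8 export; fp32_model, int8_model, and q_config are assumed outputs of a prior Neural Compressor quantization run, and only the export call is shown. Unlisted keyword arguments keep the defaults from the signature above.

    import torch
    from neural_compressor.utils.export.torch2onnx import torch_to_int8_onnx

    # fp32_model, int8_model, and q_config are assumed outputs of a prior
    # Neural Compressor quantization run.
    torch_to_int8_onnx(
        pt_fp32_model=fp32_model,
        pt_int8_model=int8_model,
        save_path="model_int8.onnx",
        example_inputs=torch.randn(1, 8),
        q_config=q_config,
        opset_version=14,
        quant_format="QDQ",
        weight_type="S8",
    )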