neural_compressor.experimental.export
Intel Neural Compressor Export.
Package Contents
Functions
| torch_to_fp32_onnx | Export FP32 PyTorch model into FP32 ONNX model. |
| torch_to_int8_onnx | Export INT8 PyTorch model into INT8 ONNX model. |
| onnx_qlinear_to_qdq | Export ONNX QLinearops model into QDQ model. |
- neural_compressor.experimental.export.torch_to_fp32_onnx(fp32_model, save_path, example_inputs, opset_version=14, dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}, input_names=None, output_names=None, do_constant_folding=True, verbose=True)
Export FP32 PyTorch model into FP32 ONNX model.
- Parameters:
fp32_model (torch.nn.Module) – FP32 PyTorch model to export.
save_path (str) – save path of the ONNX model.
example_inputs (dict|list|tuple|torch.Tensor) – example inputs used to trace the torch model.
opset_version (int, optional) – ONNX opset version. Defaults to 14.
dynamic_axes (dict, optional) – dynamic axes. Defaults to {'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}.
input_names (list, optional) – input names. Defaults to None.
output_names (list, optional) – output names. Defaults to None.
do_constant_folding (bool, optional) – whether to apply constant folding. Defaults to True.
verbose (bool, optional) – whether to dump verbose export information. Defaults to True.
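A minimal usage sketch follows (the torchvision model, file name, and input shape are illustrative assumptions; any traceable FP32 torch.nn.Module works):

```python
import torch
import torchvision.models as models

from neural_compressor.experimental.export import torch_to_fp32_onnx

# Any traceable FP32 torch.nn.Module works; resnet18 is just a stand-in.
model = models.resnet18().eval()

# example_inputs only drives tracing, so its shape must match the model's input.
example_inputs = torch.randn(1, 3, 224, 224)

torch_to_fp32_onnx(
    model,
    save_path="resnet18_fp32.onnx",
    example_inputs=example_inputs,
    input_names=["input"],
    output_names=["output"],
)
```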
- neural_compressor.experimental.export.torch_to_int8_onnx(fp32_model, int8_model, q_config, save_path, example_inputs, opset_version: int = 14, dynamic_axes: dict = {'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}, input_names=None, output_names=None, quant_format: str = 'QDQ', dtype: str = 'U8S8', recipe: str = 'QDQ_OP_FP32_BIAS')
Export INT8 PyTorch model into INT8 ONNX model.
- Parameters:
fp32_model (torch.nn.Module) – FP32 PyTorch model.
int8_model (torch.nn.Module) – INT8 PyTorch model to export.
q_config (dict) – quantization configuration of the INT8 model.
save_path (str) – save path of the ONNX model.
example_inputs (dict|list|tuple|torch.Tensor) – example inputs used to trace the torch model.
opset_version (int, optional) – ONNX opset version. Defaults to 14.
dynamic_axes (dict, optional) – dynamic axes. Defaults to {'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}.
input_names (list, optional) – input names. Defaults to None.
output_names (list, optional) – output names. Defaults to None.
quant_format (str, optional) – quantization format of the ONNX model. Defaults to 'QDQ'.
dtype (str, optional) – data types of the activations and weights of the ONNX model. Defaults to 'U8S8'.
recipe (str, optional) – recipe for processing nn.quantized.Linear modules: 'QDQ_OP_FP32_BIAS' inserts QDQ before the quantizable op and uses an FP32 bias; 'QDQ_OP_INT32_BIAS' inserts QDQ before the quantizable op and uses an INT32 bias; 'QDQ_OP_FP32_BIAS_QDQ' inserts QDQ before and after the quantizable op and uses an FP32 bias. Defaults to 'QDQ_OP_FP32_BIAS'. A usage sketch follows the parameter list.
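A minimal usage sketch (fp32_model, int8_model, and q_config are placeholders assumed to come from a prior Neural Compressor PyTorch quantization run; the file name and input shape are illustrative):

```python
import torch

from neural_compressor.experimental.export import torch_to_int8_onnx

# Sketch only: int8_model and q_config are assumed to come from a prior
# Neural Compressor quantization run; fp32_model is the original float
# model that quantization started from.
torch_to_int8_onnx(
    fp32_model,
    int8_model,
    q_config,
    save_path="model_int8.onnx",
    example_inputs=torch.randn(1, 3, 224, 224),
    quant_format="QDQ",         # insert QuantizeLinear/DequantizeLinear pairs
    dtype="U8S8",               # uint8 activations, int8 weights
    recipe="QDQ_OP_FP32_BIAS",  # QDQ before quantizable ops, FP32 bias
)
```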
- neural_compressor.experimental.export.onnx_qlinear_to_qdq(model, input_name_to_nodes)
Export ONNX QLinearops model into QDQ model.
- Parameters:
model (ModelProto) – INT8 ONNX model.
input_name_to_nodes (dict) – mapping from each tensor name to its destination nodes (the nodes that take it as input).
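A minimal usage sketch (the model path is hypothetical, input_name_to_nodes is built by hand from the graph, and the return value is not documented above, so it is captured generically):

```python
import onnx

from neural_compressor.experimental.export import onnx_qlinear_to_qdq

# 'model_int8.onnx' is a hypothetical path to a QLinearOps INT8 model.
model = onnx.load("model_int8.onnx")

# Build the tensor-name -> destination-nodes mapping the converter expects.
input_name_to_nodes = {}
for node in model.graph.node:
    for tensor_name in node.input:
        if tensor_name:  # skip empty names for optional inputs
            input_name_to_nodes.setdefault(tensor_name, []).append(node)

# See the library source for the structure of the returned artifacts.
result = onnx_qlinear_to_qdq(model, input_name_to_nodes)
```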