Export to ONNX

  1. Introduction

  2. Supported Model Export Matrix

  3. Examples

    3.1. Export to FP32 ONNX Model

    3.2. Export to BF16 ONNX Model

    3.3. Export to INT8 ONNX Model

Introduction

We support exporting PyTorch models into ONNX models with the API trainer.export_to_onnx. Users can get an FP32 (32-bit floating point), BF16 (bfloat16), or INT8 (8-bit integer) ONNX model through the same interface.
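
Whatever precision is chosen, the exported file is a regular ONNX model, so it can be verified with the standard onnx package. A minimal sketch (model.onnx is a placeholder for the path you pass as save_path):

import onnx

# Load the exported file and run ONNX's structural checker (placeholder path).
model = onnx.load("model.onnx")
onnx.checker.check_model(model)

# Print the opset(s) the exporter targeted.
print([(opset.domain, opset.version) for opset in model.opset_import])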

Supported Model Export Matrix

Input Model                  | Export FP32 | Export BF16 | Export INT8
-----------------------------|-------------|-------------|------------
FP32 PyTorch Model           | ✓           | ✓           | /
INT8 PyTorch Model (dynamic) | /           | /           | ✓
INT8 PyTorch Model (static)  | /           | /           | ✓
INT8 PyTorch Model (qat)     | /           | /           | ✓

Examples

Export to FP32 ONNX Model

If export_to_onnx is called before quantization, we fetch the FP32 model and export it into an ONNX model.

trainer.export_to_onnx(
    save_path=None,
    opset_version=14,          # optional
    do_constant_folding=True,  # optional
    verbose=True,              # optional
)
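
As a quick smoke test, the exported FP32 model can be run with ONNX Runtime. The sketch below makes several assumptions: fp32-model.onnx is a placeholder path, dynamic dimensions are filled with 1, and the inputs are assumed to be int64 token IDs, so adjust it to your model.

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("fp32-model.onnx", providers=["CPUExecutionProvider"])

# Build dummy feeds from the model's declared input metadata.
feeds = {}
for inp in session.get_inputs():
    shape = [dim if isinstance(dim, int) else 1 for dim in inp.shape]  # replace dynamic dims with 1
    feeds[inp.name] = np.zeros(shape, dtype=np.int64)                  # assumes int64 (token-ID) inputs

outputs = session.run(None, feeds)
print([out.shape for out in outputs])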

Export to BF16 ONNX Model

If the flag enable_bf16 is True, you will get an ONNX model with BFloat16 weights for the ['MatMul', 'Gemm'] node types. This mixed FP32 + BF16 ONNX model can be accelerated by our executor backend.

API usage

trainer.enable_bf16 = True
trainer.export_to_onnx(
    save_path=None,
    opset_version=14,          # optional
    do_constant_folding=True,  # optional
    verbose=True,              # optional
)
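
To check that the MatMul/Gemm weights really were stored as BFloat16, the initializers of the exported graph can be inspected with the onnx package. A sketch, with bf16-model.onnx as a placeholder path:

import onnx
from onnx import TensorProto

model = onnx.load("bf16-model.onnx")

# Collect the tensor names that feed MatMul/Gemm nodes.
matmul_inputs = set()
for node in model.graph.node:
    if node.op_type in ("MatMul", "Gemm"):
        matmul_inputs.update(node.input)

# Report the stored data type of each initializer used by those nodes.
for init in model.graph.initializer:
    if init.name in matmul_inputs:
        print(init.name, TensorProto.DataType.Name(init.data_type))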

Export to INT8 ONNX Model

If export_to_onnx is called after quantization, we fetch the FP32 PyTorch model, convert it into an ONNX model, and then apply ONNX Runtime quantization based on the PyTorch quantization configuration.

trainer.export_to_onnx(
    save_path=None,
    quant_format='QDQ',    # optional: 'QDQ' or 'Qlinear'
    dtype='S8S8',          # optional: 'S8S8', 'U8S8' or 'U8U8'
    opset_version=14,      # optional
)
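
With quant_format='QDQ', quantization is expressed as QuantizeLinear/DequantizeLinear node pairs around the quantized operators, while 'Qlinear' uses fused QLinear* operators instead. A sketch that counts these node types in the exported file (int8-model.onnx is a placeholder path):

from collections import Counter

import onnx

model = onnx.load("int8-model.onnx")

# Count operator types: a QDQ export contains QuantizeLinear/DequantizeLinear pairs,
# while a QLinear export contains fused operators such as QLinearMatMul.
op_counts = Counter(node.op_type for node in model.graph.node)
for op in ("QuantizeLinear", "DequantizeLinear", "QLinearMatMul"):
    print(op, op_counts.get(op, 0))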

For executor backend

Our executor backend provides highly optimized performance for the INT8 MatMul node type with the U8S8 data type. Therefore, we suggest enabling the enable_executor flag before exporting an INT8 ONNX model for the executor backend.

trainer.enable_executor = True
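
Putting both suggestions together, a typical export for the executor backend sets the flag first and then requests the U8S8 data type mentioned above; the save path below is a placeholder and the remaining arguments keep their defaults.

# Illustrative flow: enable the executor flag, then export with U8S8.
trainer.enable_executor = True
trainer.export_to_onnx(
    save_path="int8-model.onnx",  # placeholder path
    quant_format="QDQ",
    dtype="U8S8",
)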