Export to ONNX
Introduction
We support exporting PyTorch models into ONNX models with the trainer.export_to_onnx API. Users can get FP32 (32-bit floating point), BF16 (bfloat16), and INT8 (8-bit integer) ONNX models through the same interface.
Supported Model Export Matrix
Input Model | Export FP32 | Export BF16 | Export INT8 |
---|---|---|---|
FP32 PyTorch Model | ✔ | ✔ | / |
INT8 PyTorch Model (dynamic) | / | / | ✔ |
INT8 PyTorch Model (static) | / | / | ✔ |
INT8 PyTorch Model (QAT) | / | / | ✔ |
Examples
Export to FP32 ONNX Model
If export_to_onnx is called before quantization, we will fetch the FP32 model and export it into an ONNX model.
trainer.export_to_onnx(
save_path=None,
[opset_version=14,]
[do_constant_folding=True,]
[verbose=True,]
)
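For example, a minimal invocation after training and before any quantization could look like the sketch below; the file name fp32_model.onnx is an illustrative assumption, not a required value.
# Minimal sketch: export the trained FP32 model to ONNX.
# "fp32_model.onnx" is an assumed file name used for illustration only.
trainer.export_to_onnx(
    save_path="fp32_model.onnx",
    opset_version=14,
    do_constant_folding=True,
    verbose=True,
)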
Export to BF16 ONNX Model
If the flag enable_bf16 is True, you will get an ONNX model with BFloat16 weights for the ['MatMul', 'Gemm'] node types. This FP32 + BF16 ONNX model can be accelerated by our executor backend.
API usage
trainer.enable_bf16 = True
trainer.export_to_onnx(
save_path=None,
[opset_version=14,]
[do_constant_folding=True,]
[verbose=True,]
)
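To confirm which weights were actually stored as BFloat16, you can inspect the exported graph with the onnx Python package; this is an optional check, and the file path below is an illustrative assumption.
import onnx

# Load the exported model (the path is an assumed example).
model = onnx.load("bf16_model.onnx")

# List initializers stored with the BFLOAT16 element type.
bf16_inits = [init.name for init in model.graph.initializer
              if init.data_type == onnx.TensorProto.BFLOAT16]
print(f"Found {len(bf16_inits)} BFloat16 initializers")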
Export to INT8 ONNX Model
If export_to_onnx is called after quantization, we will fetch the FP32 PyTorch model, convert it into an ONNX model, and perform ONNX Runtime quantization based on the PyTorch quantization configuration.
trainer.export_to_onnx(
save_path=None,
[quant_format='QDQ'/'Qlinear',]
[dtype='S8S8'/'U8S8'/'U8U8',]
[opset_version=14,]
)
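As a concrete sketch, exporting a QDQ-format model with the U8S8 data type and loading it back with ONNX Runtime could look like the following; the file name and the ONNX Runtime sanity check are illustrative assumptions rather than required steps.
# Sketch: export the quantized model in QDQ format with U8S8 data types.
# "int8_model.onnx" is an assumed file name used for illustration only.
trainer.export_to_onnx(
    save_path="int8_model.onnx",
    quant_format='QDQ',
    dtype='U8S8',
    opset_version=14,
)

# Optional sanity check: confirm the exported model loads in ONNX Runtime.
import onnxruntime as ort
session = ort.InferenceSession("int8_model.onnx", providers=["CPUExecutionProvider"])
print([inp.name for inp in session.get_inputs()])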
For executor backend
Our executor backend provides highly optimized performance for the INT8 MatMul node type with the U8S8 data type. Therefore, we suggest users enable the flag enable_executor before exporting the INT8 ONNX model for the executor backend.
trainer.enable_executor = True
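A sketch of the full flow for the executor backend is shown below; the file name is an illustrative assumption, and U8S8 is chosen because it is the data type the executor backend is optimized for.
# Sketch: export an INT8 ONNX model intended for the executor backend.
trainer.enable_executor = True
trainer.export_to_onnx(
    save_path="int8_executor_model.onnx",  # assumed file name for illustration
    dtype='U8S8',                          # data type the executor backend is optimized for
    opset_version=14,
)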