# Export to ONNX

1. [Introduction](#introduction)
2. [Supported Model Export Matrix](#supported-model-export-matrix)
3. [Examples](#examples)
    3.1. [Export to FP32 ONNX Model](#export-to-fp32-onnx-model)
    3.2. [Export to BF16 ONNX Model](#export-to-bf16-onnx-model)
    3.3. [Export to INT8 ONNX Model](#export-to-int8-onnx-model)

## Introduction

We support exporting PyTorch models to ONNX models with the `trainer.export_to_onnx` API. Users can obtain FP32 (32-bit floating point), BF16 (16-bit brain floating point), and INT8 (8-bit integer) ONNX models through the same interface.

## Supported Model Export Matrix

| Input Model | Export FP32 | Export BF16 | Export INT8 |
| --- | --- | --- | --- |
| FP32 PyTorch Model | ✔ | ✔ | / |
| INT8 PyTorch Model (dynamic) | / | / | ✔ |
| INT8 PyTorch Model (static) | / | / | ✔ |
| INT8 PyTorch Model (QAT) | / | / | ✔ |
## Examples

### Export to FP32 ONNX Model

If `export_to_onnx` is called before quantization, we will fetch the FP32 model and export it into an ONNX model.

```py
trainer.export_to_onnx(
    save_path=None,            # optional
    opset_version=14,          # optional
    do_constant_folding=True,  # optional
    verbose=True,              # optional
)
```

### Export to BF16 ONNX Model

If the flag `enable_bf16` is True, you will get an ONNX model whose `MatMul` and `Gemm` nodes carry BFloat16 weights. This FP32 + BF16 ONNX model can be accelerated by our [executor](../intel_extension_for_transformers/transformers/runtime/) backend.

#### API usage

```py
trainer.enable_bf16 = True
trainer.export_to_onnx(
    save_path=None,            # optional
    opset_version=14,          # optional
    do_constant_folding=True,  # optional
    verbose=True,              # optional
)
```

### Export to INT8 ONNX Model

If `export_to_onnx` is called after quantization, we will fetch the FP32 PyTorch model, convert it into an ONNX model, and then run onnxruntime quantization based on the PyTorch quantization configuration.

```py
trainer.export_to_onnx(
    save_path=None,       # optional
    quant_format='QDQ',   # or 'Qlinear'
    dtype='S8S8',         # or 'U8S8' / 'U8U8'
    opset_version=14,     # optional
)
```

#### For the executor backend

Our executor backend provides highly optimized performance for the INT8 `MatMul` node type with the `U8S8` data type. Therefore, we suggest enabling the `enable_executor` flag before exporting an INT8 ONNX model for the executor backend.

```py
trainer.enable_executor = True
```
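Putting the executor flags together, the following is a minimal sketch of an INT8 export targeted at the executor backend. It only combines the options documented above; the `trainer` object comes from your training script, and the save path is illustrative.

```py
# Sketch: INT8 export for the executor backend (assumes `trainer` already holds a quantized model)
trainer.enable_executor = True    # executor prefers INT8 MatMul with the U8S8 data type
trainer.export_to_onnx(
    save_path="model_int8.onnx",  # illustrative path
    quant_format='QDQ',
    dtype='U8S8',
    opset_version=14,
)
```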
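After any export, it can be useful to sanity-check the resulting file. The sketch below uses the standard `onnx` package (not part of this API) to validate the graph and count node types; for an INT8 model exported with `quant_format='QDQ'`, you should see `QuantizeLinear`/`DequantizeLinear` nodes in the list. The file name is illustrative.

```py
import collections

import onnx

# Load and validate the exported model (illustrative path)
model = onnx.load("model_int8.onnx")
onnx.checker.check_model(model)

# Count node types; a QDQ-format INT8 model contains QuantizeLinear/DequantizeLinear pairs
op_counts = collections.Counter(node.op_type for node in model.graph.node)
print(op_counts.most_common(10))
```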
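Similarly, for the BF16 export described above, a small sketch with the `onnx` package can confirm which initializers were stored as BFloat16 (file name illustrative):

```py
import onnx

model = onnx.load("model_bf16.onnx")  # illustrative path

# List initializers stored as BFloat16 (expected for MatMul/Gemm weights when enable_bf16 is True)
bf16_initializers = [
    init.name
    for init in model.graph.initializer
    if init.data_type == onnx.TensorProto.BFLOAT16
]
print(f"{len(bf16_initializers)} BFloat16 initializers, e.g. {bf16_initializers[:5]}")
```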
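Finally, a generic smoke test with `onnxruntime` (also not part of this API) can run the exported FP32 or INT8 model on dummy inputs. Input names, shapes, and dtypes depend on your model, so this sketch builds a feed from the session's own input metadata and assumes integer token inputs, as is typical for transformer models:

```py
import numpy as np
import onnxruntime as ort

# Illustrative path; use the file produced by export_to_onnx
session = ort.InferenceSession("model_int8.onnx", providers=["CPUExecutionProvider"])

# Build dummy inputs from the model's input metadata, fixing dynamic dimensions to 1
feed = {}
for inp in session.get_inputs():
    shape = [d if isinstance(d, int) and d > 0 else 1 for d in inp.shape]
    feed[inp.name] = np.ones(shape, dtype=np.int64)  # assumes int64 token inputs

outputs = session.run(None, feed)
print([out.shape for out in outputs])
```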