ipex.optimize Frontend API

The ipex.optimize API is designed to optimize PyTorch* modules (nn.Module) and the optimizers used with them. Its optimization options for the Intel® GPU device include:

  • Automatic Channels Last

  • Fusing Convolutional Layers with Batch Normalization

  • Fusing Linear Layers with Batch Normalization

  • Replacing Dropout with Identity

  • Splitting Master Weights

  • Fusing Optimizer Update Step

If ipex.optimize is called in the model script, the original Python modules are automatically replaced with optimized versions during model execution.
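For reference, here is a minimal sketch of how ipex.optimize is typically invoked for inference and for training on an XPU device. The model below is a placeholder, and Intel® Extension for PyTorch* with XPU support is assumed to be installed:

    import torch
    import intel_extension_for_pytorch as ipex

    # Placeholder model; any nn.Module is handled the same way.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, kernel_size=3),
        torch.nn.BatchNorm2d(8),
        torch.nn.ReLU(),
    ).to("xpu")

    # Inference: optimize the module only.
    model.eval()
    optimized_model = ipex.optimize(model)

    # Training: pass the optimizer so its update step can be optimized as well.
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    optimized_model, optimized_optimizer = ipex.optimize(
        model, optimizer=optimizer, dtype=torch.bfloat16
    )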

The following sections provide detailed descriptions for each optimization flag supported by XPU models on Intel® GPU. For CPU-specific flags, please refer to the API Docs page.

Automatic Channels Last

By default, ipex.optimize checks whether the current GPU platform supports 2D Block Array Load. If it does, the Conv*d and ConvTranspose*d modules inside the model are optimized to use the channels-last memory format. Call ipex.enable_auto_channels_last or ipex.disable_auto_channels_last before ipex.optimize to enable or disable this feature manually.
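A minimal sketch of toggling this behavior manually before calling ipex.optimize (an XPU device is assumed; the single-convolution model is a placeholder):

    import torch
    import intel_extension_for_pytorch as ipex

    model = torch.nn.Conv2d(3, 16, kernel_size=3).to("xpu").eval()

    # Force the channels-last conversion regardless of the platform check ...
    ipex.enable_auto_channels_last()
    model_channels_last = ipex.optimize(model)

    # ... or keep the default contiguous (NCHW) memory format.
    ipex.disable_auto_channels_last()
    model_contiguous = ipex.optimize(model)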

conv_bn_folding

This flag is applicable for model inference. Intel® Extension for PyTorch* tries to match every connected pair of nn.Conv(1/2/3)d and nn.BatchNorm(1/2/3)d layers with matching dimensions in the model and fuses them to improve performance. If the fusion fails, the optimization is skipped and the model automatically falls back to the normal execution path.
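A minimal sketch of an inference model eligible for this fusion; the flag is passed explicitly here for illustration, and an XPU device is assumed:

    import torch
    import intel_extension_for_pytorch as ipex

    class ConvBN(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = torch.nn.Conv2d(3, 16, kernel_size=3)
            self.bn = torch.nn.BatchNorm2d(16)

        def forward(self, x):
            return self.bn(self.conv(x))

    model = ConvBN().to("xpu").eval()
    # Request Conv*d + BatchNorm*d folding for inference.
    optimized_model = ipex.optimize(model, conv_bn_folding=True)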

linear_bn_folding

This flag is applicable for model inference. Intel® Extension for PyTorch* tries to match every connected pair of nn.Linear and nn.BatchNorm(1/2/3)d layers in the model and fuses them to improve performance. If the fusion fails, the optimization is skipped and the model automatically falls back to the normal execution path.
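A corresponding sketch for the Linear + BatchNorm case, again with the flag passed explicitly and an XPU device assumed:

    import torch
    import intel_extension_for_pytorch as ipex

    class LinearBN(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.fc = torch.nn.Linear(64, 128)
            self.bn = torch.nn.BatchNorm1d(128)

        def forward(self, x):
            return self.bn(self.fc(x))

    model = LinearBN().to("xpu").eval()
    # Request Linear + BatchNorm*d folding for inference.
    optimized_model = ipex.optimize(model, linear_bn_folding=True)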

replace_dropout_with_identity

This flag is applicable for model inference. All instances of torch.nn.Dropout are replaced with torch.nn.Identity. The Identity modules are ignored during static graph generation, which can create additional fusion opportunities in the generated graph.
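A minimal sketch showing the flag in use; printing the optimized model is one way to check that the Dropout instance now appears as Identity (an XPU device is assumed):

    import torch
    import intel_extension_for_pytorch as ipex

    model = torch.nn.Sequential(
        torch.nn.Linear(32, 32),
        torch.nn.Dropout(p=0.5),
        torch.nn.ReLU(),
    ).to("xpu").eval()

    optimized_model = ipex.optimize(model, replace_dropout_with_identity=True)

    # The Dropout module is expected to show up as Identity here.
    print(optimized_model)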

split_master_weight_for_bf16

This flag is applicable for model training. The optimization will be enabled once the following requirements are met:

  • When calling ipex.optimize, the dtype flag must be set to torch.bfloat16.

  • fuse_update_step must be enabled.

The optimization process is as follows:

  • Wrap all parameters of this model with ParameterWrapper.

  • Convert the parameters that meet the condition specified by ipex.nn.utils._parameter_wrapper.can_cast_training: the original dtype must be torch.float and the module type must be defined in ipex.nn.utils._parameter_wrapper.IPEX_WEIGHT_CONVERT_MODULE_XPU.

  • Convert the parameters wrapped by ParameterWrapper to the user-specified dtype. If master weight splitting is needed, only the SGD optimizer is supported. Each original parameter is divided into a top part and a bottom part; the top part is used for forward and backward computation, and both parts are updated simultaneously when the weights are updated.
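A minimal training sketch that satisfies the requirements above (bfloat16 dtype and a fused update step) and uses SGD, which the splitting path expects. The model, data, and the torch.xpu.amp.autocast context are illustrative; an XPU device is assumed:

    import torch
    import intel_extension_for_pytorch as ipex

    model = torch.nn.Linear(128, 10).to("xpu").train()
    # Master weight splitting is described above as working with SGD only.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    model, optimizer = ipex.optimize(
        model,
        optimizer=optimizer,
        dtype=torch.bfloat16,
        split_master_weight_for_bf16=True,
        fuse_update_step=True,
    )

    criterion = torch.nn.CrossEntropyLoss()
    data = torch.randn(4, 128, device="xpu")
    target = torch.randint(0, 10, (4,), device="xpu")

    # One illustrative training step under bfloat16 autocast.
    with torch.xpu.amp.autocast(dtype=torch.bfloat16):
        loss = criterion(model(data), target)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()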

fuse_update_step

This flag specifies whether to replace the original optimizer step with a fused step for better performance. The supported optimizers are listed in IPEX_FUSED_OPTIMIZER_LIST_XPU in ipex.optim._optimizer_utils. During the optimization, the original step is saved as optimizer._original_step, optimizer.step is replaced with a kernel written in SYCL, and the attribute optimizer.fused is set to True.
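A minimal sketch that requests the fused step and then inspects the attributes described above (an XPU device is assumed; the expected values follow from the description in this section):

    import torch
    import intel_extension_for_pytorch as ipex

    model = torch.nn.Linear(64, 64).to("xpu").train()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    model, optimizer = ipex.optimize(
        model, optimizer=optimizer, dtype=torch.bfloat16, fuse_update_step=True
    )

    # Per the description above, the fused path leaves these markers behind.
    print(getattr(optimizer, "fused", None))     # expected: True
    print(hasattr(optimizer, "_original_step"))  # expected: True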