ipex.optimize Frontend API
The ipex.optimize API is designed to optimize PyTorch* modules (nn.modules) and specified optimizers within Python modules. Its optimization options for the Intel® GPU device include:
Automatic Channels Last
Fusing Convolutional Layers with Batch Normalization
Fusing Linear Layers with Batch Normalization
Replacing Dropout with Identity
Splitting Master Weights
Fusing Optimizer Update Step
The original Python modules are replaced with their optimized versions automatically during model execution if ipex.optimize is called in the model script.
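For example, a typical inference script might apply the API as sketched below. This is a minimal sketch: the model definition, the "xpu" device placement, and the input shape are illustrative placeholders rather than part of this document.

```python
import torch
import intel_extension_for_pytorch as ipex

# Placeholder model; any torch.nn.Module can be passed to ipex.optimize.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3),
    torch.nn.BatchNorm2d(16),
    torch.nn.ReLU(),
).to("xpu").eval()

# ipex.optimize returns the module with the optimizations described below applied.
model = ipex.optimize(model)

with torch.no_grad():
    output = model(torch.randn(1, 3, 224, 224, device="xpu"))
```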
The following sections provide detailed descriptions for each optimization flag supported by XPU models on Intel® GPU. For CPU-specific flags, please refer to the API Docs page.
Automatic Channels Last
By default, ipex.optimize checks whether the current GPU platform supports 2D Block Array Load. If it does, the Conv*d and ConvTranspose*d modules inside the model are optimized to use the channels-last memory format. Call ipex.enable_auto_channels_last or ipex.disable_auto_channels_last before ipex.optimize to enable or disable this feature manually.
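For instance, the automatic conversion can be switched off before optimizing and restored afterwards. This is a minimal sketch, assuming both helper functions take no arguments and the model is a placeholder.

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Conv2d(3, 16, kernel_size=3).to("xpu").eval()

# Keep the default (contiguous) memory format for this particular call.
ipex.disable_auto_channels_last()
model = ipex.optimize(model)

# Restore the automatic channels-last behavior for later ipex.optimize calls.
ipex.enable_auto_channels_last()
```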
conv_bn_folding
This flag is applicable to model inference. Intel® Extension for PyTorch* tries to match all connected nn.Conv(1/2/3)d and nn.BatchNorm(1/2/3)d layers with matching dimensions in the model and fuses them to improve performance. If the fusion fails, the optimization is skipped and the model automatically falls back to the normal execution path.
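A minimal sketch of requesting the folding explicitly for an inference model; the model itself is a placeholder.

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3),
    torch.nn.BatchNorm2d(16),
).to("xpu").eval()

# Explicitly request Conv/BatchNorm folding for inference.
model = ipex.optimize(model, conv_bn_folding=True)
```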
linear_bn_folding
This flag is applicable to model inference. Intel® Extension for PyTorch* tries to match all connected nn.Linear and nn.BatchNorm(1/2/3)d layers in the model and fuses them to improve performance. If the fusion fails, the optimization is skipped and the model automatically falls back to the normal execution path.
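Analogously, a minimal sketch with a Linear/BatchNorm pair; the layer sizes are placeholders.

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(
    torch.nn.Linear(64, 32),
    torch.nn.BatchNorm1d(32),
).to("xpu").eval()

# Explicitly request Linear/BatchNorm folding for inference.
model = ipex.optimize(model, linear_bn_folding=True)
```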
replace_dropout_with_identity
This flag is applicable to model inference. All instances of torch.nn.Dropout are replaced with torch.nn.Identity. The Identity modules are ignored during static graph generation, which can expose additional fusion opportunities in the generated graph.
split_master_weight_for_bf16
This flag is applicable to model training. The optimization is enabled only when the following requirements are met:
When calling ipex.optimize, the dtype flag must be set to torch.bfloat16.
fuse_update_step must be enabled.
The optimization process is as follows (a usage sketch follows these steps):
Wrap all parameters of the model with ParameterWrapper.
Convert the parameters that meet the condition specified by ipex.nn.utils._parameter_wrapper.can_cast_training. This covers parameters whose original dtype is torch.float and whose module types are listed in ipex.nn.utils._parameter_wrapper.IPEX_WEIGHT_CONVERT_MODULE_XPU.
Convert the parameters wrapped by ParameterWrapper to the user-specified dtype. If split master weight is needed, the optimizer can only be SGD. The original parameters are divided into top and bottom parts: the top part is used for forward and backward computation, and when the weights are updated, both parts are updated simultaneously.
fuse_update_step
This flag specifies whether to replace the original optimizer step with a fused step for better performance. The supported optimizers are listed in IPEX_FUSED_OPTIMIZER_LIST_XPU in ipex.optim._optimizer_utils. During the optimization, the original step is saved as optimizer._original_step, optimizer.step is replaced with a SYCL-written kernel, and the optimizer.fused attribute is set to True.
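A brief sketch of enabling the fused step and observing the replacement described above; the model and optimizer are placeholders.

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Linear(64, 8).to("xpu").train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

model, optimizer = ipex.optimize(
    model, optimizer=optimizer, dtype=torch.bfloat16, fuse_update_step=True
)

# For optimizers in IPEX_FUSED_OPTIMIZER_LIST_XPU, the step is expected to be replaced.
print(getattr(optimizer, "fused", False))    # expected: True for supported optimizers
print(hasattr(optimizer, "_original_step"))  # expected: True if the step was replaced
```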