Features

Ease-of-use Python API

Intel® Extension for PyTorch* provides simple frontend Python APIs and utilities for users to get performance optimizations such as graph optimization and operator optimization with minor code changes. Typically, only two to three lines need to be added to the original code.

Please check the API Documentation page for details of API functions. Examples are available on the Examples page.
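For illustration, the typical pattern looks like the following minimal sketch (the model and input are placeholders; only the import and the ipex.optimize() call are additions to stock PyTorch):

    import torch
    import torchvision.models as models
    import intel_extension_for_pytorch as ipex  # package name for 1.10.0; see the note below

    model = models.resnet50(pretrained=True)
    model.eval()

    # The added clause: apply Intel® Extension for PyTorch* optimizations.
    model = ipex.optimize(model)

    with torch.no_grad():
        output = model(torch.randn(1, 3, 224, 224))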

Note

Please check the following table for the package name of Intel® Extension for PyTorch* to use when importing it in Python scripts; the name differs from version to version.

version          package name
1.2.0 ~ 1.9.0    intel_pytorch_extension
1.10.0           intel_extension_for_pytorch

Channels Last

Compared to the default NCHW memory format, the channels_last (NHWC) memory format can further accelerate convolutional neural networks. In Intel® Extension for PyTorch*, the NHWC memory format has been enabled for most key CPU operators, though not all of these changes have been merged into the PyTorch master branch yet. They are expected to land fully in PyTorch upstream soon.
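A minimal sketch of switching a model and its input to channels_last (the model here is a placeholder):

    import torch
    import torchvision.models as models

    model = models.resnet50(pretrained=True).eval()

    # Convert the model weights and the input tensor to the channels_last (NHWC) format.
    model = model.to(memory_format=torch.channels_last)
    x = torch.randn(1, 3, 224, 224).to(memory_format=torch.channels_last)

    with torch.no_grad():
        y = model(x)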

Check more detailed information for Channels Last.

Auto Mixed Precision (AMP)

The low-precision data type BFloat16 is natively supported on 3rd Generation Intel® Xeon® Scalable Processors (codename Cooper Lake) with the AVX512 instruction set, and will be supported on the next generation of Intel® Xeon® Scalable Processors with the Intel® Advanced Matrix Extensions (Intel® AMX) instruction set, bringing further performance gains. Auto Mixed Precision (AMP) with BFloat16 for CPU and BFloat16 operator optimizations are broadly enabled in Intel® Extension for PyTorch*, and have been partially upstreamed to the PyTorch master branch. Most of the remaining optimizations will land in PyTorch master through PRs that are being submitted and reviewed.
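A sketch of BFloat16 inference with AMP on CPU (assumes hardware with BFloat16 support; the model is a placeholder):

    import torch
    import torchvision.models as models
    import intel_extension_for_pytorch as ipex

    model = models.resnet50(pretrained=True).eval()
    model = ipex.optimize(model, dtype=torch.bfloat16)

    # torch.cpu.amp.autocast() inserts BFloat16 casts automatically where supported.
    with torch.no_grad(), torch.cpu.amp.autocast():
        y = model(torch.randn(1, 3, 224, 224))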

Check more detailed information for Auto Mixed Precision (AMP).

Graph Optimization

To further optimize TorchScript performance, Intel® Extension for PyTorch* supports fusion of frequently used operator patterns, such as Conv2D+ReLU and Linear+ReLU. The benefit of the fusions is delivered to users in a transparent fashion.
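The fusions take effect when the model is compiled with TorchScript; a minimal sketch (the model is a placeholder):

    import torch
    import torchvision.models as models
    import intel_extension_for_pytorch as ipex

    model = models.resnet50(pretrained=True).eval()
    model = ipex.optimize(model)

    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        # Tracing and freezing allow the fusion pass to rewrite patterns such as Conv2D+ReLU.
        traced = torch.jit.trace(model, x)
        traced = torch.jit.freeze(traced)
        y = traced(x)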

Check more detailed information for Graph Optimization.

Operator Optimization

Intel® Extension for PyTorch* also optimizes operators and implements several customized operators for a performance boost. A few ATen operators are replaced by their optimized counterparts in Intel® Extension for PyTorch* via the ATen registration mechanism. Moreover, some customized operators are implemented for several popular topologies. For instance, ROIAlign and NMS are defined in Mask R-CNN. To improve performance of these topologies, Intel® Extension for PyTorch* also optimizes these customized operators.

class ipex.nn.FrozenBatchNorm2d(num_features)

BatchNorm2d where the batch statistics and the affine parameters are fixed.

Parameters

num_features (int) – C from an expected input of size (N,C,H,W)

Shape
  • Input: (N,C,H,W)

  • Output: (N,C,H,W) (same shape as input)
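A short usage sketch (the channel count of 64 is arbitrary):

    import torch
    import intel_extension_for_pytorch as ipex

    # Drop-in replacement for a BatchNorm2d whose statistics should stay fixed.
    frozen_bn = ipex.nn.FrozenBatchNorm2d(64)
    x = torch.randn(8, 64, 32, 32)  # (N, C, H, W) with C = num_features = 64
    y = frozen_bn(x)                # same shape as the input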

ipex.nn.functional.interaction(*args)

Get the interaction feature between different kinds of features (such as gender or hobbies), as used in the DLRM model.

For now, only the “dot” interaction from the DLRM GitHub repo is optimized. It uses the dot product to represent the interaction feature between two features.

For example, if feature 1 is “Man”, represented by [0.1, 0.2, 0.3], and feature 2 is “Likes playing football”, represented by [-0.1, 0.3, 0.2], then the dot interaction feature is [0.1, 0.2, 0.3] · [-0.1, 0.3, 0.2]^T = -0.01 + 0.06 + 0.06 = 0.11.

Parameters

*args – Multiple tensors which represent different features

Shape
  • Input: N tensors of shape (B, D), where N is the number of different kinds of features, B is the batch size, and D is the feature size

  • Output: (B, D + N(N-1)/2)
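A usage sketch with arbitrary sizes (N=4 feature tensors, batch size B=128, feature size D=16):

    import torch
    import intel_extension_for_pytorch as ipex

    B, D, N = 128, 16, 4
    features = [torch.randn(B, D) for _ in range(N)]

    out = ipex.nn.functional.interaction(*features)
    print(out.shape)  # torch.Size([128, 22]): D + N*(N-1)/2 = 16 + 6 = 22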

Optimizer Optimization

Optimizers are one of the key parts of training workloads. Intel® Extension for PyTorch* brings two types of optimizations to optimizers:

  1. Operator fusion for the computation in the optimizers.

  2. SplitSGD for BF16 training, which reduces the memory footprint of the master weights by half.
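Both optimizations are applied through the same frontend call; a sketch for BF16 training (the model and optimizer are placeholders):

    import torch
    import torchvision.models as models
    import intel_extension_for_pytorch as ipex

    model = models.resnet50().train()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Returns a model/optimizer pair with fused optimizer computation; with
    # dtype=torch.bfloat16, master weights are handled in the SplitSGD manner.
    model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

    x = torch.randn(32, 3, 224, 224)
    target = torch.randint(0, 1000, (32,))
    with torch.cpu.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(x), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()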

Check more detailed information for Split SGD and Optimizer Fusion.

Runtime Extension (Experimental)

Intel® Extension for PyTorch* Runtime Extension provides a couple of PyTorch frontend APIs for users to get finer-grained control of the thread runtime. It provides:

  1. Multi-stream inference via the Python frontend module MultiStreamModule.

  2. Spawn asynchronous tasks from both Python and C++ frontend.

  3. Configure core bindings for OpenMP threads from both Python and C++ frontend.

Please note: Intel® Extension for PyTorch* Runtime Extension is still in the proof-of-concept (POC) stage. The API is subject to change. More detailed descriptions are available on the API Documentation page.
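Since the API is subject to change, the sketch below should be read as illustrative only; it follows the MultiStreamModule pattern described above (model and core IDs are placeholders):

    import torch
    import torchvision.models as models
    import intel_extension_for_pytorch as ipex

    model = models.resnet50().eval()
    traced = torch.jit.trace(model, torch.randn(2, 3, 224, 224))

    # Bind OpenMP threads to a core pool, then run inference as 2 parallel streams;
    # the batch dimension is split across the streams.
    cpu_pool = ipex.cpu.runtime.CPUPool(core_ids=[0, 1, 2, 3])
    multi_stream_model = ipex.cpu.runtime.MultiStreamModule(traced, num_streams=2, cpu_pool=cpu_pool)

    y = multi_stream_model(torch.randn(2, 3, 224, 224))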

Check more detailed information for Runtime Extension.

INT8 Quantization (Experimental)

Intel® Extension for PyTorch* has built-in quantization recipes to deliver good statistical accuracy for most popular DL workloads including CNN, NLP and recommendation models. The quantized model is then optimized with the oneDNN graph fusion pass to deliver good performance.
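A sketch of the experimental calibration-and-convert flow (the model and calibration loop are placeholders, and the exact names may differ between versions):

    import torch
    import torchvision.models as models
    import intel_extension_for_pytorch as ipex

    model = models.resnet50(pretrained=True).eval()
    sample = torch.randn(1, 3, 224, 224)

    conf = ipex.quantization.QuantConf(qscheme=torch.per_tensor_affine)
    with torch.no_grad():
        # Run a few representative batches to collect calibration statistics.
        for _ in range(4):
            with ipex.quantization.calibrate(conf):
                model(sample)

        # Convert to an INT8 TorchScript model; the oneDNN fusion pass runs on the graph.
        quantized = ipex.quantization.convert(model, conf, sample)
        y = quantized(sample)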

Check more detailed information for INT8.