Technical Details

ISA Dynamic Dispatching [CPU]

Intel® Extension for PyTorch* features dynamic dispatching functionality to automatically adapt execution binaries to the most advanced instruction set available on your machine.

For more detailed information, check ISA Dynamic Dispatching.

Graph Optimization [CPU]

To further optimize TorchScript performance, Intel® Extension for PyTorch* supports transparent fusion of frequently used operator patterns such as Conv2D+ReLU and Linear+ReLU. For more detailed information, check Graph Optimization.

Compared to eager mode, graph mode in PyTorch normally yields better performance from optimization methodologies such as operator fusion. Intel® Extension for PyTorch* provides further optimizations in graph mode. We recommend you take advantage of Intel® Extension for PyTorch* with TorchScript. You may wish to run with the torch.jit.trace() function first, since it generally works better with Intel® Extension for PyTorch* than using the torch.jit.script() function. More detailed information can be found at the pytorch.org website.

Optimizer Optimization [CPU, GPU]

Optimizers are a key part of the training workloads. Intel® Extension for PyTorch* brings two types of optimizations to optimizers:

  1. Operator fusion for the computation in the optimizers. [CPU, GPU]

  2. SplitSGD for BF16 training, which reduces the memory footprint of the master weights by half. [CPU]

For more detailed information, check Optimizer Fusion on CPU, Optimizer Fusion on GPU and Split SGD.

Memory Management [GPU]

Intel® Extension for PyTorch* uses a caching memory allocator to speed up memory allocations. This allows fast memory deallocation without any overhead. Allocations are associated with a sycl device. The allocator attempts to find the smallest cached block that will fit the requested size from the reserved block pool. If it unable to find a appropriate memory block inside of already allocated ares, the allocator will delegate to allocate a new block memory.

For more detailed information, check Memory Management.