Features
========

Ease-of-use Python API
----------------------

Intel® Extension for PyTorch\* provides simple frontend Python APIs and utilities to get performance optimizations such as operator optimization.

Check the `API Documentation <api_doc.html>`_ for details of API functions and `Examples <examples.md>`_ for helpful usage tips.

DPC++ Extension
---------------

Intel® Extension for PyTorch\* provides C++ APIs to get the SYCL queue and configure the floating-point math mode.

Check the `API Documentation`_ for the details of API functions. `DPC++ Extension <features/DPC++_Extension.md>`_ describes how to write customized DPC++ kernels with a practical example and build them with setuptools and CMake.

.. toctree::
   :hidden:
   :maxdepth: 1

   features/DPC++_Extension

Here are detailed discussions of specific feature topics, summarized in the rest of this document:

Channels Last
-------------

Compared with the default NCHW memory format, using the channels_last (NHWC) memory format can further accelerate convolutional neural networks. In Intel® Extension for PyTorch\*, the NHWC memory format has been enabled for most key GPU operators.

For more detailed information, check `Channels Last <features/nhwc.md>`_.

.. toctree::
   :hidden:
   :maxdepth: 1

   features/nhwc

Auto Mixed Precision (AMP)
--------------------------

Intel® Extension for PyTorch\* supports Auto Mixed Precision (AMP) with BFloat16 and Float16 optimization of operators. BFloat16 is the default low-precision floating-point data type when AMP is enabled. We suggest using AMP to accelerate convolution-based and matmul-based neural networks.

For more detailed information, check `Auto Mixed Precision (AMP) <features/amp.md>`_.

.. toctree::
   :hidden:
   :maxdepth: 1

   features/amp

Advanced Configuration
----------------------

The default settings for Intel® Extension for PyTorch\* are sufficient for most use cases.
However, if users want to customize Intel® Extension for PyTorch\*, advanced configuration is available at build time and runtime.

For more detailed information, check `Advanced Configuration <features/advanced_configuration.md>`_.

.. toctree::
   :hidden:
   :maxdepth: 1

   features/advanced_configuration

Optimizer Optimization
----------------------

Optimizers are a key part of the training workloads. Intel® Extension for PyTorch\* supports operator fusion for computation in the optimizers.

For more detailed information, check `Optimizer Fusion <features/optimizer_fusion.md>`_.

.. toctree::
   :hidden:
   :maxdepth: 1

   features/optimizer_fusion

Simple Trace Tool
-----------------

Simple Trace is a built-in debugging tool that lets you control printing out the call stack for a piece of code. Once enabled, it can automatically print out verbose messages of called operators in a stack format with indenting to distinguish the context.

For more detailed information, check `Simple Trace Tool <features/simple_trace.md>`_.

.. toctree::
   :hidden:
   :maxdepth: 1

   features/simple_trace