ISA Dynamic Dispatching

This document explains the dynamic kernel dispatch mechanism for Intel® Extension for PyTorch* (Intel® Extension for PyTorch*) based on CPU ISA. It is an extension to the similar mechanism in PyTorch.

Overview

Forked from PyTorch, Intel® Extension for PyTorch* adds additional CPU ISA level support, such as AVX512_VNNI, AVX512_BF16 and AMX.

PyTorch & Intel® Extension for PyTorch* CPU ISA support statement:

DEFAULT AVX2 AVX2_VNNI AVX512 AVX512_VNNI AVX512_BF16 AMX
PyTorch
Intel® Extension for PyTorch* 1.11
Intel® Extension for PyTorch* 1.12

* DEFAULT in Intel® Extension for PyTorch* 1.12 implies AVX2.

CPU ISA build compiler requirement

ISA Level GCC requirement
AVX2 Any
AVX512 GCC 9.2+
AVX512_VNNI GCC 9.2+
AVX512_BF16 GCC 10.3+
AVX2_VNNI GCC 11.2+
AMX GCC 11.2+

* Check with cmake/Modules/FindAVX.cmake for detailed compiler checks.

Select ISA Level

By default, Intel® Extension for PyTorch* dispatches to kernels with the maximum ISA level supported on the underlying CPU hardware. This ISA level can be overridden by an environment variable ATEN_CPU_CAPABILITY (same environment variable as PyTorch). Available values are {avx2, avx512, avx512_vnni, avx512_bf16, amx}. The effective ISA level would be the minimal level between ATEN_CPU_CAPABILITY and the maximum level supported by the hardware.

Example:

$ python -c 'import intel_extension_for_pytorch._C as core;print(core._get_current_isa_level())'
AMX
$ ATEN_CPU_CAPABILITY=avx2 python -c 'import intel_extension_for_pytorch._C as core;print(core._get_current_isa_level())'
AVX2

Note:

core._get_current_isa_level() is an Intel® Extension for PyTorch* internal function used for checking the current effective ISA level. It is used for debugging purpose only and subject to change.

CPU feature check

An addtional CPU feature check tool in the subfolder: tests/cpu/isa

$ cmake .
-- The C compiler identification is GNU 11.2.1
-- The CXX compiler identification is GNU 11.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rh/gcc-toolset-11/root/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rh/gcc-toolset-11/root/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: tests/cpu/isa

$ make
[ 33%] Building CXX object CMakeFiles/cpu_features.dir/intel_extension_for_pytorch/csrc/cpu/isa/cpu_feature.cpp.o
[ 66%] Building CXX object CMakeFiles/cpu_features.dir/intel_extension_for_pytorch/csrc/cpu/isa/cpu_feature_main.cpp.o
[100%] Linking CXX executable cpu_features
[100%] Built target cpu_features

$ ./cpu_features
XCR0: 00000000000602e7
os --> avx: true
os --> avx2: true
os --> avx512: true
os --> amx: true
mmx:                    true
sse:                    true
sse2:                   true
sse3:                   true
ssse3:                  true
sse4_1:                 true
sse4_2:                 true
aes_ni:                 true
sha:                    true
xsave:                  true
fma:                    true
f16c:                   true
avx:                    true
avx2:                   true
avx_vnni:                       true
avx512_f:                       true
avx512_cd:                      true
avx512_pf:                      false
avx512_er:                      false
avx512_vl:                      true
avx512_bw:                      true
avx512_dq:                      true
avx512_ifma:                    true
avx512_vbmi:                    true
avx512_vpopcntdq:               true
avx512_4fmaps:                  false
avx512_4vnniw:                  false
avx512_vbmi2:                   true
avx512_vpclmul:                 true
avx512_vnni:                    true
avx512_bitalg:                  true
avx512_fp16:                    true
avx512_bf16:                    true
avx512_vp2intersect:            true
amx_bf16:                       true
amx_tile:                       true
amx_int8:                       true
prefetchw:                      true
prefetchwt1:                    false