ISA Dynamic Dispatching
=======================

This document explains the dynamic kernel dispatch mechanism for Intel® Extension for PyTorch\* (Intel® Extension for PyTorch\*) based on CPU ISA. It is an extension to the similar mechanism in PyTorch.

## Overview

Forked from PyTorch, Intel® Extension for PyTorch\* adds additional CPU ISA level support, such as `AVX512_VNNI`, `AVX512_BF16` and `AMX`.

PyTorch & Intel® Extension for PyTorch\* CPU ISA support statement:

 | | DEFAULT | AVX2 | AVX2_VNNI | AVX512 | AVX512_VNNI | AVX512_BF16 | AMX |
 | ---- | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
 | PyTorch | ✔ | ✔ | ✘ | ✔ | ✘ | ✘ | ✘ |
 | Intel® Extension for PyTorch\* 1.11 | ✘ | ✔ | ✘ | ✔ | ✘ | ✘ | ✘ |
 | Intel® Extension for PyTorch\* 1.12 | ✘ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ |

\* `DEFAULT` in Intel® Extension for PyTorch\* 1.12 implies `AVX2`.

### CPU ISA build compiler requirement

 | ISA Level | GCC requirement |
 | ---- | :----: |
 | AVX2 | Any |
 | AVX512 | GCC 9.2+ |
 | AVX512_VNNI | GCC 9.2+ |
 | AVX512_BF16 | GCC 10.3+ |
 | AVX2_VNNI | GCC 11.2+ |
 | AMX | GCC 11.2+ |

\* Check with `cmake/Modules/FindAVX.cmake` for detailed compiler checks.

## Select ISA Level

By default, Intel® Extension for PyTorch\* dispatches to kernels with the maximum ISA level supported on the underlying CPU hardware. This ISA level can be overridden by an environment variable `ATEN_CPU_CAPABILITY` (same environment variable as PyTorch). Available values are {`avx2`, `avx512`, `avx512_vnni`, `avx512_bf16`, `amx`}. The effective ISA level would be the minimal level between `ATEN_CPU_CAPABILITY` and the maximum level supported by the hardware.

### Example:

```bash
$ python -c 'import intel_extension_for_pytorch._C as core;print(core._get_current_isa_level())'
AMX
$ ATEN_CPU_CAPABILITY=avx2 python -c 'import intel_extension_for_pytorch._C as core;print(core._get_current_isa_level())'
AVX2
```
>**Note:**
>
>`core._get_current_isa_level()` is an Intel® Extension for PyTorch\* internal function used for checking the current effective ISA level. It is used for debugging purpose only and subject to change.

## CPU feature check

An addtional CPU feature check tool in the subfolder: `tests/cpu/isa`

```bash
$ cmake .
-- The C compiler identification is GNU 11.2.1
-- The CXX compiler identification is GNU 11.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rh/gcc-toolset-11/root/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rh/gcc-toolset-11/root/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: tests/cpu/isa

$ make
[ 33%] Building CXX object CMakeFiles/cpu_features.dir/intel_extension_for_pytorch/csrc/cpu/isa/cpu_feature.cpp.o
[ 66%] Building CXX object CMakeFiles/cpu_features.dir/intel_extension_for_pytorch/csrc/cpu/isa/cpu_feature_main.cpp.o
[100%] Linking CXX executable cpu_features
[100%] Built target cpu_features

$ ./cpu_features
XCR0: 00000000000602e7
os --> avx: true
os --> avx2: true
os --> avx512: true
os --> amx: true
mmx:                    true
sse:                    true
sse2:                   true
sse3:                   true
ssse3:                  true
sse4_1:                 true
sse4_2:                 true
aes_ni:                 true
sha:                    true
xsave:                  true
fma:                    true
f16c:                   true
avx:                    true
avx2:                   true
avx_vnni:                       true
avx512_f:                       true
avx512_cd:                      true
avx512_pf:                      false
avx512_er:                      false
avx512_vl:                      true
avx512_bw:                      true
avx512_dq:                      true
avx512_ifma:                    true
avx512_vbmi:                    true
avx512_vpopcntdq:               true
avx512_4fmaps:                  false
avx512_4vnniw:                  false
avx512_vbmi2:                   true
avx512_vpclmul:                 true
avx512_vnni:                    true
avx512_bitalg:                  true
avx512_fp16:                    true
avx512_bf16:                    true
avx512_vp2intersect:            true
amx_bf16:                       true
amx_tile:                       true
amx_int8:                       true
prefetchw:                      true
prefetchwt1:                    false
```