Skip to content

Intel® Extension for Pytorch*🔗

Intel® Extension for PyTorch* extends PyTorch* with up-to-date feature optimizations for an extra performance boost on Intel hardware.

On Intel CPUs optimizations take advantage of the following instuction sets:

  • Intel® Advanced Matrix Extensions (Intel® AMX)
  • Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
  • Vector Neural Network Instructions (VNNI)

On Intel GPUs Intel® Extension for PyTorch* provides easy GPU acceleration through the PyTorch* xpu device. The following Intel GPUs are supported:

Images available here start with the Ubuntu* 22.04 base image with Intel® Extension for PyTorch* built for different use cases as well as some additional software. The Python Dockerfile is used to generate The images below at https://github.com/intel/ai-containers.

Note: There are two dockerhub repositories (intel/intel-extension-for-pytorch and intel/intel-optimized-pytorch) that are routinely updated with the latest images, however, some legacy images have not be published to both repositories.

XPU images🔗

The images below include support for both CPU and GPU optimizations:

Tag(s) Pytorch IPEX Driver Dockerfile
2.3.110-xpu-pip-base,2.3.110-xpu v2.3.1 v2.3.110+xpu 950 v0.4.0-Beta
2.1.40-xpu-pip-base,2.1.40-xpu v2.1.0 v2.1.40+xpu 914 v0.4.0-Beta
2.1.30-xpu v2.1.0 v2.1.30+xpu 803 v0.4.0-Beta
2.1.20-xpu v2.1.0 v2.1.20+xpu 803 v0.3.4
2.1.10-xpu v2.1.0 v2.1.10+xpu 736 v0.2.3
xpu-flex-2.0.110-xpu v2.0.1 v2.0.110+xpu 647 v0.1.0

docker run -it --rm \
    --device /dev/dri \
    -v /dev/dri/by-path:/dev/dri/by-path \
    --ipc=host \
    intel/intel-extension-for-pytorch:2.3.110-xpu

The images below additionally include Jupyter Notebook server:

Tag(s) Pytorch IPEX Driver Jupyter Port Dockerfile
2.3.110-xpu-pip-jupyter v2.3.1 v2.3.110+xpu 950 8888 v0.4.0-Beta
2.1.40-xpu-pip-jupyter v2.1.0 v2.1.40+xpu 914 8888 v0.4.0-Beta
2.1.20-xpu-pip-jupyter v2.1.0 v2.1.20+xpu 803 8888 v0.3.4
2.1.10-xpu-pip-jupyter v2.1.0 v2.1.10+xpu 736 8888 v0.2.3

Run the XPU Jupyter Container🔗

docker run -it --rm \
    -p 8888:8888 \
    --device /dev/dri \
    -v /dev/dri/by-path:/dev/dri/by-path \
    intel/intel-extension-for-pytorch:2.3.110-xpu-pip-jupyter

After running the command above, copy the URL (something like http://127.0.0.1:$PORT/?token=***) into your browser to access the notebook server.

CPU only images🔗

The images below are built only with CPU optimizations (GPU acceleration support was deliberately excluded):

Tag(s) Pytorch IPEX Dockerfile
2.5.0-pip-base, latest v2.5.0 v2.5.0+cpu v0.4.0-Beta
2.4.0-pip-base v2.4.0 v2.4.0+cpu v0.4.0-Beta
2.3.0-pip-base v2.3.0 v2.3.0+cpu v0.4.0-Beta
2.2.0-pip-base v2.2.0 v2.2.0+cpu v0.3.4
2.1.0-pip-base v2.1.0 v2.1.0+cpu v0.2.3
2.0.0-pip-base v2.0.0 v2.0.0+cpu v0.1.0

Run the CPU Container🔗

docker run -it --rm intel/intel-extension-for-pytorch:latest

The images below additionally include Jupyter Notebook server:

Tag(s) Pytorch IPEX Dockerfile
2.5.0-pip-jupyter v2.5.0 v2.5.0+cpu v0.4.0-Beta
2.4.0-pip-jupyter v2.4.0 v2.4.0+cpu v0.4.0-Beta
2.3.0-pip-jupyter v2.3.0 v2.3.0+cpu v0.4.0-Beta
2.2.0-pip-jupyter v2.2.0 v2.2.0+cpu v0.3.4
2.1.0-pip-jupyter v2.1.0 v2.1.0+cpu v0.2.3
2.0.0-pip-jupyter v2.0.0 v2.0.0+cpu v0.1.0
docker run -it --rm \
    -p 8888:8888 \
    -v $PWD/workspace:/workspace \
    -w /workspace \
    intel/intel-extension-for-pytorch:2.4.0-pip-jupyter

After running the command above, copy the URL (something like http://127.0.0.1:$PORT/?token=***) into your browser to access the notebook server.


The images below additionally include Intel® oneAPI Collective Communications Library (oneCCL) and Neural Compressor (INC):

Tag(s) Pytorch IPEX oneCCL INC Dockerfile
2.4.0-pip-multinode v2.4.0 v2.4.0+cpu v2.4.0 v3.0 v0.4.0-Beta
2.3.0-pip-multinode v2.3.0 v2.3.0+cpu v2.3.0 v2.6 v0.4.0-Beta
2.2.0-pip-multinode v2.2.2 v2.2.0+cpu v2.2.0 v2.6 v0.4.0-Beta
2.1.100-pip-mulitnode v2.1.2 v2.1.100+cpu v2.1.0 v2.6 v0.4.0-Beta
2.0.100-pip-multinode v2.0.1 v2.0.100+cpu v2.0.0 v2.6 v0.4.0-Beta

Note

Passwordless SSH connection is also enabled in the image, but the container does not contain any SSH ID keys. The user needs to mount those keys at /root/.ssh/id_rsa and /etc/ssh/authorized_keys.

Tip

Before mounting any keys, modify the permissions of those files with chmod 600 authorized_keys; chmod 600 id_rsa to grant read access for the default user account.

Setup and Run IPEX Multi-Node Container🔗

Tip

Maintainence, Bug Fixes, and Releases of Intel® Extension for PyTorch* Multi-Node Container for Xeon Processors have ceased development. The last supported version is 2.4.0. For future releases, please use the Intel® Extension for PyTorch* Multi-Node Container for XPU.

Some additional assembly is required to utilize this container with OpenSSH. To perform any kind of DDP (Distributed Data Parallel) execution, containers are assigned the roles of launcher and worker respectively:

SSH Server (Worker)

  1. Authorized Keys : /etc/ssh/authorized_keys

SSH Client (Launcher)

  1. Private User Key : /root/.ssh/id_rsa

To add these files correctly please follow the steps described below.

  1. Setup ID Keys

    You can use the commands provided below to generate the identity keys for OpenSSH.

    ssh-keygen -q -N "" -t rsa -b 4096 -f ./id_rsa
    touch authorized_keys
    cat id_rsa.pub >> authorized_keys
    
  2. Configure the permissions and ownership for all of the files you have created so far

    chmod 600 id_rsa config authorized_keys
    chown root:root id_rsa.pub id_rsa config authorized_keys
    
  3. Create a hostfile for torchrun or ipexrun. (Optional)

    Host host1
        HostName <Hostname of host1>
        IdentitiesOnly yes
        IdentityFile ~/.root/id_rsa
        Port <SSH Port>
    Host host2
        HostName <Hostname of host2>
        IdentitiesOnly yes
        IdentityFile ~/.root/id_rsa
        Port <SSH Port>
    ...
    
  4. Configure Intel® oneAPI Collective Communications Library in your python script

    import oneccl_bindings_for_pytorch
    import os
    
    dist.init_process_group(
        backend="ccl",
        init_method="tcp://127.0.0.1:3022",
        world_size=int(os.environ.get("WORLD_SIZE")),
        rank=int(os.environ.get("RANK")),
    )
    
  5. Now start the workers and execute DDP on the launcher

    1. Worker run command:

      docker run -it --rm \
          --net=host \
          -v $PWD/authorized_keys:/etc/ssh/authorized_keys \
          -v $PWD/tests:/workspace/tests \
          -w /workspace \
          intel/intel-extension-for-pytorch:2.4.0-pip-multinode \
          bash -c '/usr/sbin/sshd -D'
      
    2. Launcher run command:

      docker run -it --rm \
          --net=host \
          -v $PWD/id_rsa:/root/.ssh/id_rsa \
          -v $PWD/tests:/workspace/tests \
          -v $PWD/hostfile:/workspace/hostfile \
          -w /workspace \
          intel/intel-extension-for-pytorch:2.4.0-pip-multinode \
          bash -c 'ipexrun cpu  --nnodes 2 --nprocs-per-node 1 --master-addr 127.0.0.1 --master-port 3022 /workspace/tests/ipex-resnet50.py --ipex --device cpu --backend ccl'
      

Note

Intel® MPI can be configured based on your machine settings. If the above commands do not work for you, see the documentation for how to configure based on your network.

Enable DeepSpeed* optimizations🔗

To enable DeepSpeed* optimizations with Intel® oneAPI Collective Communications Library, add the following to your python script:

import deepspeed

# Rather than dist.init_process_group(), use deepspeed.init_distributed()
deepspeed.init_distributed(backend="ccl")

Additionally, if you have a DeepSpeed* configuration you can use the below command as your launcher to run your script with that configuration:

    docker run -it --rm \
    --net=host \
    -v $PWD/id_rsa:/root/.ssh/id_rsa \
    -v $PWD/tests:/workspace/tests \
    -v $PWD/hostfile:/workspace/hostfile \
    -v $PWD/ds_config.json:/workspace/ds_config.json \
    -w /workspace \
    intel/intel-extension-for-pytorch:2.4.0-pip-multinode \
    bash -c 'deepspeed --launcher IMPI \
    --master_addr 127.0.0.1 --master_port 3022 \
    --deepspeed_config ds_config.json --hostfile /workspace/hostfile \
    /workspace/tests/ipex-resnet50.py --ipex --device cpu --backend ccl --deepspeed'

The image below is an extension of the IPEX Multi-Node Container designed to run Hugging Face Generative AI scripts. The container has the typical installations needed to run and fine tune PyTorch generative text models from Hugging Face. It can be used to run multinode jobs using the same instructions from the IPEX Multi-Node container.

Tag(s) Pytorch IPEX oneCCL HF Transformers Dockerfile
2.4.0-pip-multinode-hf-4.44.0-genai v2.4.0 v2.4.0+cpu v2.4.0 v4.44.0 v0.4.0-Beta

Below is an example that shows single node job with the existing finetune.py script.

# Change into home directory first and run the command
docker run -it \
    -v $PWD/workflows/charts/huggingface-llm/scripts:/workspace/scripts \
    -w /workspace/scripts \
    intel/intel-extension-for-pytorch:2.4.0-pip-multinode-hf-4.44.0-genai \
    bash -c 'python finetune.py <script-args>'

The images below are TorchServe* with CPU Optimizations:

Tag(s) Pytorch IPEX Dockerfile
2.5.0-serving-cpu v2.5.0 v2.5.0+cpu v0.4.0-Beta
2.4.0-serving-cpu v2.4.0 v2.4.0+cpu v0.4.0-Beta
2.3.0-serving-cpu v2.3.0 v2.3.0+cpu v0.4.0-Beta
2.2.0-serving-cpu v2.2.0 v2.2.0+cpu v0.3.4

For more details, follow the procedure in the TorchServe instructions.

The images below are TorchServe* with XPU Optimizations:

Tag(s) Pytorch IPEX Dockerfile
2.3.110-serving-xpu v2.3.1 v2.3.110+xpu v0.4.0-Beta

CPU only images with Intel® Distribution for Python*🔗

The images below are built only with CPU optimizations (GPU acceleration support was deliberately excluded) and include Intel® Distribution for Python*:

Tag(s) Pytorch IPEX Dockerfile
2.5.0-idp-base v2.5.0 v2.5.0+cpu v0.4.0-Beta
2.4.0-idp-base v2.4.0 v2.4.0+cpu v0.4.0-Beta
2.3.0-idp-base v2.3.0 v2.3.0+cpu v0.4.0-Beta
2.2.0-idp-base v2.2.0 v2.2.0+cpu v0.3.4
2.1.0-idp-base v2.1.0 v2.1.0+cpu v0.2.3
2.0.0-idp-base v2.0.0 v2.0.0+cpu v0.1.0

The images below additionally include Jupyter Notebook server:

Tag(s) Pytorch IPEX Dockerfile
2.5.0-idp-jupyter v2.5.0 v2.5.0+cpu v0.4.0-Beta
2.4.0-idp-jupyter v2.4.0 v2.4.0+cpu v0.4.0-Beta
2.3.0-idp-jupyter v2.3.0 v2.3.0+cpu v0.4.0-Beta
2.2.0-idp-jupyter v2.2.0 v2.2.0+cpu v0.3.4
2.1.0-idp-jupyter v2.1.0 v2.1.0+cpu v0.2.3
2.0.0-idp-jupyter v2.0.0 v2.0.0+cpu v0.1.0

The images below additionally include Intel® oneAPI Collective Communications Library (oneCCL) and Neural Compressor (INC):

Tag(s) Pytorch IPEX oneCCL INC Dockerfile
2.4.0-idp-multinode v2.4.0 v2.4.0+cpu v2.4.0 v3.0 v0.4.0-Beta
2.3.0-idp-multinode v2.3.0 v2.3.0+cpu v2.3.0 v2.6 v0.4.0-Beta
2.2.0-idp-multinode v2.2.0 v2.2.0+cpu v2.2.0 v2.4.1 v0.3.4
2.1.0-idp-mulitnode v2.1.0 v2.1.0+cpu v2.1.0 v2.3.1 v0.2.3
2.0.0-idp-multinode v2.0.0 v2.0.0+cpu v2.0.0 v2.1.1 v0.1.0

XPU images with Intel® Distribution for Python*🔗

The images below are built only with CPU and GPU optimizations and include Intel® Distribution for Python*:

Tag(s) Pytorch IPEX Driver Dockerfile
2.3.110-xpu-idp-base v2.3.1 v2.3.110+xpu 950 v0.4.0-Beta
2.1.40-xpu-idp-base v2.1.0 v2.1.40+xpu 914 v0.4.0-Beta
2.1.30-xpu-idp-base v2.1.0 v2.1.30+xpu 803 v0.4.0-Beta
2.1.10-xpu-idp-base v2.1.0 v2.1.10+xpu 736 v0.2.3

The images below additionally include Jupyter Notebook server:

Tag(s) Pytorch IPEX Driver Jupyter Port Dockerfile
2.3.110-xpu-idp-jupyter v2.3.1 v2.3.110+xpu 950 8888 v0.4.0-Beta
2.1.40-xpu-idp-jupyter v2.1.0 v2.1.40+xpu 914 8888 v0.4.0-Beta
2.1.20-xpu-idp-jupyter v2.1.0 v2.1.20+xpu 803 8888 v0.3.4
2.1.10-xpu-idp-jupyter v2.1.0 v2.1.10+xpu 736 8888 v0.2.3

Build from Source🔗

To build the images from source, clone the AI Containers repository, follow the main README.md file to setup your environment, and run the following command:

cd pytorch
docker compose build ipex-base
docker compose run ipex-base

You can find the list of services below for each container in the group:

Service Name Description
ipex-base Base image with Intel® Extension for PyTorch*
jupyter Adds Jupyter Notebook server
multinode Adds Intel® oneAPI Collective Communications Library and INC
xpu Adds Intel GPU Support
xpu-jupyter Adds Jupyter notebook server to GPU image
serving TorchServe*

MLPerf Optimized Workloads🔗

The following images are available for MLPerf-optimized workloads. Instructions are available at 'Get Started with Intel MLPerf'.

Tag(s) Base OS MLPerf Round Target Platform
mlperf-inference-4.1-resnet50 rockylinux:8.7 Inference v4.1 Intel(R) Xeon(R) Platinum 8592+
mlperf-inference-4.1-retinanet ubuntu:22.04 Inference v4.1 Intel(R) Xeon(R) Platinum 8592+
mlperf-inference-4.1-gptj ubuntu:22.04 Inference v4.1 Intel(R) Xeon(R) Platinum 8592+
mlperf-inference-4.1-bert ubuntu:22.04 Inference v4.1 Intel(R) Xeon(R) Platinum 8592+
mlperf-inference-4.1-dlrmv2 rockylinux:8.7 Inference v4.1 Intel(R) Xeon(R) Platinum 8592+
mlperf-inference-4.1-3dunet ubuntu:22.04 Inference v4.1 Intel(R) Xeon(R) Platinum 8592+

License🔗

View the License for the Intel® Extension for PyTorch*.

The images below also contain other software which may be under other licenses (such as Pytorch, Jupyter, Bash, etc. from the base).

It is the image user's responsibility to ensure that any use of The images below comply with any relevant licenses for all software contained within.

* Other names and brands may be claimed as the property of others.