Basic usage#

For implemented examples, please check the examples folder.

Run a single MatMul on the NPU#

from intel_npu_acceleration_library.backend import MatMul
import numpy as np

inC, outC, batch = ... # Define your own values

# Create both inputs
X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

mm = MatMul(inC, outC, batch, profile=False)

result = mm.run(X1, X2)
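
As a sanity check, you can compare the NPU output with a CPU reference. The sketch below uses placeholder sizes and assumes that, since X2 is laid out as (outC, inC), the NPU MatMul computes the equivalent of X1 @ X2.T:

from intel_npu_acceleration_library.backend import MatMul
import numpy as np

inC, outC, batch = 128, 256, 16  # example sizes (placeholders)

X1 = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
X2 = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

mm = MatMul(inC, outC, batch, profile=False)
npu_result = mm.run(X1, X2)

# CPU reference; tolerances are loose to allow for float16 rounding
reference = np.matmul(X1, X2.T)
assert np.allclose(npu_result, reference, rtol=1e-2, atol=1e-2)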

Compile a model for the NPU#

If you have PyTorch >= 2.0.0 installed, you can use torch.compile to optimize your model for the NPU:

import intel_npu_acceleration_library
import torch

# Compile model for the NPU
# model is a torch.nn.Module instance; it can be quantized JIT
optimized_model = torch.compile(model, backend="npu")

# Use the model as usual
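
For instance, here is a minimal self-contained sketch; the two-layer model and input sizes are placeholders, not part of the library:

import torch
import intel_npu_acceleration_library  # importing is assumed to register the "npu" backend

# Placeholder model: any torch.nn.Module works
model = torch.nn.Sequential(
    torch.nn.Linear(256, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
).eval()

optimized_model = torch.compile(model, backend="npu")

with torch.inference_mode():
    output = optimized_model(torch.rand(4, 256))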

On Windows, torch.compile is not supported yet, so you might want to use the explicit function intel_npu_acceleration_library.compile instead. The same applies if you use a PyTorch version < 2.0.0.

To do this, you just need to call the compile function with your model and a CompilerConfig compiler configuration to compile and optimize the model for the NPU:

import torch
import intel_npu_acceleration_library
from intel_npu_acceleration_library.compiler import CompilerConfig
compiler_conf = CompilerConfig(dtype=torch.int8)
optimized_model = intel_npu_acceleration_library.compile(model, compiler_conf)

# Use the model as usual
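
Putting it together, a minimal sketch of the explicit compile path; the model is again a placeholder:

import torch
import intel_npu_acceleration_library
from intel_npu_acceleration_library.compiler import CompilerConfig

# Placeholder model: any torch.nn.Module works
model = torch.nn.Sequential(
    torch.nn.Linear(256, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
).eval()

# Quantize to int8 and optimize for the NPU
compiler_conf = CompilerConfig(dtype=torch.int8)
optimized_model = intel_npu_acceleration_library.compile(model, compiler_conf)

with torch.inference_mode():
    output = optimized_model(torch.rand(4, 256))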

To compile and optimize a single layer of a model, pushing it to the NPU as one block, you can set use_to=True in the CompilerConfig compiler configuration:

import torch
import intel_npu_acceleration_library
from intel_npu_acceleration_library.compiler import CompilerConfig
compiler_conf = CompilerConfig(use_to=True, dtype=torch.int8)
optimized_block = intel_npu_acceleration_library.compile(single_block, compiler_conf)
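
For example, a minimal sketch where the single block is a plain torch.nn.Linear layer, standing in for one layer of a larger model:

import torch
import intel_npu_acceleration_library
from intel_npu_acceleration_library.compiler import CompilerConfig

# Placeholder block: in practice this is one layer of a larger model
single_block = torch.nn.Linear(256, 256).eval()

compiler_conf = CompilerConfig(use_to=True, dtype=torch.int8)
optimized_block = intel_npu_acceleration_library.compile(single_block, compiler_conf)

with torch.inference_mode():
    output = optimized_block(torch.rand(4, 256))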

Training (Experimental!)#

It is possible to use the Intel® NPU Acceleration Library to train a model. As before, you just need to call the compile function, this time with training=True. This allows you to reuse the same training script you use on other devices with very minimal modifications:

import torch
import intel_npu_acceleration_library
from intel_npu_acceleration_library.compiler import CompilerConfig
compiler_conf = CompilerConfig(dtype=torch.float32, training=True)
compiled_model = intel_npu_acceleration_library.compile(model, compiler_conf)
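
After compilation, the model can be trained with a standard PyTorch loop; the sketch below uses a placeholder model, random data, and arbitrary hyperparameters:

import torch
import intel_npu_acceleration_library
from intel_npu_acceleration_library.compiler import CompilerConfig

# Placeholder model and configuration
model = torch.nn.Linear(256, 10)
compiler_conf = CompilerConfig(dtype=torch.float32, training=True)
compiled_model = intel_npu_acceleration_library.compile(model, compiler_conf)

optimizer = torch.optim.SGD(compiled_model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

for _ in range(10):  # dummy loop over random batches
    x = torch.rand(32, 256)
    target = torch.randint(0, 10, (32,))
    optimizer.zero_grad()
    loss = loss_fn(compiled_model(x), target)
    loss.backward()
    optimizer.step()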