intel_npu_acceleration_library.backend package#

Submodules#

intel_npu_acceleration_library.backend.base module#

class intel_npu_acceleration_library.backend.base.BaseNPUBackend(profile: bool | None = False)#

Bases: object

A base class that represents an abstract matrix-matrix operation on the NPU.

save(path: str)#

Save the OpenVINO model.

Parameters:

path (str) – the model save path

saveCompiledModel(path: str)#

Save the compiled model.

Parameters:

path (str) – the compiled model save path
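
Example (a minimal sketch; MatMul, documented later in this package, is used as a concrete backend, and the shapes and file names are illustrative):

    from intel_npu_acceleration_library.backend import MatMul

    # Concrete backends build and compile their network on construction,
    # so they can be saved right away (an assumption for this sketch).
    mm = MatMul(inC=256, outC=512, batch=16)
    mm.save("matmul_model.xml")                # OpenVINO model
    mm.saveCompiledModel("matmul_model.blob")  # compiled model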

class intel_npu_acceleration_library.backend.base.BaseNPUBackendWithPrefetch(profile: bool)#

Bases: BaseNPUBackend

A base class that represents an abstract matrix-matrix operation on the NPU.

Linear-type classes employ an algorithm to optimize weight prefetching.

add_to_map(wt_hash: str, weights: Iterable[ndarray | Tuple[ndarray, ...]])#

Add an operation's parameters to the operation hash:parameter map.

Parameters:
  • wt_hash (str) – operation hash

  • weights (Iterable[Union[np.ndarray, Tuple[np.ndarray, ...]]]) – Operation parameters

create_parameters(weights: Iterable[ndarray | Tuple[ndarray, ...]]) _Pointer#

Create an operation parameter from a list of weights.

Parameters:

weights (Iterable[Union[np.ndarray, Tuple[np.ndarray, ...]]]) – Operation parameters

Raises:
  • RuntimeError – Quantized weights need to be in int8 format

  • ValueError – Invalid dtype for scale

Returns:

a pointer to the Parameters object

Return type:

ctypes._Pointer

load_wt_fn(module, parameters)#

Asynchronously load the parameters into the NPU.

Parameters:
  • module – the NPU backend module

  • parameters – the weights parameter class

prefetchWeights()#

Prefetch next operation weights.

setWeights(wt_hash: str | None, *args: ndarray | Tuple[ndarray, ...]) bool#

Set the operation weights in the NPU.

Parameters:
  • wt_hash (str) – operation hash. If set to None, force loading of the weights

  • args (Union[np.ndarray, Tuple[np.ndarray, ...]]) – Variable-length weights list. Each entry can be a NumPy array or a (weight, scale) tuple in the case of quantized tensors

Returns:

True if the operation parameters are already in the operation map

Return type:

bool

intel_npu_acceleration_library.backend.base.adapt_weight(w: ndarray) ndarray#

Adapt the weights to run on the NPU.

Parameters:

w (np.ndarray) – weights array

Returns:

The adapted array

Return type:

np.ndarray

intel_npu_acceleration_library.backend.factory module#

class intel_npu_acceleration_library.backend.factory.NNFactory(profile: bool = False, device: str = 'NPU')#

Bases: BaseNPUBackendWithPrefetch

Factory class to build, compile, and run neural network graphs on the NPU, with weight prefetching.
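
Example (a hedged sketch of building and running a tiny graph; the wrapped operators below are documented only as (*args, **kwargs), so the argument forms used here (shape tuple for parameter, ndarray for constant, node operands for matmul) are assumptions):

    import numpy as np

    from intel_npu_acceleration_library.backend import NNFactory

    factory = NNFactory(profile=False, device="NPU")
    x = factory.parameter((16, 128))                            # graph input
    w = factory.constant(np.ones((128, 64), dtype=np.float16))  # constant weights
    y = factory.matmul(x, w)                                    # matmul node
    factory.compile()                                           # finalize the model

    out = factory.run(np.ones((16, 128), dtype=np.float16))     # execute on the NPU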

avg_pooling(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

compile()#

Finalize and compile a model.

concat(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

constant(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

convolution(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

get_backend_dtype(dtype) c_char_p#

Get the string representation of the dtype.

Parameters:

dtype – numpy dtype

Raises:

RuntimeError – Unsupported datatype

Returns:

string representation of the dtype

Return type:

ctypes.c_char_p

get_tensor_dtype(node)#

Get tensor dtype.

Parameters:

node – network node

Raises:

RuntimeError – Unsupported dtype

Returns:

tensor dtype

Return type:

str

get_tensor_recursively(args: Sequence[Any]) List[ndarray]#

Get tensor recursively for a list of arguments.

Parameters:

args (Sequence[Any]) – Sequence of tensors, tuple of tensors and additional arguments

Returns:

Sequence of tensors

Return type:

List[np.ndarray]

get_tensor_shape(node)#

Get tensor shape.

Parameters:

node – network node

Returns:

tensor shape

Return type:

tuple[int]

linear(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

matmul(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

max_pooling(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

normL2(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

parameter(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

power(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

reduce_max(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

reduce_mean(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

reduce_min(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

reduce_prod(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

reduce_sum(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

reshape(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

return_tensor() F#

Wrap the output of a function in a Tensor object.

Parameters:

fn (function) – Function

Returns:

A function that wraps the output in a Tensor object

Return type:

function

run(X: ndarray, *weights: ndarray | Tuple[ndarray, ndarray], **kwargs: Any) ndarray#

Run the layer: X * W^T.

Parameters:
  • X (np.ndarray) – lhs operator

  • weights (Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]) – rhs operators

  • kwargs (Any) – additional arguments

Returns:

result

Return type:

np.ndarray

set_input_tensor(tensor: ndarray, idx: int)#

Set input tensor.

Parameters:
  • tensor (np.ndarray) – Input tensor

  • idx (int) – tensor index

slice(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

to(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

transpose(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

unsqueeze(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

intel_npu_acceleration_library.backend.linear module#

class intel_npu_acceleration_library.backend.linear.Linear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

Linear class, computing a matrix-matrix multiplication with weight prefetching.

run(X: ndarray, W: ndarray, op_id: str) ndarray#

Run the layer: X * W^T.

Parameters:
  • X (np.ndarray) – lhs operator

  • W (np.ndarray) – rhs operator

  • op_id (str) – operation id

Raises:

RuntimeError – Input or weight tensor shape mismatch

Returns:

result

Return type:

np.ndarray

intel_npu_acceleration_library.backend.matmul module#

class intel_npu_acceleration_library.backend.matmul.MatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

MatMul class, computing a matrix-matrix multiplication.

run(X: ndarray, W: ndarray) ndarray#

Run the layer: X * W^T.

Parameters:
  • X (np.ndarray) – lhs operator

  • W (np.ndarray) – rhs operator

Raises:

RuntimeError – Input or weight tensor shape mismatch

Returns:

result

Return type:

np.ndarray
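
Example (a minimal sketch, assuming X has shape (batch, inC) and W has shape (outC, inC), so that the kernel computes X @ W.T):

    import numpy as np

    from intel_npu_acceleration_library.backend import MatMul

    inC, outC, batch = 128, 64, 16
    X = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
    W = np.random.uniform(-1, 1, (outC, inC)).astype(np.float16)

    mm = MatMul(inC, outC, batch)
    result = mm.run(X, W)   # approximately X @ W.T, shape (batch, outC)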

intel_npu_acceleration_library.backend.mlp module#

class intel_npu_acceleration_library.backend.mlp.MLP(input_shape: Sequence[int], intermediate_size: int, activation: str = 'swiglu', bias: bool | None = False, profile: bool = False, device: str = 'NPU', **additional_args)#

Bases: NNFactory

MLP class, computing a multi-layer perceptron with weight prefetching.
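
Example (a hedged sketch of constructing an MLP backend with the default swiglu activation; interpreting input_shape as (batch, hidden_size) is an assumption):

    from intel_npu_acceleration_library.backend import MLP

    mlp = MLP(input_shape=(16, 256), intermediate_size=512, bias=False)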

intel_npu_acceleration_library.backend.qlinear module#

class intel_npu_acceleration_library.backend.qlinear.QLinear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU', dtype: ~numpy.dtype = <class 'numpy.int8'>)#

Bases: NNFactory

Quantized Linear class, computing a matrix-matrix multiplication with weight prefetching.

run(X: ndarray, W: ndarray, scale: ndarray, op_id: str) ndarray#

Run the layer: X * (W * S)^T.

Parameters:
  • X (np.ndarray) – activation

  • W (np.ndarray) – quantized weights

  • scale (np.ndarray) – quantization scale

  • op_id (str) – operation id

Raises:

RuntimeError – Input, weights or scale shape mismatch

Returns:

result

Return type:

np.ndarray
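
Example (a hedged sketch of a quantized linear layer: W is int8, and scale is assumed to be a per-output-channel dequantization factor of shape (outC, 1)):

    import numpy as np

    from intel_npu_acceleration_library.backend import QLinear

    inC, outC, batch = 128, 64, 16
    X = np.random.uniform(-1, 1, (batch, inC)).astype(np.float16)
    W = np.random.randint(-128, 127, (outC, inC), dtype=np.int8)
    scale = np.random.uniform(0.001, 0.1, (outC, 1)).astype(np.float16)

    ql = QLinear(inC, outC, batch)
    result = ql.run(X, W, scale, op_id="layer0")   # approximately X @ (W * scale).T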

intel_npu_acceleration_library.backend.qmatmul module#

class intel_npu_acceleration_library.backend.qmatmul.QMatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU', dtype: ~numpy.dtype = <class 'numpy.int8'>)#

Bases: NNFactory

Quantized MatMul class, computing a matrix-matrix multiplication.

run(X: ndarray, W: ndarray, scale: ndarray) ndarray#

Run the layer: X * (W * S)^T.

Parameters:
  • X (np.ndarray) – activation

  • W (np.ndarray) – quantized weights

  • scale (np.ndarray) – quantization scale

Raises:

RuntimeError – Input, weights or scale shape mismatch

Returns:

result

Return type:

np.ndarray

intel_npu_acceleration_library.backend.runtime module#

intel_npu_acceleration_library.backend.runtime.adapt_output_tensor(output: ndarray, original_shape: Size, input_dtype: dtype) Tensor#

Adapt the output tensor to the original shape and dtype.

Parameters:
  • output (np.ndarray) – output tensor

  • original_shape (torch.Size) – original shape

  • input_dtype (torch.dtype) – input dtype

Returns:

output tensor

Return type:

torch.Tensor

intel_npu_acceleration_library.backend.runtime.clear_cache()#

Clear the cache of models.

intel_npu_acceleration_library.backend.runtime.run_factory(x: Tensor | List[Tensor], weights: List[Tensor], backend_cls: Any, op_id: str | None = None) Tensor#

Run a factory operation. Depending on the datatype of the weights it runs a float or quantized operation.

Parameters:
  • x (Union[torch.Tensor, List[torch.Tensor]]) – Activation tensor(s). Its dtype must be torch.float16

  • weights (List[torch.Tensor]) – Weights tensor(s). Their dtype can be torch.float16 or torch.int8

  • backend_cls (Any) – Backend class to run

  • op_id (Optional[str], optional) – Operation ID. Defaults to None.

Returns:

result

Return type:

torch.Tensor

intel_npu_acceleration_library.backend.runtime.run_matmul(x: Tensor, weights: Tensor, scale: Tensor | None = None, op_id: str | None = None) Tensor#

Run a matmul operation. Depending on the datatype of the weights it runs a float or quantized operation.

Parameters:
  • x (torch.Tensor) – Activation tensor. Its dtype must be torch.float16

  • weights (torch.Tensor) – Weights tensor. Its dtype can be torch.float16 or torch.int8

  • scale (Optional[torch.Tensor], optional) – Quantization scale. If weights.dtype == torch.int8 then it must be set. Defaults to None.

  • op_id (Optional[str], optional) – Operation ID. Defaults to None.

Raises:

RuntimeError – Unsupported weights datatype. Supported types: [torch.float16, torch.int8]

Returns:

result

Return type:

torch.Tensor
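
Example (a hedged sketch of both the float16 and the int8 paths; the weight shapes and the (outC, 1) scale layout are assumptions):

    import torch

    from intel_npu_acceleration_library.backend import run_matmul

    x = torch.rand(16, 128, dtype=torch.float16)

    # float16 weights: no scale needed
    w_fp16 = torch.rand(64, 128, dtype=torch.float16)
    y = run_matmul(x, w_fp16)

    # int8 weights: a quantization scale is required
    w_int8 = torch.randint(-128, 127, (64, 128), dtype=torch.int8)
    scale = torch.rand(64, 1, dtype=torch.float16)
    y_q = run_matmul(x, w_int8, scale=scale, op_id="proj0")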

intel_npu_acceleration_library.backend.runtime.set_contiguous(tensor: Tensor) Tensor#

Set tensor to be contiguous in memory.

Parameters:

tensor (torch.Tensor) – input tensor

Returns:

output, contiguous tensor

Return type:

torch.Tensor

Module contents#

class intel_npu_acceleration_library.backend.Convolution(input_shape: Sequence[int], weights_shape: Sequence[int], bias: bool = False, strides: int | Sequence[int] = 1, padding: int | Sequence[int] = 0, dilation: int | Sequence[int] = 1, groups: int = 1, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

Convolution class, computing a convolution operation with weight prefetching.
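
Example (a hedged sketch of a 3x3 convolution backend; the NCHW input layout and the (out_channels, in_channels, kH, kW) weights layout are assumptions):

    from intel_npu_acceleration_library.backend import Convolution

    conv = Convolution(
        input_shape=(1, 64, 32, 32),    # assumed NCHW
        weights_shape=(128, 64, 3, 3),  # assumed OIHW
        bias=False,
        strides=1,
        padding=1,
    )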

class intel_npu_acceleration_library.backend.Linear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

Linear class, computing a matrix-matrix multiplication with weight prefetching.

run(X: ndarray, W: ndarray, op_id: str) ndarray#

Run the layer: X * W^T.

Parameters:
  • X (np.ndarray) – lhs operator

  • W (np.ndarray) – rhs operator

  • op_id (str) – operation id

Raises:

RuntimeError – Input or weight tensor shape mismatch

Returns:

result

Return type:

np.ndarray

class intel_npu_acceleration_library.backend.MLP(input_shape: Sequence[int], intermediate_size: int, activation: str = 'swiglu', bias: bool | None = False, profile: bool = False, device: str = 'NPU', **additional_args)#

Bases: NNFactory

MLP class, computing a multi-layer perceptron with weight prefetching.

class intel_npu_acceleration_library.backend.MatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

MatMul class, computing a matrix-matrix multiplication.

run(X: ndarray, W: ndarray) ndarray#

Run the layer: X * W^T.

Parameters:
  • X (np.ndarray) – lhs operator

  • W (np.ndarray) – rhs operator

Raises:

RuntimeError – Input or weight tensor shape mismatch

Returns:

result

Return type:

np.ndarray

class intel_npu_acceleration_library.backend.NNFactory(profile: bool = False, device: str = 'NPU')#

Bases: BaseNPUBackendWithPrefetch

Factory class to build, compile, and run neural network graphs on the NPU, with weight prefetching.

avg_pooling(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

compile()#

Finalize and compile a model.

concat(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

constant(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

convolution(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

get_backend_dtype(dtype) c_char_p#

Get the string representation of the dtype.

Parameters:

dtype – numpy dtype

Raises:

RuntimeError – Unsupported datatype

Returns:

string representation of the dtype

Return type:

ctypes.c_char_p

get_tensor_dtype(node)#

Get tensor dtype.

Parameters:

node – network node

Raises:

RuntimeError – Unsupported dtype

Returns:

tensor dtype

Return type:

str

get_tensor_recursively(args: Sequence[Any]) List[ndarray]#

Get tensor recursively for a list of arguments.

Parameters:

args (Sequence[Any]) – Sequence of tensors, tuple of tensors and additional arguments

Returns:

Sequence of tensors

Return type:

List[np.ndarray]

get_tensor_shape(node)#

Get tensor shape.

Parameters:

node – network node

Returns:

tensor shape

Return type:

tuple[int]

linear(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

matmul(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

max_pooling(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

normL2(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

parameter(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

power(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

reduce_max(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

reduce_mean(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

reduce_min(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

reduce_prod(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

reduce_sum(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

reshape(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

return_tensor() F#

Wrap the output of a function in a Tensor object.

Parameters:

fn (function) – Function

Returns:

A function that wraps the output in a Tensor object

Return type:

function

run(X: ndarray, *weights: ndarray | Tuple[ndarray, ndarray], **kwargs: Any) ndarray#

Run the layer: X * W^T.

Parameters:
  • X (np.ndarray) – lhs operator

  • weights (Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]) – rhs operators

  • kwargs (Any) – additional arguments

Returns:

result

Return type:

np.ndarray

set_input_tensor(tensor: ndarray, idx: int)#

Set input tensor.

Parameters:
  • tensor (np.ndarray) – Input tensor

  • idx (int) – tensor index

slice(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

to(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

transpose(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

unsqueeze(*args: Any, **kwargs: Any) Tensor#

Wrap the output of a function in a Tensor object.

Parameters:
  • args (Any) – Variable length argument list

  • kwargs (Any) – Arbitrary keyword arguments

Returns:

Tensor object

Return type:

Tensor

class intel_npu_acceleration_library.backend.QLinear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU', dtype: ~numpy.dtype = <class 'numpy.int8'>)#

Bases: NNFactory

Quantized Linear class, computing a matrix-matrix multiplication with weight prefetching.

run(X: ndarray, W: ndarray, scale: ndarray, op_id: str) ndarray#

Run the layer: X * (W * S)^T.

Parameters:
  • X (np.ndarray) – activation

  • W (np.ndarray) – quantized weights

  • scale (np.ndarray) – quantization scale

  • op_id (str) – operation id

Raises:

RuntimeError – Input, weights or scale shape mismatch

Returns:

result

Return type:

np.ndarray

class intel_npu_acceleration_library.backend.QMatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU', dtype: ~numpy.dtype = <class 'numpy.int8'>)#

Bases: NNFactory

Quantized MatMul class, computing a matrix-matrix multiplication.

run(X: ndarray, W: ndarray, scale: ndarray) ndarray#

Run the layer: X * (W * S)^T.

Parameters:
  • X (np.ndarray) – activation

  • W (np.ndarray) – quantized weights

  • scale (np.ndarray) – quantization scale

Raises:

RuntimeError – Input, weights or scale shape mismatch

Returns:

result

Return type:

np.ndarray

class intel_npu_acceleration_library.backend.SDPA(query_shapes: Tuple[int, int], key_shapes: Tuple[int, int], value_shapes: Tuple[int, int], mask_shapes: Tuple[int, int], is_causal: bool = False, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

Implementation of a ScaledDotProductAttention NPU operation.

run(query: ndarray, key: ndarray, value: ndarray, mask: ndarray) ndarray#

Run the scaled dot product attention kernel.

Parameters:
  • query (np.ndarray) – sdpa query tensor

  • key (np.ndarray) – sdpa key tensor

  • value (np.ndarray) – sdpa value tensor

  • mask (np.ndarray) – sdpa mask tensor

Returns:

result

Return type:

np.ndarray
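
Example (a hedged sketch; the (sequence, head_dim) interpretation of the shape tuples and the additive zero mask are assumptions):

    import numpy as np

    from intel_npu_acceleration_library.backend import SDPA

    q_shape = k_shape = v_shape = (32, 64)   # assumed (sequence, head_dim)
    mask_shape = (32, 32)

    sdpa = SDPA(q_shape, k_shape, v_shape, mask_shape, is_causal=False)
    out = sdpa.run(
        np.random.rand(*q_shape).astype(np.float16),
        np.random.rand(*k_shape).astype(np.float16),
        np.random.rand(*v_shape).astype(np.float16),
        np.zeros(mask_shape, dtype=np.float16),
    )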

class intel_npu_acceleration_library.backend.SimpleSDPA(query_shapes: Tuple[int, int], key_shapes: Tuple[int, int], value_shapes: Tuple[int, int], is_causal: bool = False, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

Implementation of a ScaledDotProductAttention NPU operation.

run(query: ndarray, key: ndarray, value: ndarray) ndarray#

Run the scaled dot product attention kernel.

Parameters:
  • query (np.ndarray) – sdpa query tensor

  • key (np.ndarray) – sdpa key tensor

  • value (np.ndarray) – sdpa value tensor

Returns:

result

Return type:

np.ndarray

class intel_npu_acceleration_library.backend.Tensor(factory: NNFactory, node: _Pointer)#

Bases: object

Represents a tensor object.

Attrs:
  • factory (NNFactory) – The factory object used to create the tensor.

  • node (ctypes._Pointer) – The pointer to the underlying tensor node.

  • shape (Sequence[int]) – The shape of the tensor.

  • dtype (NPUDtype) – The data type of the tensor.

  • T (Tensor) – The transpose of the tensor.
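
Example (a hedged sketch; Tensor objects are produced by NNFactory operations, and the parameter() call and shapes below are assumptions):

    from intel_npu_acceleration_library.backend import NNFactory

    factory = NNFactory()
    t = factory.parameter((8, 16))     # returns a Tensor node

    u = (t * t).relu()                 # element-wise multiply, then ReLU
    v = u.transpose(0, 1)              # swap the two dimensions
    s = v.sum(dim=0)                   # reduced sum along dim 0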

__add__(self, other)#

Adds two tensors element-wise.

__sub__(self, other)#

Subtracts two tensors element-wise.

__mul__(self, other)#

Multiplies two tensors element-wise.

__truediv__(self, other)#

Divides two tensors element-wise.

__neg__(self)#

Negates the tensor.

__repr__(self)#

Returns a string representation of the tensor.

__str__(self)#

Returns a string representation of the tensor.

__len__(self)#

Returns the total number of elements in the tensor.

T()#

Returns the transpose of the tensor.

squeeze(self)#

Removes dimensions of size 1 from the tensor.

unsqueeze(self, axis)#

Adds a dimension of size 1 to the tensor.

__matmul__(self, other)#

Performs matrix multiplication between two tensors.

acos(self)#

Applies acos function to the tensor.

asin(self)#

Applies asin function to the tensor.

atan(self)#

Applies atan function to the tensor.

acosh(self)#

Applies acosh function to the tensor.

asinh(self)#

Applies asinh function to the tensor.

atanh(self)#

Applies atanh function to the tensor.

cosh(self)#

Applies cosh function to the tensor.

sinh(self)#

Applies sinh function to the tensor.

tanh(self)#

Applies tanh function to the tensor.

cos(self)#

Applies cos function to the tensor.

sin(self)#

Applies sin function to the tensor.

tan(self)#

Applies tan function to the tensor.

ceiling(self)#

Applies ceil function to the tensor.

clamp(self, min, max)#

Applies clamp function to the tensor.

elu(self, alpha)#

Applies elu function to the tensor.

erf(self)#

Applies erf function to the tensor.

exp(self)#

Applies exponential function to the tensor.

floor(self)#

Applies floor function to the tensor.

grn(self, bias)#

Applies grn function to the tensor.

hsigmoid(self)#

Applies hsigmoid function to the tensor.

hswish(self)#

Applies hswish function to the tensor.

log(self)#

Applies log function to the tensor.

mish(self)#

Applies mish function to the tensor.

relu(self, bias)#

Applies relu function to the tensor.

round(self)#

Applies round function to the tensor.

sigmoid(self)#

Applies sigmoid function to the tensor.

sign(self)#

Applies sign function to the tensor.

softmax(self, dim)#

Applies softmax function to the tensor.

softplus(self)#

Applies softplus function to the tensor.

sqrt(self)#

Applies sqrt function to the tensor.

max(self, dim, keep_dims)#

Returns the reduced max tensor.

mean(self, dim, keep_dims, dtype)#

Returns the reduced mean tensor.

min(self, dim, keep_dims)#

Returns the reduced min tensor.

prod(self, dim, keep_dims, dtype)#

Returns the reduced product tensor.

sum(self, dim, keep_dims, dtype)#

Returns the reduced sum tensor.

property T: Tensor#

Return the transpose of the tensor.

Returns:

The transposed tensor.

Return type:

Tensor

acos() Tensor#

Apply the acos function to the tensor.

Returns:

The result of applying the acos function.

Return type:

Tensor

acosh() Tensor#

Apply the acosh function to the tensor.

Returns:

The result of applying the acosh function.

Return type:

Tensor

asin() Tensor#

Apply the asin function to the tensor.

Returns:

The result of applying the asin function.

Return type:

Tensor

asinh() Tensor#

Apply the asinh function to the tensor.

Returns:

The result of applying the asinh function.

Return type:

Tensor

atan() Tensor#

Apply the atan function to the tensor.

Returns:

The result of applying the atan function.

Return type:

Tensor

atanh() Tensor#

Apply the atanh function to the tensor.

Returns:

The result of applying the atanh function.

Return type:

Tensor

ceiling() Tensor#

Apply the ceiling function to the tensor.

Returns:

The result of applying the ceiling function.

Return type:

Tensor

chunk(chunks: int, dim: int = 0) Tensor | list#

Return the list of tensor chunks.

Parameters:
  • chunks (int) – The number of chunks to return.

  • dim (int) – The dimension along which to split the tensor. Default is 0.

Returns:

The resulting list of split tensors or a single tensor.

Return type:

Union[“Tensor”, list]

Raises:

ValueError – The input chunks value is not valid.

clamp(min=None, max=None) Tensor#

Apply the clamp function to the tensor.

Parameters:
  • min (int, float) – The lower-bound of the range to be clamped

  • max (int, float) – The upper-bound of the range to be clamped

Returns:

The result of applying the clamp function.

Return type:

Tensor

cos() Tensor#

Apply the cos function to the tensor.

Returns:

The result of applying the cos function.

Return type:

Tensor

cosh() Tensor#

Apply the cosh function to the tensor.

Returns:

The result of applying the cosh function.

Return type:

Tensor

dim() int#

Return the number of dimensions of the tensor.

Returns:

The number of dimensions of the tensor.

Return type:

int

property dtype: NPUDtype#

Returns the data type of the tensor.

Returns:

The data type of the tensor.

Return type:

type

elu(alpha: float = 1.0) Tensor#

Apply the elu function to the tensor.

Parameters:

alpha (float) – The alpha value. Defaults to 1.0.

Returns:

The result of applying the elu function.

Return type:

Tensor

erf() Tensor#

Apply the erf function to the tensor.

Returns:

The result of applying the erf function.

Return type:

Tensor

exp() Tensor#

Apply the exp function to the tensor.

Returns:

The result of applying the exp function.

Return type:

Tensor

factory: NNFactory#

flatten(start_dim=0, end_dim=-1) Tensor#

Flatten the tensor.

Parameters:
  • start_dim (int) – The first dim to flatten. Defaults to 0.

  • end_dim (int) – The last dim to flatten. Defaults to -1.

Returns:

The flattened tensor.

Return type:

Tensor

floor() Tensor#

Apply the floor function to the tensor.

Returns:

The result of applying the floor function.

Return type:

Tensor

grn(bias: float = 1e-12) Tensor#

Apply the grn function to the tensor.

Parameters:

bias (float) – The bias value. Defaults to 1e-12.

Returns:

The result of applying the grn function.

Return type:

Tensor

hsigmoid() Tensor#

Apply the hsigmoid function to the tensor.

Returns:

The result of applying the hsigmoid function.

Return type:

Tensor

hswish() Tensor#

Apply the hswish function to the tensor.

Returns:

The result of applying the hswish function.

Return type:

Tensor

log() Tensor#

Apply the log function to the tensor.

Returns:

The result of applying the log function.

Return type:

Tensor

max(dim: int | None = None, keep_dims: bool | None = False) Tensor#

Return the reduced max tensor.

Parameters:
  • dim (Optional[int], optional) – The dim to reduce. Default is None, and all dimensions are reduced.

  • keep_dims (Optional[bool], optional) – If True, the reduced dimensions are retained with size 1. Defaults to False.

Returns:

The result of max reducing operation.

Return type:

Tensor

mean(dim: int | Sequence[int] | None = None, keep_dims: bool | None = False, dtype: dtype | None = None) Tensor#

Return the reduced mean tensor.

Parameters:
  • dim (Optional[Union[int, Sequence[int]]], optional) – The dim(s) to reduce. Default is None, and all dimensions are reduced.

  • keep_dims (Optional[bool], optional) – If True, the reduced dimensions are retained with size 1. Defaults to False.

  • dtype (Optional[torch.dtype], optional) – The data type. Defaults to None.

Returns:

The result of mean reducing operation.

Return type:

Tensor

min(dim: int | None = None, keep_dims: bool | None = False) Tensor#

Return the reduced min tensor.

Parameters:
  • dim (Optional[int], optional) – The dim to reduce. Default is None, and all dimensions are reduced.

  • keep_dims (Optional[bool], optional) – If True, the reduced dimensions are retained with size 1. Defaults to False.

Returns:

The result of min reducing operation.

Return type:

Tensor

mish() Tensor#

Apply the mish function to the tensor.

Returns:

The result of applying the mish function.

Return type:

Tensor

node: _Pointer#

permute(*input_order: int) Tensor#

Return the tensor with its dimensions permuted in the given order.

Parameters:

input_order (Sequence[int]) – The order of the dimensions in the transposed tensor.

Returns:

The permuted tensor.

Return type:

Tensor

prod(dim: int | None = None, keep_dims: bool | None = False, dtype: dtype | None = None) Tensor#

Return the reduced product tensor.

Parameters:
  • dim (Optional[int], optional) – The dim to reduce. Default is None, and all dimensions are reduced.

  • keep_dims (Optional[bool], optional) – If True, the reduced dimensions are retained with size 1. Defaults to False.

  • dtype (Optional[torch.dtype], optional) – The data type. Defaults to None.

Returns:

The result of product reducing operation.

Return type:

Tensor

relu() Tensor#

Apply the relu function to the tensor.

Returns:

The result of applying the relu function.

Return type:

Tensor

reshape(*shape: int | Sequence[int]) Tensor#

Return a reshaped tensor.

Parameters:

shape (Union[int, Sequence[int]]) – The new shape of the tensor.

Returns:

The reshaped tensor.

Return type:

Tensor

round() Tensor#

Apply the round function to the tensor.

Returns:

The result of applying the round function.

Return type:

Tensor

property shape: Sequence[int]#

Returns the shape of the tensor.

Returns:

The shape of the tensor.

Return type:

Sequence[int]

sigmoid() Tensor#

Apply the sigmoid function to the tensor.

Returns:

The result of applying the sigmoid function.

Return type:

Tensor

sign() Tensor#

Apply the sign function to the tensor.

Returns:

The result of applying the sign function.

Return type:

Tensor

sin() Tensor#

Apply the sin function to the tensor.

Returns:

The result of applying the sin function.

Return type:

Tensor

sinh() Tensor#

Apply the sinh function to the tensor.

Returns:

The result of applying the sinh function.

Return type:

Tensor

size(dim=None) int | Sequence[int]#

Return the size of the tensor.

Parameters:

dim (int, optional) – The dimension to return the size of. Defaults to None.

Returns:

The size of the tensor.

Return type:

Union[int, Sequence[int]]

softmax(dim) Tensor#

Apply the softmax function to the tensor.

Parameters:

dim (int) – The dimension to apply softmax.

Returns:

The result of applying the softmax function.

Return type:

Tensor

softplus() Tensor#

Apply the softplus function to the tensor.

Returns:

The result of applying the softplus function.

Return type:

Tensor

sqrt() Tensor#

Apply the sqrt function to the tensor.

Returns:

The result of applying the sqrt function.

Return type:

Tensor

squeeze() Tensor#

Remove dimensions of size 1 from the tensor.

Returns:

The squeezed tensor.

Return type:

Tensor

sum(dim: int | Sequence[int] | None = None, keep_dims: bool | None = False, dtype: dtype | None = None) Tensor#

Return the reduced sum tensor.

Parameters:
  • dim (Optional[Union[int, Sequence[int]]], optional) – The dim(s) to reduce. Default is None, and all dimensions are reduced.

  • keep_dims (Optional[bool], optional) – If True, the reduced dimensions are retained with size 1. Defaults to False.

  • dtype (Optional[torch.dtype], optional) – The data type. Defaults to None.

Returns:

The result of sum reducing operation.

Return type:

Tensor

tan() Tensor#

Apply the tan function to the tensor.

Returns:

The result of applying the tan function.

Return type:

Tensor

tanh() Tensor#

Apply the tanh function to the tensor.

Returns:

The result of applying the tanh function.

Return type:

Tensor

to(dtype: NPUDtype) Tensor#

Convert the tensor to the specified data type.

Parameters:

dtype (NPUDtype) – The data type to convert the tensor to.

Returns:

The converted tensor.

Return type:

Tensor

transpose(dim0: int, dim1: int) Tensor#

Return the transpose of the tensor.

Parameters:
  • dim0 (int) – The first dimension to transpose.

  • dim1 (int) – The second dimension to transpose.

Returns:

The transposed tensor.

Return type:

Tensor

type(dtype: NPUDtype) Tensor#

Convert the tensor to the specified data type.

Parameters:

dtype (NPUDtype) – The data type to convert the tensor to.

Returns:

The converted tensor.

Return type:

Tensor

unsqueeze(axis) Tensor#

Add a dimension of size 1 to the tensor.

Parameters:

axis (int) – The axis along which to add the dimension.

Returns:

The unsqueezed tensor.

Return type:

Tensor

view(*shape: Sequence[int] | int) Tensor#

Return the tensor reshaped to the specified shape.

Parameters:

shape (Union[Sequence[int], int]) – The new shape of the tensor.

Returns:

The reshaped tensor.

Return type:

Tensor

intel_npu_acceleration_library.backend.clear_cache()#

Clear the cache of models.

intel_npu_acceleration_library.backend.get_driver_version() int#

Get the driver version for the Intel® NPU Acceleration Library.

Raises:

RuntimeError – an error is raised if the platform is not supported. Currently supported platforms are Windows and Linux

Returns:

NPU driver version

Return type:

int

intel_npu_acceleration_library.backend.npu_available() bool#

Return whether the NPU is available.

Returns:

True if the NPU is available on the system

Return type:

bool
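
Example (a small sketch of checking for an NPU before selecting a device; falling back to "CPU" as a device string is an assumption):

    from intel_npu_acceleration_library.backend import get_driver_version, npu_available

    if npu_available():
        print("NPU driver version:", get_driver_version())
        device = "NPU"
    else:
        device = "CPU"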

intel_npu_acceleration_library.backend.run_factory(x: Tensor | List[Tensor], weights: List[Tensor], backend_cls: Any, op_id: str | None = None) Tensor#

Run a factory operation. Depending on the datatype of the weights it runs a float or quantized operation.

Parameters:
  • x (Union[torch.Tensor, List[torch.Tensor]]) – Activation tensor(s). Its dtype must be torch.float16

  • weights (List[torch.Tensor]) – Weights tensor(s). Their dtype can be torch.float16 or torch.int8

  • backend_cls (Any) – Backend class to run

  • op_id (Optional[str], optional) – Operation ID. Defaults to None.

Returns:

result

Return type:

torch.Tensor

intel_npu_acceleration_library.backend.run_matmul(x: Tensor, weights: Tensor, scale: Tensor | None = None, op_id: str | None = None) Tensor#

Run a matmul operation. Depending on the datatype of the weights it runs a float or quantized operation.

Parameters:
  • x (torch.Tensor) – Activation tensor. Its dtype must be torch.float16

  • weights (torch.Tensor) – Weights tensor. Its dtype can be torch.float16 or torch.int8

  • scale (Optional[torch.Tensor], optional) – Quantization scale. If weights.dtype == torch.int8 then it must be set. Defaults to None.

  • op_id (Optional[str], optional) – Operation ID. Defaults to None.

Raises:

RuntimeError – Unsupported weights datatype. Supported types: [torch.float16, torch.int8]

Returns:

result

Return type:

torch.Tensor