intel_npu_acceleration_library.backend package#

Submodules#

intel_npu_acceleration_library.backend.base module#

class intel_npu_acceleration_library.backend.base.BaseNPUBackend(profile: bool | None = False)#

Bases: object

A base class that represents an abstract matrix-matrix operation on the NPU.

save(path: str)#

Save the OpenVINO model.

Parameters:

path (str) – the model save path

saveCompiledModel(path: str)#

Save the compiled model.

Parameters:

path (str) – the compiled model save path
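
A minimal sketch of the save workflow, assuming a concrete backend such as MatMul (documented below) that builds and compiles its model at construction; the file paths are illustrative::

    from intel_npu_acceleration_library.backend import MatMul

    # MatMul is a concrete BaseNPUBackend subclass (see below).
    mm = MatMul(inC=512, outC=256, batch=16)

    # Persist the OpenVINO model and, separately, the compiled model.
    mm.save("matmul_model.xml")
    mm.saveCompiledModel("matmul_compiled.blob")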

class intel_npu_acceleration_library.backend.base.BaseNPUBackendWithPrefetch(profile: bool)#

Bases: BaseNPUBackend

A base class that represents an abstract matrix-matrix operation on the NPU.

Linear-type classes employ an algorithm to optimize weight prefetching.

add_to_map(wt_hash: str, weights: Iterable[ndarray | Tuple[ndarray, ...]])#

Add an operation's parameters to the operation hash:parameter map.

Parameters:
  • wt_hash (str) – operation hash

  • weights (Iterable[Union[np.ndarray, Tuple[np.ndarray, ...]]]) – Operation parameters

create_parameters(weights: Iterable[ndarray | Tuple[ndarray, ...]]) _Pointer#

Create an operation parameter from a list of weights.

Parameters:

weights (Iterable[Union[np.ndarray, Tuple[np.ndarray, ...]]]) – Operation parameters

Raises:
  • RuntimeError – Quantized weights need to be in int8 format

  • ValueError – Invalid dtype for scale

Returns:

a pointer to the Parameters object

Return type:

ctypes._Pointer

load_wt_fn(module, parameters)#

Asynchronously load the parameters into the NPU.

Parameters:
  • module – the NPU backend module

  • parameters – the weights parameter class

prefetchWeights()#

Prefetch the next operation's weights.

setWeights(wt_hash: str | None, *args: ndarray | Tuple[ndarray, ...]) bool#

Set the operation weights in the NPU.

Parameters:
  • wt_hash (str) – operation hash. If set to None, forces loading of the weights

  • args (Union[np.ndarray, Tuple[np.ndarray, ...]]) – Variable-length weights list. Each entry can be an np.ndarray, or a (weight, scale) tuple in the case of quantized tensors

Returns:

True if the op parameters are already in the op map

Return type:

bool
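
A sketch of the hash-based weight cache implied above, assuming (per the return description) that the first call with a new hash returns False and subsequent calls with the same hash return True::

    import numpy as np
    from intel_npu_acceleration_library.backend import MatMul

    mm = MatMul(inC=128, outC=64, batch=8)
    W = np.random.rand(64, 128).astype(np.float16)  # assumed (outC, inC) layout

    # New hash: parameters are created and loaded onto the NPU.
    mm.setWeights("layer0", W)   # expected False
    # Same hash again: parameters are already in the op map.
    mm.setWeights("layer0", W)   # expected True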

intel_npu_acceleration_library.backend.base.adapt_weight(w: ndarray) ndarray#

Adapt the weights to run on the NPU.

Parameters:

w (np.ndarray) – weights array

Raises:

RuntimeError – Unsupported shape

Returns:

The adapted array

Return type:

np.ndarray

intel_npu_acceleration_library.backend.factory module#

class intel_npu_acceleration_library.backend.factory.NNFactory(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#

Bases: BaseNPUBackendWithPrefetch

Network factory class, building models that compute a matrix-matrix multiplication with weight prefetching.

compile(output_node: _Pointer)#

Finalize and compile a model.

Parameters:

output_node (ctypes._Pointer) – Model output node

linear(input_node: _Pointer, output_channels: int, input_channels: int, bias: bool | None = False, quantize: bool = False) _Pointer#

Generate a linear layer.

Parameters:
  • input_node (ctypes._Pointer) – layer input node

  • output_channels (int) – number of output channels

  • input_channels (int) – number of input channels

  • bias (bool, optional) – enable/disable bias. Defaults to False.

  • quantize (bool, optional) – quantize linear model. Defaults to False.

Returns:

a pointer to the linear layer output node

Return type:

ctypes._Pointer

parameter(shape: Tuple[int, int], dtype: DTypeLike = numpy.float16) _Pointer#

Generate a model input parameter.

Parameters:
  • shape (Tuple[int, int]) – Parameter shape (only 2D tensors are supported at the moment)

  • dtype (np.dtype, optional) – parameter dtype; np.int8 and np.float16 are supported. Defaults to np.float16.

Raises:

RuntimeError – Unsupported shape

Returns:

a pointer to a parameter object

Return type:

ctypes._Pointer

run(X: ndarray, *weights: ndarray | Tuple[ndarray, ndarray], **kwargs: Any) ndarray#

Run the layer: X * W^T.

Parameters:
  • X (np.ndarray) – lhs operator

  • weights (Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]) – rhs operators

  • kwargs (Any) – additional arguments

Raises:

RuntimeError – Input tensor shape mismatch

Returns:

result

Return type:

np.ndarray
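
Putting the NNFactory pieces together, a minimal sketch that declares a parameter, builds a linear node, compiles, and runs. The (batch, inC) input shape and (outC, inC) weight layout are assumptions derived from the X * W^T convention above::

    import numpy as np
    from intel_npu_acceleration_library.backend import NNFactory

    inC, outC, batch = 128, 64, 8
    factory = NNFactory(inC, outC, batch)

    # Declare the model input and stack a linear layer on top of it.
    input_node = factory.parameter((batch, inC), dtype=np.float16)
    output_node = factory.linear(input_node, outC, inC, bias=False)

    # Finalize the graph and compile it for the NPU.
    factory.compile(output_node)

    # Run X * W^T; the weights are passed at run time so they can be prefetched.
    X = np.random.rand(batch, inC).astype(np.float16)
    W = np.random.rand(outC, inC).astype(np.float16)
    out = factory.run(X, W)   # expected shape: (batch, outC)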

intel_npu_acceleration_library.backend.linear module#

class intel_npu_acceleration_library.backend.linear.Linear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

Linear class, computing a matrix-matrix multiplication with weight prefetching.

run(X: ndarray, W: ndarray, op_id: str) ndarray#

Run the layer: X * W^T.

Parameters:
  • X (np.ndarray) – lhs operator

  • W (np.ndarray) – rhs operator

  • op_id (str) – operation id

Raises:

RuntimeError – Input or weight tensor shape mismatch

Returns:

result

Return type:

np.ndarray
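
A minimal usage sketch; the (outC, inC) weight layout follows from the X * W^T convention, and the op_id string is an arbitrary label used for weight caching::

    import numpy as np
    from intel_npu_acceleration_library.backend import Linear

    fc = Linear(inC=256, outC=128, batch=16)

    X = np.random.rand(16, 256).astype(np.float16)
    W = np.random.rand(128, 256).astype(np.float16)  # (outC, inC)

    out = fc.run(X, W, op_id="fc1")   # X * W^T -> shape (16, 128)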

intel_npu_acceleration_library.backend.matmul module#

class intel_npu_acceleration_library.backend.matmul.MatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

MatMul class, computing a matrix-matrix multiplication.

run(X: ndarray, W: ndarray) ndarray#

Run the layer: X * W^T.

Parameters:
  • X (np.ndarray) – lhs operator

  • W (np.ndarray) – rhs operator

Raises:

RuntimeError – Input or weight tensor shape mismatch

Returns:

result

Return type:

np.ndarray
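
The same pattern as Linear but without an operation id; a sketch under the same layout assumptions::

    import numpy as np
    from intel_npu_acceleration_library.backend import MatMul

    mm = MatMul(inC=256, outC=128, batch=16)
    X = np.random.rand(16, 256).astype(np.float16)
    W = np.random.rand(128, 256).astype(np.float16)  # (outC, inC)
    out = mm.run(X, W)                               # X * W^T -> (16, 128)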

intel_npu_acceleration_library.backend.mlp module#

class intel_npu_acceleration_library.backend.mlp.MLP(hidden_size: int, intermediate_size: int, batch: int, activation: str = 'swiglu', bias: bool | None = False, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

MLP class, computing a multi-layer perceptron with weight prefetching.
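
MLP documents only its constructor here and inherits run from NNFactory; a construction-only sketch (the number and order of projection weights passed to run depend on the chosen activation and are not documented above)::

    from intel_npu_acceleration_library.backend import MLP

    # A swiglu MLP: hidden -> intermediate (gated) -> hidden.
    mlp = MLP(hidden_size=1024, intermediate_size=4096, batch=8,
              activation="swiglu", bias=False)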

intel_npu_acceleration_library.backend.qlinear module#

class intel_npu_acceleration_library.backend.qlinear.QLinear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

Quantized Linear class, computing a matrix-matrix multiplication with weight prefetching.

run(X: ndarray, W: ndarray, scale: ndarray, op_id: str) ndarray#

Run the layer: X * (W * S)^T.

Parameters:
  • X (np.ndarray) – activation

  • W (np.ndarray) – quantized weights

  • scale (np.ndarray) – quantization scale

  • op_id (str) – operation id

Raises:

RuntimeError – Input, weights or scale shape mismatch

Returns:

result

Return type:

np.ndarray
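
A sketch of the quantized path. The per-output-channel symmetric quantization below is an illustrative recipe, not necessarily the library's own; the signature only requires int8 weights plus a scale::

    import numpy as np
    from intel_npu_acceleration_library.backend import QLinear

    qfc = QLinear(inC=256, outC=128, batch=16)

    X = np.random.rand(16, 256).astype(np.float16)
    W_fp = np.random.rand(128, 256).astype(np.float32) - 0.5

    # Illustrative symmetric int8 quantization, one scale per output channel.
    scale = (np.abs(W_fp).max(axis=-1, keepdims=True) / 127.0).astype(np.float16)
    W_int8 = np.round(W_fp / scale).astype(np.int8)

    out = qfc.run(X, W_int8, scale, op_id="qfc1")   # X * (W * S)^T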

intel_npu_acceleration_library.backend.qmatmul module#

class intel_npu_acceleration_library.backend.qmatmul.QMatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

Quantized MatMul class, computing a matrix-matrix multiplication.

run(X: ndarray, W: ndarray, scale: ndarray) ndarray#

Run the layer: X * (W * S)^T.

Parameters:
  • X (np.ndarray) – activation

  • W (np.ndarray) – quantized weights

  • scale (np.ndarray) – quantization scale

Raises:

RuntimeError – Input, weights or scale shape mismatch

Returns:

result

Return type:

np.ndarray
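
QMatMul follows the same quantized convention but takes no op_id; a minimal self-contained sketch with an illustrative scale::

    import numpy as np
    from intel_npu_acceleration_library.backend import QMatMul

    qmm = QMatMul(inC=256, outC=128, batch=16)
    X = np.random.rand(16, 256).astype(np.float16)
    W_int8 = np.random.randint(-128, 127, (128, 256), dtype=np.int8)
    scale = np.full((128, 1), 0.01, dtype=np.float16)  # illustrative scale
    out = qmm.run(X, W_int8, scale)                    # X * (W * S)^T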

intel_npu_acceleration_library.backend.runtime module#

intel_npu_acceleration_library.backend.runtime.clear_cache()#

Clear the cache of models.

intel_npu_acceleration_library.backend.runtime.run_factory(x: Tensor, weights: List[Tensor], backend_cls: Any, op_id: str | None = None) Tensor#

Run a factory operation. Depending on the datatype of the weights, it runs a float or quantized operation.

Parameters:
  • x (torch.Tensor) – Activation tensor. Its dtype must be torch.float16

  • weights (List[torch.Tensor]) – Weights tensors. Their dtype can be torch.float16 or torch.int8

  • backend_cls (Any) – Backend class to run

  • op_id (Optional[str], optional) – Operation ID. Defaults to None.

Returns:

result

Return type:

torch.Tensor
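
A dispatch sketch using Linear as the backend class; passing the weights as a single-element list matches the List[torch.Tensor] signature, and the (outC, inC) layout is the same assumption as above::

    import torch
    from intel_npu_acceleration_library.backend import Linear, run_factory

    x = torch.rand(16, 256, dtype=torch.float16)
    w = torch.rand(128, 256, dtype=torch.float16)

    # Expected to build (and cache) a Linear backend matching these shapes.
    out = run_factory(x, [w], backend_cls=Linear, op_id="fc1")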

intel_npu_acceleration_library.backend.runtime.run_matmul(x: Tensor, weights: Tensor, scale: Tensor | None = None, op_id: str | None = None) Tensor#

Run a matmul operation. Depending on the datatype of the weights, it runs a float or quantized operation.

Parameters:
  • x (torch.Tensor) – Activation tensor. Its dtype must be torch.float16

  • weights (torch.Tensor) – Weights tensor. Its dtype can be torch.float16 or torch.int8

  • scale (Optional[torch.Tensor], optional) – Quantization scale. If weights.dtype == torch.int8 then it must be set. Defaults to None.

  • op_id (Optional[str], optional) – Operation ID. Defaults to None.

Raises:

RuntimeError – Unsupported weights datatype. Supported types: [torch.float16, torch.int8]

Returns:

result

Return type:

torch.Tensor
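
A sketch of both dispatch paths; the (outC, 1) scale shape is an assumption consistent with the quantized classes above::

    import torch
    from intel_npu_acceleration_library.backend import run_matmul

    x = torch.rand(16, 256, dtype=torch.float16)

    # Float path: float16 weights, no scale needed.
    w = torch.rand(128, 256, dtype=torch.float16)
    out = run_matmul(x, w)

    # Quantized path: int8 weights require a scale.
    w_q = torch.randint(-128, 127, (128, 256), dtype=torch.int8)
    s = torch.full((128, 1), 0.01, dtype=torch.float16)  # illustrative scale
    out_q = run_matmul(x, w_q, scale=s, op_id="qmm0")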

intel_npu_acceleration_library.backend.runtime.set_contiguous(tensor: Tensor) Tensor#

Set tensor to be contiguous in memory.

Parameters:

tensor (torch.Tensor) – input tensor

Returns:

the contiguous output tensor

Return type:

torch.Tensor

Module contents#

class intel_npu_acceleration_library.backend.Linear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

Linear class, computing a matrix-matrix multiplication with weight prefetching.

run(X: ndarray, W: ndarray, op_id: str) ndarray#

Run the layer: X * W^T.

Parameters:
  • X (np.ndarray) – lhs operator

  • W (np.ndarray) – rhs operator

  • op_id (str) – operation id

Raises:

RuntimeError – Input or weight tensor shape mismatch

Returns:

result

Return type:

np.ndarray

class intel_npu_acceleration_library.backend.MLP(hidden_size: int, intermediate_size: int, batch: int, activation: str = 'swiglu', bias: bool | None = False, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

MLP class, computing a multi-layer perceptron with weight prefetching.

class intel_npu_acceleration_library.backend.MatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

MatMul class, computing a matrix-matrix multiplication.

run(X: ndarray, W: ndarray) ndarray#

Run the layer: X * W^T.

Parameters:
  • X (np.ndarray) – lhs operator

  • W (np.ndarray) – rhs operator

Raises:

RuntimeError – Input or weight tensor shape mismatch

Returns:

result

Return type:

np.ndarray

class intel_npu_acceleration_library.backend.NNFactory(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#

Bases: BaseNPUBackendWithPrefetch

Network factory class, building models that compute a matrix-matrix multiplication with weight prefetching.

compile(output_node: _Pointer)#

Finalize and compile a model.

Parameters:

output_node (ctypes._Pointer) – Model output node

linear(input_node: _Pointer, output_channels: int, input_channels: int, bias: bool | None = False, quantize: bool = False) _Pointer#

Generate a linear layer.

Parameters:
  • input_node (ctypes._Pointer) – layer input node

  • output_channels (int) – number of output channels

  • input_channels (int) – number of input channels

  • bias (bool, optional) – enable/disable bias. Defaults to False.

  • quantize (bool, optional) – quantize linear model. Defaults to False.

Returns:

a pointer to the linear layer output node

Return type:

ctypes._Pointer

parameter(shape: Tuple[int, int], dtype: DTypeLike = numpy.float16) _Pointer#

Generate a model input parameter.

Parameters:
  • shape (Tuple[int, int]) – Parameter shape (only 2D tensors are supported at the moment)

  • dtype (np.dtype, optional) – parameter dtype; np.int8 and np.float16 are supported. Defaults to np.float16.

Raises:

RuntimeError – Unsupported shape

Returns:

a pointer to a parameter object

Return type:

ctypes._Pointer

run(X: ndarray, *weights: ndarray | Tuple[ndarray, ndarray], **kwargs: Any) ndarray#

Run the layer: X * W^T.

Parameters:
  • X (np.ndarray) – lhs operator

  • weights (Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]) – rhs operators

  • kwargs (Any) – additional arguments

Raises:

RuntimeError – Input tensor shape mismatch

Returns:

result

Return type:

np.ndarray

class intel_npu_acceleration_library.backend.QLinear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

Quantized Linear class, computing a matrix-matrix multiplication with weight prefetching.

run(X: ndarray, W: ndarray, scale: ndarray, op_id: str) ndarray#

Run the layer: X * (W * S)^T.

Parameters:
  • X (np.ndarray) – activation

  • W (np.ndarray) – quantized weights

  • scale (np.ndarray) – quantization scale

  • op_id (str) – operation id

Raises:

RuntimeError – Input, weights or scale shape mismatch

Returns:

result

Return type:

np.ndarray

class intel_npu_acceleration_library.backend.QMatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#

Bases: NNFactory

Quantized MatMul class, computing a matrix-matrix multiplication.

run(X: ndarray, W: ndarray, scale: ndarray) ndarray#

Run the layer: X * (W * S)^T.

Parameters:
  • X (np.ndarray) – activation

  • W (np.ndarray) – quantized weights

  • scale (np.ndarray) – quantization scale

Raises:

RuntimeError – Input, weights or scale shape mismatch

Returns:

result

Return type:

np.ndarray

intel_npu_acceleration_library.backend.clear_cache()#

Clear the cache of models.

intel_npu_acceleration_library.backend.get_driver_version() str#

Get the driver version for the Intel® NPU Acceleration Library.

Raises:

RuntimeError – raised if the platform is not supported. Currently supported platforms are Windows and Linux

Returns:

the driver version string

Return type:

str

intel_npu_acceleration_library.backend.npu_available() bool#

Return whether the NPU is available.

Returns:

True if the NPU is available in the system

Return type:

bool
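
A small capability-check sketch combining npu_available and get_driver_version; falling back to another device via the backends' device argument is an assumption::

    from intel_npu_acceleration_library.backend import (
        get_driver_version,
        npu_available,
    )

    if npu_available():
        print(f"NPU driver version: {get_driver_version()}")
    else:
        print("No NPU detected; consider passing a different device string")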

intel_npu_acceleration_library.backend.run_factory(x: Tensor, weights: List[Tensor], backend_cls: Any, op_id: str | None = None) Tensor#

Run a factory operation. Depending on the datatype of the weights, it runs a float or quantized operation.

Parameters:
  • x (torch.Tensor) – Activation tensor. Its dtype must be torch.float16

  • weights (List[torch.Tensor]) – Weights tensors. Their dtype can be torch.float16 or torch.int8

  • backend_cls (Any) – Backend class to run

  • op_id (Optional[str], optional) – Operation ID. Defaults to None.

Returns:

result

Return type:

torch.Tensor

intel_npu_acceleration_library.backend.run_matmul(x: Tensor, weights: Tensor, scale: Tensor | None = None, op_id: str | None = None) Tensor#

Run a matmul operation. Depending on the datatype of the weights, it runs a float or quantized operation.

Parameters:
  • x (torch.Tensor) – Activation tensor. Its dtype must be torch.float16

  • weights (torch.Tensor) – Weights tensor. Its dtype can be torch.float16 or torch.int8

  • scale (Optional[torch.Tensor], optional) – Quantization scale. If weights.dtype == torch.int8 then it must be set. Defaults to None.

  • op_id (Optional[str], optional) – Operation ID. Defaults to None.

Raises:

RuntimeError – Unsupported weights datatype. Supported types: [torch.float16, torch.int8]

Returns:

result

Return type:

torch.Tensor