intel_npu_acceleration_library.backend package#
Submodules#
intel_npu_acceleration_library.backend.base module#
- class intel_npu_acceleration_library.backend.base.BaseNPUBackend(profile: bool | None = False)#
Bases:
object
A base class that represents an abstract matrix-matrix operation on the NPU.
- save(path: str)#
Save the OpenVINO model.
- Parameters:
path (str) – the model save path
- saveCompiledModel(path: str)#
Save the compiled model.
- Parameters:
path (str) – the compiled model save path
- class intel_npu_acceleration_library.backend.base.BaseNPUBackendWithPrefetch(profile: bool)#
Bases:
BaseNPUBackend
A base class that represents an abstract matrix-matrix operation on the NPU.
Linear-type classes employ an algorithm to optimize weight prefetching.
- add_to_map(wt_hash: str, weights: Iterable[ndarray | Tuple[ndarray, ...]])#
Add an operation's parameters to the map from operation hash to parameters.
- Parameters:
wt_hash (str) – operation hash
weights (Iterable[Union[np.ndarray, Tuple[np.ndarray, ...]]]) – Operation parameters
- create_parameters(weights: Iterable[ndarray | Tuple[ndarray, ...]]) _Pointer #
Create an operation parameter from a list of weights.
- Parameters:
weights (Iterable[Union[np.ndarray, Tuple[np.ndarray, ...]]]) – Operation parameters
- Raises:
RuntimeError – Quantized weights need to be in int8 format
ValueError – Invalid dtype for scale
- Returns:
a pointer to the Parameters object
- Return type:
ctypes._Pointer
- load_wt_fn(module, parameters)#
Asynchronously load the parameters into the NPU.
- Parameters:
module – the NPU backend module
parameters – the weights parameter class
- prefetchWeights()#
Prefetch next operation weights.
- setWeights(wt_hash: str | None, *args: ndarray | Tuple[ndarray, ...]) bool #
Set the operation weights in the NPU.
- Parameters:
wt_hash (Optional[str]) – operation hash. If None, the weights are always loaded
args (Union[np.ndarray, Tuple[np.ndarray, ...]]) – Variable length weights list. Each item is either a NumPy array or, for quantized tensors, a (weight, scale) tuple
- Returns:
True if the operation parameters are already in the operation map
- Return type:
bool
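The caching contract described by add_to_map and setWeights can be pictured as a dictionary keyed by operation hash. The following is a minimal pure-Python sketch of that idea, not the library's implementation: the class WeightCache, its set_weights method and the loads counter are hypothetical names introduced only for illustration.

```python
from typing import Dict, Optional, Tuple


class WeightCache:
    """Toy model of a hash -> parameters map (hypothetical, not the library class)."""

    def __init__(self) -> None:
        self._map: Dict[str, Tuple] = {}
        self.loads = 0  # counts how often parameters were (re)built

    def _create_parameters(self, weights: Tuple) -> Tuple:
        # Stand-in for the real parameter creation and upload step.
        self.loads += 1
        return tuple(weights)

    def set_weights(self, wt_hash: Optional[str], *weights) -> bool:
        """Return True when the parameters were already cached under wt_hash."""
        if wt_hash is None:
            # No hash: nothing can be looked up, so always load.
            self._create_parameters(weights)
            return False
        if wt_hash in self._map:
            return True
        self._map[wt_hash] = self._create_parameters(weights)
        return False


cache = WeightCache()
first = cache.set_weights("layer0", [1.0, 2.0])   # not cached yet
second = cache.set_weights("layer0", [1.0, 2.0])  # served from the map
forced = cache.set_weights(None, [1.0, 2.0])      # always reloads
```

Only the first and the forced call build parameters; the second call is a cache hit.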
- intel_npu_acceleration_library.backend.base.adapt_weight(w: ndarray) ndarray #
Adapt the weights to run on the NPU.
- Parameters:
w (np.ndarray) – weights array
- Returns:
The adapted array
- Return type:
np.ndarray
intel_npu_acceleration_library.backend.factory module#
- class intel_npu_acceleration_library.backend.factory.NNFactory(profile: bool = False, device: str = 'NPU')#
Bases:
BaseNPUBackendWithPrefetch
Linear class, computing a matrix matrix multiplication with weights prefetching.
- avg_pooling(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- compile()#
Finalize and compile a model.
- concat(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- constant(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- convolution(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- get_backend_dtype(dtype) c_char_p #
Get the string representation of the dtype.
- Parameters:
dtype – numpy dtype
- Raises:
RuntimeError – Unsupported datatype
- Returns:
string representation of the dtype
- Return type:
ctypes.c_char_p
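A dtype-to-string lookup of this kind can be sketched with the standard ctypes module. The table contents and the function name backend_dtype below are assumptions made for illustration; the strings the real backend expects may differ.

```python
import ctypes

# Hypothetical dtype-name table; the real backend strings may differ.
_BACKEND_NAMES = {
    "float16": b"float16",
    "int8": b"int8",
}


def backend_dtype(dtype_name: str) -> ctypes.c_char_p:
    """Map a dtype name to a C string, rejecting anything not in the table."""
    try:
        return ctypes.c_char_p(_BACKEND_NAMES[dtype_name])
    except KeyError:
        raise RuntimeError(f"Unsupported datatype: {dtype_name}") from None


name = backend_dtype("float16")
```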
- get_tensor_dtype(node)#
Get tensor dtype.
- Parameters:
node – network node
- Raises:
RuntimeError – Unsupported dtype
- Returns:
tensor dtype
- Return type:
str
- get_tensor_recursively(args: Sequence[Any]) List[ndarray] #
Get tensor recursively for a list of arguments.
- Parameters:
args (Sequence[Any]) – Sequence of tensors, tuple of tensors and additional arguments
- Returns:
Sequence of tensors
- Return type:
List[np.ndarray]
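One way to read "get tensor recursively" is a depth-first walk that keeps tensor-like leaves and skips every other argument. The sketch below illustrates that reading in plain Python; collect_leaves and its leaf_type parameter are hypothetical, and bytes objects stand in for arrays so the example needs no third-party packages.

```python
from typing import Any, List, Sequence


def collect_leaves(args: Sequence[Any], leaf_type: type) -> List[Any]:
    """Depth-first collection of every leaf_type instance inside nested lists/tuples."""
    found: List[Any] = []
    for arg in args:
        if isinstance(arg, leaf_type):
            found.append(arg)
        elif isinstance(arg, (list, tuple)):
            found.extend(collect_leaves(arg, leaf_type))
        # Anything else (ints, strings, None, ...) counts as an extra argument and is skipped.
    return found


# bytes objects stand in for tensors; 3 and "op" play the role of additional arguments.
leaves = collect_leaves([b"w0", (b"w1", b"scale1"), 3, "op"], bytes)
```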
- get_tensor_shape(node)#
Get tensor shape.
- Parameters:
node – network node
- Returns:
tensor shape
- Return type:
tuple[int]
- linear(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- matmul(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- max_pooling(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- normL2(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- parameter(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- power(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- reduce_max(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- reduce_mean(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- reduce_min(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- reduce_prod(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- reduce_sum(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- reshape(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- return_tensor() F #
Wrap the output of a function in a Tensor object.
- Parameters:
fn (function) – Function
- Returns:
A function that wraps the output in a Tensor object
- Return type:
function
- run(X: ndarray, *weights: ndarray | Tuple[ndarray, ndarray], **kwargs: Any) ndarray #
Run the layer: X * W^T.
- Parameters:
X (np.ndarray) – lhs operator
weights (Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]) – rhs operators
kwargs (Any) – additional arguments
- Returns:
result
- Return type:
np.ndarray
- set_input_tensor(tensor: ndarray, idx: int)#
Set input tensor.
- Parameters:
tensor (np.ndarray) – Input tensor
idx (int) – tensor index
- slice(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- to(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
intel_npu_acceleration_library.backend.linear module#
- class intel_npu_acceleration_library.backend.linear.Linear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
Linear class, computing a matrix matrix multiplication with weights prefetching.
- run(X: ndarray, W: ndarray, op_id: str) ndarray #
Run the layer: X * W^T.
- Parameters:
X (np.ndarray) – lhs operator
W (np.ndarray) – rhs operator
op_id (str) – operation id
- Raises:
RuntimeError – Input or weight tensor shape mismatch
- Returns:
result
- Return type:
np.ndarray
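The operation X * W^T can be checked against a small host-side reference. This is a plain-Python sketch, not the NPU kernel; it assumes X has shape (batch, inC) and W has shape (outC, inC), which matches the constructor arguments but is not stated explicitly above. The name linear_reference is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def linear_reference(x: Matrix, w: Matrix) -> Matrix:
    """Compute X * W^T for X of shape (batch, inC) and W of shape (outC, inC)."""
    in_c = len(w[0])
    if any(len(row) != in_c for row in x) or any(len(row) != in_c for row in w):
        raise RuntimeError("Input or weight tensor shape mismatch")
    # Output element (b, o) is the dot product of input row b and weight row o.
    return [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in w] for x_row in x]


x = [[1.0, 2.0, 3.0]]                    # batch=1, inC=3
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]   # outC=2, inC=3
y = linear_reference(x, w)               # shape (batch, outC) = (1, 2)
```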
intel_npu_acceleration_library.backend.matmul module#
- class intel_npu_acceleration_library.backend.matmul.MatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
MatMul class, computing a matrix matrix multiplication.
- run(X: ndarray, W: ndarray) ndarray #
Run the layer: X * W^T.
- Parameters:
X (np.ndarray) – lhs operator
W (np.ndarray) – rhs operator
- Raises:
RuntimeError – Input or weight tensor shape mismatch
- Returns:
result
- Return type:
np.ndarray
intel_npu_acceleration_library.backend.mlp module#
- class intel_npu_acceleration_library.backend.mlp.MLP(input_shape: Sequence[int], intermediate_size: int, activation: str = 'swiglu', bias: bool | None = False, profile: bool = False, device: str = 'NPU', **additional_args)#
Bases:
NNFactory
Linear class, computing a matrix matrix multiplication with weights prefetching.
intel_npu_acceleration_library.backend.qlinear module#
- class intel_npu_acceleration_library.backend.qlinear.QLinear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU', dtype: ~numpy.dtype = <class 'numpy.int8'>)#
Bases:
NNFactory
Quantized Linear class, computing a matrix matrix multiplication with weights prefetching.
- run(X: ndarray, W: ndarray, scale: ndarray, op_id: str) ndarray #
Run the layer: X * (W * S)^T.
- Parameters:
X (np.ndarray) – activation
W (np.ndarray) – quantized weights
scale (np.ndarray) – quantization scale
op_id (str) – operation id
- Raises:
RuntimeError – Input, weights or scale shape mismatch
- Returns:
result
- Return type:
np.ndarray
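The quantized variant X * (W * S)^T dequantizes the weights with the scale before the product. The plain-Python sketch below assumes one scale value per output row of W (per-output-channel quantization); that layout, the int8 range check and the name qlinear_reference are assumptions for illustration, not a description of the NPU kernel.

```python
from typing import List


def qlinear_reference(
    x: List[List[float]], w_q: List[List[int]], scale: List[float]
) -> List[List[float]]:
    """X * (W * S)^T with int8 weights (outC, inC) and one scale per output row."""
    if len(w_q) != len(scale) or any(len(row) != len(w_q[0]) for row in x):
        raise RuntimeError("Input, weights or scale shape mismatch")
    for row in w_q:
        if any(not -128 <= v <= 127 for v in row):
            raise RuntimeError("Quantized weights need to be in int8 range")
    # Dequantize row by row, then apply the same dot-product pattern as X * W^T.
    w = [[v * s for v in row] for row, s in zip(w_q, scale)]
    return [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in w] for x_row in x]


x = [[1.0, 2.0]]              # batch=1, inC=2
w_q = [[2, -4], [127, 0]]     # outC=2, inC=2, int8 values
scale = [0.5, 0.25]           # one scale per output channel (assumed layout)
y = qlinear_reference(x, w_q, scale)
```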
intel_npu_acceleration_library.backend.qmatmul module#
- class intel_npu_acceleration_library.backend.qmatmul.QMatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU', dtype: ~numpy.dtype = <class 'numpy.int8'>)#
Bases:
NNFactory
Quantized Linear class, computing a matrix matrix multiplication.
- run(X: ndarray, W: ndarray, scale: ndarray) ndarray #
Run the layer: X * (W * S)^T.
- Parameters:
X (np.ndarray) – activation
W (np.ndarray) – quantized weights
scale (np.ndarray) – quantization scale
- Raises:
RuntimeError – Input, weights or scale shape mismatch
- Returns:
result
- Return type:
np.ndarray
intel_npu_acceleration_library.backend.runtime module#
- intel_npu_acceleration_library.backend.runtime.adapt_output_tensor(output: ndarray, original_shape: Size, input_dtype: dtype) Tensor #
Adapt the output tensor to the original shape and dtype.
- Parameters:
output (np.ndarray) – output tensor
original_shape (torch.Size) – original shape
input_dtype (torch.dtype) – input dtype
- Returns:
output tensor
- Return type:
torch.Tensor
- intel_npu_acceleration_library.backend.runtime.clear_cache()#
Clear the cache of models.
- intel_npu_acceleration_library.backend.runtime.run_factory(x: Tensor | List[Tensor], weights: List[Tensor], backend_cls: Any, op_id: str | None = None) Tensor #
Run a factory operation. Depending on the datatype of the weights it runs a float or quantized operation.
- Parameters:
x (Union[torch.Tensor, List[torch.Tensor]]) – Activation tensor(s). Its dtype must be torch.float16
weights (List[torch.Tensor]) – Weights tensors. Their dtype can be torch.float16 or torch.int8
backend_cls (Any) – Backend class to run
op_id (Optional[str], optional) – Operation ID. Defaults to None.
- Returns:
result
- Return type:
torch.Tensor
- intel_npu_acceleration_library.backend.runtime.run_matmul(x: Tensor, weights: Tensor, scale: Tensor | None = None, op_id: str | None = None) Tensor #
Run a matmul operation. Depending on the datatype of the weights it runs a float or quantized operation.
- Parameters:
x (torch.Tensor) – Activation tensor. Its dtype must be torch.float16
weights (torch.Tensor) – Weights tensor. Its dtype can be torch.float16 or torch.int8
scale (Optional[torch.Tensor], optional) – Quantization scale. If weights.dtype == torch.int8 then it must be set. Defaults to None.
op_id (Optional[str], optional) – Operation ID. Defaults to None.
- Raises:
RuntimeError – Unsupported weights datatype. Supported types: [torch.float16, torch.int8]
- Returns:
result
- Return type:
torch.Tensor
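The dtype-based dispatch described for run_matmul can be summarized as a small decision function. This sketch uses dtype names as strings so that it runs without torch; pick_matmul_kind is a hypothetical helper, and raising when the scale is missing is this sketch's choice, since the text above only says the scale must be set.

```python
from typing import Optional


def pick_matmul_kind(weights_dtype: str, scale: Optional[object] = None) -> str:
    """Return which kind of matmul a weights dtype would select (illustrative only)."""
    if weights_dtype == "float16":
        return "float"
    if weights_dtype == "int8":
        if scale is None:
            # The docs only say scale "must be set"; raising here is an assumption.
            raise RuntimeError("scale is required for int8 weights")
        return "quantized"
    raise RuntimeError(
        f"Unsupported weights datatype {weights_dtype}. Supported types: [float16, int8]"
    )


kinds = [pick_matmul_kind("float16"), pick_matmul_kind("int8", scale=[0.5])]
```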
- intel_npu_acceleration_library.backend.runtime.set_contiguous(tensor: Tensor) Tensor #
Set tensor to be contiguous in memory.
- Parameters:
tensor (torch.Tensor) – input tensor
- Returns:
output, contiguous tensor
- Return type:
torch.Tensor
Module contents#
- class intel_npu_acceleration_library.backend.Convolution(input_shape: Sequence[int], weights_shape: Sequence[int], bias: bool = False, strides: int | Sequence[int] = 1, padding: int | Sequence[int] = 0, dilation: int | Sequence[int] = 1, groups: int = 1, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
Linear class, computing a matrix matrix multiplication with weights prefetching.
- class intel_npu_acceleration_library.backend.Linear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
Linear class, computing a matrix matrix multiplication with weights prefetching.
- run(X: ndarray, W: ndarray, op_id: str) ndarray #
Run the layer: X * W^T.
- Parameters:
X (np.ndarray) – lhs operator
W (np.ndarray) – rhs operator
op_id (str) – operation id
- Raises:
RuntimeError – Input or weight tensor shape mismatch
- Returns:
result
- Return type:
np.ndarray
- class intel_npu_acceleration_library.backend.MLP(input_shape: Sequence[int], intermediate_size: int, activation: str = 'swiglu', bias: bool | None = False, profile: bool = False, device: str = 'NPU', **additional_args)#
Bases:
NNFactory
Linear class, computing a matrix matrix multiplication with weights prefetching.
- class intel_npu_acceleration_library.backend.MatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
MatMul class, computing a matrix matrix multiplication.
- run(X: ndarray, W: ndarray) ndarray #
Run the layer: X * W^T.
- Parameters:
X (np.ndarray) – lhs operator
W (np.ndarray) – rhs operator
- Raises:
RuntimeError – Input or weight tensor shape mismatch
- Returns:
result
- Return type:
np.ndarray
- class intel_npu_acceleration_library.backend.NNFactory(profile: bool = False, device: str = 'NPU')#
Bases:
BaseNPUBackendWithPrefetch
Linear class, computing a matrix matrix multiplication with weights prefetching.
- avg_pooling(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- compile()#
Finalize and compile a model.
- concat(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- constant(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- convolution(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- get_backend_dtype(dtype) c_char_p #
Get the string representation of the dtype.
- Parameters:
dtype – numpy dtype
- Raises:
RuntimeError – Unsupported datatype
- Returns:
string representation of the dtype
- Return type:
ctypes.c_char_p
- get_tensor_dtype(node)#
Get tensor dtype.
- Parameters:
node – network node
- Raises:
RuntimeError – Unsupported dtype
- Returns:
tensor dtype
- Return type:
str
- get_tensor_recursively(args: Sequence[Any]) List[ndarray] #
Get tensor recursively for a list of arguments.
- Parameters:
args (Sequence[Any]) – Sequence of tensors, tuple of tensors and additional arguments
- Returns:
Sequence of tensors
- Return type:
List[np.ndarray]
- get_tensor_shape(node)#
Get tensor shape.
- Parameters:
node – network node
- Returns:
tensor shape
- Return type:
tuple[int]
- linear(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- matmul(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- max_pooling(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- normL2(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- parameter(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- power(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- reduce_max(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- reduce_mean(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- reduce_min(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- reduce_prod(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- reduce_sum(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- reshape(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- return_tensor() F #
Wrap the output of a function in a Tensor object.
- Parameters:
fn (function) – Function
- Returns:
A function that wraps the output in a Tensor object
- Return type:
function
- run(X: ndarray, *weights: ndarray | Tuple[ndarray, ndarray], **kwargs: Any) ndarray #
Run the layer: X * W^T.
- Parameters:
X (np.ndarray) – lhs operator
weights (Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]) – rhs operators
kwargs (Any) – additional arguments
- Returns:
result
- Return type:
np.ndarray
- set_input_tensor(tensor: ndarray, idx: int)#
Set input tensor.
- Parameters:
tensor (np.ndarray) – Input tensor
idx (int) – tensor index
- slice(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- to(*args: Any, **kwargs: Any) Tensor #
Wrap the output of a function in a Tensor object.
- Parameters:
args (Any) – Variable length argument list
kwargs (Any) – Arbitrary keyword arguments
- Returns:
Tensor object
- Return type:
Tensor
- class intel_npu_acceleration_library.backend.QLinear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU', dtype: ~numpy.dtype = <class 'numpy.int8'>)#
Bases:
NNFactory
Quantized Linear class, computing a matrix matrix multiplication with weights prefetching.
- run(X: ndarray, W: ndarray, scale: ndarray, op_id: str) ndarray #
Run the layer: X * (W * S)^T.
- Parameters:
X (np.ndarray) – activation
W (np.ndarray) – quantized weights
scale (np.ndarray) – quantization scale
op_id (str) – operation id
- Raises:
RuntimeError – Input, weights or scale shape mismatch
- Returns:
result
- Return type:
np.ndarray
- class intel_npu_acceleration_library.backend.QMatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU', dtype: ~numpy.dtype = <class 'numpy.int8'>)#
Bases:
NNFactory
Quantized Linear class, computing a matrix matrix multiplication.
- run(X: ndarray, W: ndarray, scale: ndarray) ndarray #
Run the layer: X * (W * S)^T.
- Parameters:
X (np.ndarray) – activation
W (np.ndarray) – quantized weights
scale (np.ndarray) – quantization scale
- Raises:
RuntimeError – Input, weights or scale shape mismatch
- Returns:
result
- Return type:
np.ndarray
- class intel_npu_acceleration_library.backend.SDPA(query_shapes: Tuple[int, int], key_shapes: Tuple[int, int], value_shapes: Tuple[int, int], mask_shapes: Tuple[int, int], is_causal: bool = False, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
Implementation of a ScaledDotProductAttention NPU operation.
- run(query: ndarray, key: ndarray, value: ndarray, mask: ndarray) ndarray #
Run the scaled dot product attention kernel.
- Parameters:
query (np.ndarray) – sdpa query tensor
key (np.ndarray) – sdpa key tensor
value (np.ndarray) – sdpa value tensor
mask (np.ndarray) – sdpa mask tensor
- Returns:
result
- Return type:
np.ndarray
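For orientation, scaled dot product attention is conventionally defined as softmax(Q K^T / sqrt(d) + mask) V. The plain-Python reference below follows that textbook definition for 2D inputs with an additive mask; whether the NPU kernel uses exactly this scaling and mask convention is an assumption, and sdpa_reference is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa_reference(q: Matrix, k: Matrix, v: Matrix, mask: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d) + mask) V for 2D query, key, value and additive mask."""
    d = len(q[0])
    out = []
    for q_row, m_row in zip(q, mask):
        scores = [
            sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d) + m
            for k_row, m in zip(k, m_row)
        ]
        weights = softmax(scores)
        # Each output row is the attention-weighted sum of the value rows.
        out.append(
            [sum(wt * v_row[j] for wt, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return out


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
# A large negative entry hides key 1 from query 0 (causal-style mask).
mask = [[0.0, -1e9], [0.0, 0.0]]
out = sdpa_reference(q, k, v, mask)
```

With this mask, query 0 attends only to key 0, so its output row equals the first value row, while query 1 mixes both value rows.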
- class intel_npu_acceleration_library.backend.SimpleSDPA(query_shapes: Tuple[int, int], key_shapes: Tuple[int, int], value_shapes: Tuple[int, int], is_causal: bool = False, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
Implementation of a ScaledDotProductAttention NPU operation.
- run(query: ndarray, key: ndarray, value: ndarray) ndarray #
Run the scaled dot product attention kernel.
- Parameters:
query (np.ndarray) – sdpa query tensor
key (np.ndarray) – sdpa key tensor
value (np.ndarray) – sdpa value tensor
- Returns:
result
- Return type:
np.ndarray
- class intel_npu_acceleration_library.backend.Tensor(factory: NNFactory, node: _Pointer)#
Bases:
object
Represents a tensor object.
- Attrs:
factory (NNFactory): The factory object used to create the tensor.
node (ctypes._Pointer): The pointer to the underlying tensor node.
shape (Sequence[int]): The shape of the tensor.
dtype (NPUDtype): The data type of the tensor.
T (Tensor): The transpose of the tensor.
- __add__(self, other)#
Adds two tensors element-wise.
- __sub__(self, other)#
Subtracts two tensors element-wise.
- __mul__(self, other)#
Multiplies two tensors element-wise.
- __truediv__(self, other)#
Divides two tensors element-wise.
- __neg__(self)#
Negates the tensor.
- __repr__(self)#
Returns a string representation of the tensor.
- __str__(self)#
Returns a string representation of the tensor.
- __len__(self)#
Returns the total number of elements in the tensor.
- T()#
Returns the transpose of the tensor.
- squeeze(self)#
Removes dimensions of size 1 from the tensor.
- unsqueeze(self, axis)#
Adds a dimension of size 1 to the tensor.
- __matmul__(self, other)#
Performs matrix multiplication between two tensors.
- acos(self)#
Applies acos function to the tensor.
- asin(self)#
Applies asin function to the tensor.
- atan(self)#
Applies atan function to the tensor.
- acosh(self)#
Applies acosh function to the tensor.
- asinh(self)#
Applies asinh function to the tensor.
- atanh(self)#
Applies atanh function to the tensor.
- cosh(self)#
Applies cosh function to the tensor.
- sinh(self)#
Applies sinh function to the tensor.
- tanh(self)#
Applies tanh function to the tensor.
- cos(self)#
Applies cos function to the tensor.
- sin(self)#
Applies sin function to the tensor.
- tan(self)#
Applies tan function to the tensor.
- ceiling(self)#
Applies ceil function to the tensor.
- clamp(self, min, max)#
Applies clamp function to the tensor.
- elu(self, alpha)#
Applies elu function to the tensor.
- erf(self)#
Applies erf function to the tensor.
- exp(self)#
Applies exponential function to the tensor.
- floor(self)#
Applies floor function to the tensor.
- grn(self, bias)#
Applies grn function to the tensor.
- hsigmoid(self)#
Applies hsigmoid function to the tensor.
- hswish(self)#
Applies hswish function to the tensor.
- log(self)#
Applies log function to the tensor.
- mish(self)#
Applies mish function to the tensor.
- relu(self)#
Applies relu function to the tensor.
- round(self)#
Applies round function to the tensor.
- sigmoid(self)#
Applies sigmoid function to the tensor.
- sign(self)#
Applies sign function to the tensor.
- softmax(self, dim)#
Applies softmax function to the tensor.
- softplus(self)#
Applies softplus function to the tensor.
- sqrt(self)#
Applies sqrt function to the tensor.
- max(self, dim, keep_dims)#
Returns the reduced max tensor.
- mean(self, dim, keep_dims, dtype)#
Returns the reduced mean tensor.
- min(self, dim, keep_dims)#
Returns the reduced min tensor.
- prod(self, dim, keep_dims, dtype)#
Returns the reduced product tensor.
- sum(self, dim, keep_dims, dtype)#
Returns the reduced sum tensor.
- property T: Tensor#
Return the transpose of the tensor.
- Returns:
The transposed tensor.
- Return type:
Tensor
- acos() Tensor #
Apply the acos function to the tensor.
- Returns:
The result of applying the acos function.
- Return type:
Tensor
- acosh() Tensor #
Apply the acosh function to the tensor.
- Returns:
The result of applying the acosh function.
- Return type:
Tensor
- asin() Tensor #
Apply the asin function to the tensor.
- Returns:
The result of applying the asin function.
- Return type:
Tensor
- asinh() Tensor #
Apply the asinh function to the tensor.
- Returns:
The result of applying the asinh function.
- Return type:
Tensor
- atan() Tensor #
Apply the atan function to the tensor.
- Returns:
The result of applying the atan function.
- Return type:
Tensor
- atanh() Tensor #
Apply the atanh function to the tensor.
- Returns:
The result of applying the atanh function.
- Return type:
Tensor
- ceiling() Tensor #
Apply the ceiling function to the tensor.
- Returns:
The result of applying the ceiling function.
- Return type:
Tensor
- chunk(chunks: int, dim: int = 0) Tensor | list #
Return the list of tensor chunks.
- Parameters:
chunks (int) – The number of chunks to return.
dim (int) – The dimension along which to split the tensor. Default is 0.
- Returns:
The resulting list of split tensors or a single tensor.
- Return type:
Union["Tensor", list]
- Raises:
ValueError – The input chunks value is not valid.
- clamp(min=None, max=None) Tensor #
Apply the clamp function to the tensor.
- Parameters:
min (int, float) – The lower-bound of the range to be clamped
max (int, float) – The upper-bound of the range to be clamped
- Returns:
The result of applying the clamp function.
- Return type:
Tensor
- cos() Tensor #
Apply the cos function to the tensor.
- Returns:
The result of applying the cos function.
- Return type:
Tensor
- cosh() Tensor #
Apply the cosh function to the tensor.
- Returns:
The result of applying the cosh function.
- Return type:
Tensor
- dim() int #
Return the number of dimensions of the tensor.
- Returns:
The number of dimensions of the tensor.
- Return type:
int
- property dtype: NPUDtype#
Returns the data type of the tensor.
- Returns:
The data type of the tensor.
- Return type:
type
- elu(alpha: float = 1.0) Tensor #
Apply the elu function to the tensor.
- Parameters:
alpha (float) – The alpha value. Defaults to 1.0.
- Returns:
The result of applying the elu function.
- Return type:
Tensor
- erf() Tensor #
Apply the erf function to the tensor.
- Returns:
The result of applying the erf function.
- Return type:
Tensor
- exp() Tensor #
Apply the exp function to the tensor.
- Returns:
The result of applying the exp function.
- Return type:
Tensor
- flatten(start_dim=0, end_dim=-1) Tensor #
Flatten the tensor.
- Parameters:
start_dim (int) – The first dim to flatten. Defaults to 0.
end_dim (int) – The last dim to flatten. Defaults to -1.
- Returns:
The flattened tensor.
- Return type:
Tensor
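The effect of start_dim and end_dim on the resulting shape can be worked out on the host. The sketch below assumes PyTorch-style semantics (the inclusive range of dimensions is merged into one, negative indices count from the end); flattened_shape is a hypothetical helper that only computes shapes and does not touch the NPU.

```python
from math import prod
from typing import List, Sequence


def flattened_shape(shape: Sequence[int], start_dim: int = 0, end_dim: int = -1) -> List[int]:
    """Shape after merging dims start_dim..end_dim (inclusive) into one."""
    rank = len(shape)
    # Normalize negative indices the way Python sequences do.
    start = start_dim % rank
    end = end_dim % rank
    if start > end:
        raise ValueError("start_dim must not come after end_dim")
    merged = prod(shape[start : end + 1])
    return list(shape[:start]) + [merged] + list(shape[end + 1 :])


full = flattened_shape((2, 3, 4))                  # all dims merged
partial = flattened_shape((2, 3, 4), start_dim=1)  # keep the leading dim
```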
- floor() Tensor #
Apply the floor function to the tensor.
- Returns:
The result of applying the floor function.
- Return type:
Tensor
- grn(bias: float = 1e-12) Tensor #
Apply the grn function to the tensor.
- Parameters:
bias (float) – The bias value. Defaults to 1e-12.
- Returns:
The result of applying the grn function.
- Return type:
Tensor
- hsigmoid() Tensor #
Apply the hsigmoid function to the tensor.
- Returns:
The result of applying the hsigmoid function.
- Return type:
Tensor
- hswish() Tensor #
Apply the hswish function to the tensor.
- Returns:
The result of applying the hswish function.
- Return type:
Tensor
- log() Tensor #
Apply the log function to the tensor.
- Returns:
The result of applying the log function.
- Return type:
Tensor
- max(dim: int | None = None, keep_dims: bool | None = False) Tensor #
Return the reduced max tensor.
- Parameters:
dim (Optional[int], optional) – The dim to reduce. Default is None, and all dimensions are reduced.
keep_dims (Optional[bool], optional) – If True, the reduced axes are kept in the output with size 1. Defaults to False.
- Returns:
The result of the max reduction.
- Return type:
Tensor
- mean(dim: int | Sequence[int] | None = None, keep_dims: bool | None = False, dtype: dtype | None = None) Tensor #
Return the reduced mean tensor.
- Parameters:
dim (Optional[Union[int, Sequence[int]]], optional) – The dim(s) to reduce. Default is None, and all dimensions are reduced.
keep_dims (Optional[bool], optional) – If True, the reduced axes are kept in the output with size 1. Defaults to False.
dtype (Optional[torch.dtype], optional) – The data type. Defaults to None.
- Returns:
The result of the mean reduction.
- Return type:
Tensor
- min(dim: int | None = None, keep_dims: bool | None = False) Tensor #
Return the reduced min tensor.
- Parameters:
dim (Optional[int], optional) – The dim to reduce. Default is None, and all dimensions are reduced.
keep_dims (Optional[bool], optional) – If True, the reduced axes are kept in the output with size 1. Defaults to False.
- Returns:
The result of the min reduction.
- Return type:
Tensor
- mish() Tensor #
Apply the mish function to the tensor.
- Returns:
The result of applying the mish function.
- Return type:
Tensor
- node: _Pointer#
- permute(*input_order: int) Tensor #
Return the tensor with its dimensions permuted.
- Parameters:
input_order (Sequence[int]) – The order of the dimensions in the permuted tensor.
- Returns:
The permuted tensor.
- Return type:
Tensor
- prod(dim: int | None = None, keep_dims: bool | None = False, dtype: dtype | None = None) Tensor #
Return the reduced product tensor.
- Parameters:
dim (Optional[int], optional) – The dim to reduce. Default is None, and all dimensions are reduced.
keep_dims (Optional[bool], optional) – If True, the reduced axes are retained in the result with size 1. Defaults to False.
dtype (Optional[torch.dtype], optional) – The data type. Defaults to None.
- Returns:
The result of the product reduction.
- Return type:
Tensor
- relu() Tensor #
Apply the relu function to the tensor.
- Returns:
The result of applying the relu function.
- Return type:
Tensor
- reshape(*shape: int | Sequence[int]) Tensor #
Return the tensor reshaped to the given shape.
- Parameters:
shape (Union[int, Sequence[int]]) – The new shape of the tensor.
- Returns:
The reshaped tensor.
- Return type:
Tensor
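Reshaping and permuting can produce the same shape while ordering the elements differently. The plain-Python sketch below contrasts the two on a flat buffer. It assumes a contiguous row-major layout, which this reference does not state, and the helpers `row_major_strides` and `permute_flat` are hypothetical names used only for this illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple


def row_major_strides(shape: Sequence[int]) -> List[int]:
    """Strides (in elements) of a contiguous row-major array."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def permute_flat(
    data: Sequence[int], shape: Sequence[int], order: Sequence[int]
) -> Tuple[List[int], Tuple[int, ...]]:
    """Return (flat data, shape) after reordering the axes by `order`."""
    new_shape = tuple(shape[axis] for axis in order)
    strides = row_major_strides(shape)
    out = []
    for index in product(*(range(n) for n in new_shape)):
        # Output axis k walks input axis order[k].
        offset = sum(i * strides[axis] for i, axis in zip(index, order))
        out.append(data[offset])
    return out, new_shape


data = [0, 1, 2, 3, 4, 5]  # a 2x3 tensor: [[0, 1, 2], [3, 4, 5]]

# reshape(3, 2): same flat order, new shape -> [[0, 1], [2, 3], [4, 5]]
reshaped = (list(data), (3, 2))

# permute(1, 0): axes swapped, data reordered -> [[0, 3], [1, 4], [2, 5]]
permuted = permute_flat(data, (2, 3), (1, 0))

print(reshaped)
print(permuted)
```

Both results have shape `(3, 2)`, but only the permuted one is the transpose of the original matrix.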
- round() Tensor #
Apply the round function to the tensor.
- Returns:
The result of applying the round function.
- Return type:
Tensor
- property shape: Sequence[int]#
Returns the shape of the tensor.
- Returns:
The shape of the tensor.
- Return type:
Sequence[int]
- sigmoid() Tensor #
Apply the sigmoid function to the tensor.
- Returns:
The result of applying the sigmoid function.
- Return type:
Tensor
- sign() Tensor #
Apply the sign function to the tensor.
- Returns:
The result of applying the sign function.
- Return type:
Tensor
- sin() Tensor #
Apply the sin function to the tensor.
- Returns:
The result of applying the sin function.
- Return type:
Tensor
- sinh() Tensor #
Apply the sinh function to the tensor.
- Returns:
The result of applying the sinh function.
- Return type:
Tensor
- size(dim=None) int | Sequence[int] #
Return the size of the tensor.
- Parameters:
dim (int, optional) – The dimension to return the size of. Defaults to None.
- Returns:
The size of the tensor.
- Return type:
Union[int, Sequence[int]]
- softmax(dim) Tensor #
Apply the softmax function to the tensor.
- Parameters:
dim (int) – The dimension to apply softmax.
- Returns:
The result of applying the softmax function.
- Return type:
Tensor
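The effect of `dim` on softmax can be checked with a small reference written in plain Python. This is only a sketch of the standard softmax definition on a 2-D list, not the library's NPU implementation. The helper names `softmax_1d` and `softmax_2d` are hypothetical.

```python
import math
from typing import List


def softmax_1d(values: List[float]) -> List[float]:
    """Softmax of a vector; subtracting the max avoids overflow in exp."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_2d(matrix: List[List[float]], dim: int) -> List[List[float]]:
    """Softmax of a 2-D list along `dim` (0 = down columns, 1 = across rows)."""
    if dim == 1:
        return [softmax_1d(row) for row in matrix]
    if dim == 0:
        columns = [softmax_1d(list(col)) for col in zip(*matrix)]
        return [list(row) for row in zip(*columns)]
    raise ValueError("dim must be 0 or 1 for a 2-D input")


scores = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]]
by_row = softmax_2d(scores, dim=1)
by_col = softmax_2d(scores, dim=0)
print([round(sum(row), 6) for row in by_row])        # each row sums to 1
print([round(sum(col), 6) for col in zip(*by_col)])  # each column sums to 1
```

The values along the chosen dimension sum to 1, so choosing the wrong `dim` normalizes over the wrong axis without raising any error.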
- softplus() Tensor #
Apply the softplus function to the tensor.
- Returns:
The result of applying the softplus function.
- Return type:
Tensor
- sqrt() Tensor #
Apply the sqrt function to the tensor.
- Returns:
The result of applying the sqrt function.
- Return type:
Tensor
- squeeze() Tensor #
Remove dimensions of size 1 from the tensor.
- Returns:
The squeezed tensor.
- Return type:
Tensor
- sum(dim: int | Sequence[int] | None = None, keep_dims: bool | None = False, dtype: dtype | None = None) Tensor #
Return the reduced sum tensor.
- Parameters:
dim (Optional[Union[int, Sequence[int]]], optional) – The dim(s) to reduce. Default is None, and all dimensions are reduced.
keep_dims (Optional[bool], optional) – If True, the reduced axes are retained in the result with size 1. Defaults to False.
dtype (Optional[torch.dtype], optional) – The data type. Defaults to None.
- Returns:
The result of the sum reduction.
- Return type:
Tensor
- tan() Tensor #
Apply the tan function to the tensor.
- Returns:
The result of applying the tan function.
- Return type:
Tensor
- tanh() Tensor #
Apply the tanh function to the tensor.
- Returns:
The result of applying the tanh function.
- Return type:
Tensor
- to(dtype: NPUDtype) Tensor #
Convert the tensor to the specified data type.
- Parameters:
dtype (NPUDtype) – The data type to convert the tensor to.
- Returns:
The converted tensor.
- Return type:
Tensor
- transpose(dim0: int, dim1: int) Tensor #
Return the transpose of the tensor.
- Parameters:
dim0 (int) – The first dimension to transpose.
dim1 (int) – The second dimension to transpose.
- Returns:
The transposed tensor.
- Return type:
Tensor
- type(dtype: NPUDtype) Tensor #
Convert the tensor to the specified data type.
- Parameters:
dtype (NPUDtype) – The data type to convert the tensor to.
- Returns:
The converted tensor.
- Return type:
Tensor
- intel_npu_acceleration_library.backend.clear_cache()#
Clear the cache of models.
- intel_npu_acceleration_library.backend.get_driver_version() int #
Get the driver version for the Intel® NPU Acceleration Library.
- Raises:
RuntimeError – If the platform is not supported. Currently supported platforms are Windows and Linux.
- Returns:
NPU driver version
- Return type:
int
- intel_npu_acceleration_library.backend.npu_available() bool #
Return whether the NPU is available.
- Returns:
True if the NPU is available in the system.
- Return type:
bool
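A typical use of `npu_available()` and `get_driver_version()` is a guard before any NPU work is attempted. The sketch below wraps both calls so that it also runs on machines without the library. The wrapper `describe_npu` and its messages are hypothetical, and catching `OSError` at import time is a precaution assumed for this example rather than documented behaviour.

```python
def describe_npu() -> str:
    """Summarise NPU availability without failing when the library is absent."""
    try:
        # Both functions are documented in this module reference.
        from intel_npu_acceleration_library.backend import (
            get_driver_version,
            npu_available,
        )
    except (ImportError, OSError):
        # The library (or its native backend) could not be loaded here.
        return "intel_npu_acceleration_library is not installed"

    if not npu_available():
        return "NPU not available"
    try:
        return f"NPU available, driver version {get_driver_version()}"
    except RuntimeError as err:
        # Documented failure mode: unsupported platform.
        return f"NPU available, driver version unknown ({err})"


print(describe_npu())
```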
- intel_npu_acceleration_library.backend.run_factory(x: Tensor | List[Tensor], weights: List[Tensor], backend_cls: Any, op_id: str | None = None) Tensor #
Run a factory operation. Depending on the datatype of the weights it runs a float or quantized operation.
- Parameters:
x (Union[torch.Tensor, List[torch.Tensor]]) – Activation tensor(s). Its dtype must be torch.float16
weights (List[torch.Tensor]) – Weights tensors. Their dtype can be torch.float16 or torch.int8
backend_cls (Any) – Backend class to run
op_id (Optional[str], optional) – Operation ID. Defaults to None.
- Returns:
The result of the operation.
- Return type:
torch.Tensor
- intel_npu_acceleration_library.backend.run_matmul(x: Tensor, weights: Tensor, scale: Tensor | None = None, op_id: str | None = None) Tensor #
Run a matmul operation. Depending on the datatype of the weights it runs a float or quantized operation.
- Parameters:
x (torch.Tensor) – Activation tensor. Its dtype must be torch.float16
weights (torch.Tensor) – Weights tensor. Its dtype can be torch.float16 or torch.int8
scale (Optional[torch.Tensor], optional) – Quantization scale. If weights.dtype == torch.int8 then it must be set. Defaults to None.
op_id (Optional[str], optional) – Operation ID. Defaults to None.
- Raises:
RuntimeError – Unsupported weights datatype. Supported types: [torch.float16, torch.int8]
- Returns:
The result of the matmul operation.
- Return type:
torch.Tensor
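The dtype rules described for `run_matmul` can be sketched as a small reference in plain Python. It mirrors the documented behaviour: float16 weights are used directly, int8 weights need a scale, and any other dtype raises `RuntimeError`. Everything else is an assumption made for illustration: the weight layout `(out_features, in_features)`, one scale per output row, dtypes passed as strings, the `ValueError` for a missing scale, and the helper name `matmul_reference`. None of this runs on the NPU.

```python
from typing import List, Optional

Matrix = List[List[float]]


def matmul_reference(
    x: Matrix,
    weights: Matrix,
    weights_dtype: str,
    scale: Optional[List[float]] = None,
) -> Matrix:
    """Reference for y = x @ W^T with optional per-row dequantization.

    Assumptions (not from the reference): weights are stored as
    (out_features, in_features) and `scale` holds one factor per output row.
    """
    if weights_dtype == "float16":
        effective = weights
    elif weights_dtype == "int8":
        if scale is None:
            # Error type chosen for this sketch; the docs only say it must be set.
            raise ValueError("scale must be set for int8 weights")
        # Dequantize: each int8 row is multiplied by its scale.
        effective = [[w * s for w in row] for row, s in zip(weights, scale)]
    else:
        raise RuntimeError(
            "Unsupported weights datatype. Supported types: [float16, int8]"
        )
    return [
        [sum(a * w for a, w in zip(x_row, w_row)) for w_row in effective]
        for x_row in x
    ]


x = [[1.0, 2.0]]
print(matmul_reference(x, [[0.5, 0.25], [1.0, -1.0]], "float16"))
print(matmul_reference(x, [[2, 1], [4, -4]], "int8", scale=[0.25, 0.25]))
```

Both calls print the same result, because the int8 weights multiplied by their scale equal the float weights.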