intel_npu_acceleration_library.backend package#
Submodules#
intel_npu_acceleration_library.backend.base module#
- class intel_npu_acceleration_library.backend.base.BaseNPUBackend(profile: bool | None = False)#
Bases:
object
A base class representing an abstract matrix-matrix operation on the NPU.
- save(path: str)#
Save the OpenVINO model.
- Parameters:
path (str) – the model save path
- saveCompiledModel(path: str)#
Save the compiled model.
- Parameters:
path (str) – the compiled model save path
- class intel_npu_acceleration_library.backend.base.BaseNPUBackendWithPrefetch(profile: bool)#
Bases:
BaseNPUBackend
A base class representing an abstract matrix-matrix operation on the NPU.
Linear-type classes employ an algorithm to optimize weights prefetching.
- add_to_map(wt_hash: str, weights: Iterable[ndarray | Tuple[ndarray, ...]])#
Add an operation's parameters to the operation hash:parameter map.
- Parameters:
wt_hash (str) – operation hash
weights (Iterable[Union[np.ndarray, Tuple[np.ndarray, ...]]]) – Operation parameters
- create_parameters(weights: Iterable[ndarray | Tuple[ndarray, ...]]) _Pointer #
Create an operation parameter from a list of weights.
- Parameters:
weights (Iterable[Union[np.ndarray, Tuple[np.ndarray, ...]]]) – Operation parameters
- Raises:
RuntimeError – Quantized weights need to be in int8 format
ValueError – Invalid dtype for scale
- Returns:
a pointer to the Parameters object
- Return type:
ctypes._Pointer
- load_wt_fn(module, parameters)#
Asynchronously load the parameters into the NPU.
- Parameters:
module – the NPU backend module
parameters – the weights parameter class
- prefetchWeights()#
Prefetch next operation weights.
- setWeights(wt_hash: str | None, *args: ndarray | Tuple[ndarray, ...]) bool #
Set the operation weights in the NPU.
- Parameters:
wt_hash (Optional[str]) – operation hash. If set to None, loading of the weights is forced
args (Union[np.ndarray, Tuple[np.ndarray, ...]]) – Variable-length weights list. Each item can be an np.ndarray or, for quantized tensors, a (weight, scale) tuple
- Returns:
Return True if the op parameters are already in the op map
- Return type:
bool
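The hash:parameter map behind `add_to_map` and `setWeights` can be sketched as a plain dictionary keyed by a weights hash: load parameters only on a cache miss, and report whether they were already present. The names here (`WeightCache`, `set_weights`) are illustrative, not the library's API.

```python
import hashlib
import numpy as np

class WeightCache:
    """Illustrative hash:parameter map: load weights only on a cache miss."""

    def __init__(self):
        self._map = {}

    def set_weights(self, wt_hash, *weights):
        # Return True if the parameters are already cached (no load needed);
        # a wt_hash of None forces the load, mirroring setWeights above.
        if wt_hash is not None and wt_hash in self._map:
            return True
        params = [np.ascontiguousarray(w) for w in weights]
        if wt_hash is not None:
            self._map[wt_hash] = params
        return False

cache = WeightCache()
w = np.ones((4, 4), dtype=np.float16)
h = hashlib.sha256(w.tobytes()).hexdigest()
print(cache.set_weights(h, w))  # False: first call loads the weights
print(cache.set_weights(h, w))  # True: parameters already in the map
```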
- intel_npu_acceleration_library.backend.base.adapt_weight(w: ndarray) ndarray #
Adapt the weights to run on the NPU.
- Parameters:
w (np.ndarray) – weights array
- Raises:
RuntimeError – Unsupported shape
- Returns:
The adapted array
- Return type:
np.ndarray
intel_npu_acceleration_library.backend.factory module#
- class intel_npu_acceleration_library.backend.factory.NNFactory(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#
Bases:
BaseNPUBackendWithPrefetch
Network factory class, building and compiling matrix-matrix multiplication models with weights prefetching.
- compile(output_node: _Pointer)#
Finalize and compile a model.
- Parameters:
output_node (ctypes._Pointer) – Model output node
- linear(input_node: _Pointer, output_channels: int, input_channels: int, bias: bool | None = False, quantize: bool = False) _Pointer #
Generate a linear layer.
- Parameters:
input_node (ctypes._Pointer) – layer input node
output_channels (int) – number of output channels
input_channels (int) – number of input channels
bias (bool, optional) – enable/disable bias. Defaults to False.
quantize (bool, optional) – quantize linear model. Defaults to False.
- Returns:
a pointer to the layer output node
- Return type:
ctypes._Pointer
- parameter(shape: ~typing.Tuple[int, int], dtype: ~numpy.typing.DTypeLike = <class 'numpy.float16'>) _Pointer #
Generate a model input parameter.
- Parameters:
shape (Tuple[int, int]) – Parameter shape (only 2D tensors are supported at the moment)
dtype (np.dtype, optional) – parameter dtype; np.int8 and np.float16 are supported. Defaults to np.float16.
- Raises:
RuntimeError – Unsupported shape
- Returns:
a pointer to a parameter object
- Return type:
ctypes._Pointer
- run(X: ndarray, *weights: ndarray | Tuple[ndarray, ndarray], **kwargs: Any) ndarray #
Run the layer: X * W^T.
- Parameters:
X (np.ndarray) – lhs operator
weights (Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]) – rhs operators
kwargs (Any) – additional arguments
- Raises:
RuntimeError – Input tensor shape mismatch
- Returns:
result
- Return type:
np.ndarray
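The X * W^T contract above can be stated as a numpy reference, with shapes tied to the `inC`, `outC`, and `batch` constructor arguments. This is only the math the factory computes, not the NPU implementation:

```python
import numpy as np

# Shape contract for a factory built as NNFactory(inC, outC, batch):
inC, outC, batch = 8, 4, 2
X = np.random.rand(batch, inC).astype(np.float16)   # lhs: activation
W = np.random.rand(outC, inC).astype(np.float16)    # rhs: weights

# run(X, W) computes X * W^T
result = X @ W.T
print(result.shape)  # (2, 4), i.e. (batch, outC)
```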
intel_npu_acceleration_library.backend.linear module#
- class intel_npu_acceleration_library.backend.linear.Linear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
Linear class, computing a matrix-matrix multiplication with weights prefetching.
- run(X: ndarray, W: ndarray, op_id: str) ndarray #
Run the layer: X * W^T.
- Parameters:
X (np.ndarray) – lhs operator
W (np.ndarray) – rhs operator
op_id (str) – operation id
- Raises:
RuntimeError – Input or weight tensor shape mismatch
- Returns:
result
- Return type:
np.ndarray
intel_npu_acceleration_library.backend.matmul module#
- class intel_npu_acceleration_library.backend.matmul.MatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
MatMul class, computing a matrix-matrix multiplication.
- run(X: ndarray, W: ndarray) ndarray #
Run the layer: X * W^T.
- Parameters:
X (np.ndarray) – lhs operator
W (np.ndarray) – rhs operator
- Raises:
RuntimeError – Input or weight tensor shape mismatch
- Returns:
result
- Return type:
np.ndarray
intel_npu_acceleration_library.backend.mlp module#
- class intel_npu_acceleration_library.backend.mlp.MLP(hidden_size: int, intermediate_size: int, batch: int, activation: str = 'swiglu', bias: bool | None = False, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
MLP class, computing a multilayer perceptron with weights prefetching.
intel_npu_acceleration_library.backend.qlinear module#
- class intel_npu_acceleration_library.backend.qlinear.QLinear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
Quantized Linear class, computing a matrix-matrix multiplication with weights prefetching.
- run(X: ndarray, W: ndarray, scale: ndarray, op_id: str) ndarray #
Run the layer: X * (W * S)^T.
- Parameters:
X (np.ndarray) – activation
W (np.ndarray) – quantized weights
scale (np.ndarray) – quantization scale
op_id (str) – operation id
- Raises:
RuntimeError – Input, weights or scale shape mismatch
- Returns:
result
- Return type:
np.ndarray
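The quantized computation X * (W * S)^T can be reproduced in numpy as a reference for the math only (the NPU kernel operates on int8 directly; the per-output-channel scale shape used here is an assumption):

```python
import numpy as np

inC, outC, batch = 8, 4, 2
X = np.random.rand(batch, inC).astype(np.float16)
# Quantized weights are int8; scale assumed one float16 value per output channel
W = np.random.randint(-128, 127, size=(outC, inC), dtype=np.int8)
scale = np.random.rand(outC, 1).astype(np.float16)

# X * (W * S)^T: dequantize the weights, then multiply
result = X @ (W.astype(np.float16) * scale).T
print(result.shape)  # (2, 4), i.e. (batch, outC)
```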
intel_npu_acceleration_library.backend.qmatmul module#
- class intel_npu_acceleration_library.backend.qmatmul.QMatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
Quantized MatMul class, computing a matrix-matrix multiplication.
- run(X: ndarray, W: ndarray, scale: ndarray) ndarray #
Run the layer: X * (W * S)^T.
- Parameters:
X (np.ndarray) – activation
W (np.ndarray) – quantized weights
scale (np.ndarray) – quantization scale
- Raises:
RuntimeError – Input, weights or scale shape mismatch
- Returns:
result
- Return type:
np.ndarray
intel_npu_acceleration_library.backend.runtime module#
- intel_npu_acceleration_library.backend.runtime.clear_cache()#
Clear the cache of models.
- intel_npu_acceleration_library.backend.runtime.run_factory(x: Tensor, weights: List[Tensor], backend_cls: Any, op_id: str | None = None) Tensor #
Run a factory operation. Depending on the datatype of the weights it runs a float or quantized operation.
- Parameters:
x (torch.Tensor) – Activation tensor. Its dtype must be torch.float16
weights (List[torch.Tensor]) – List of weights tensors. Their dtype can be torch.float16 or torch.int8
backend_cls (Any) – Backend class to run
op_id (Optional[str], optional) – Operation ID. Defaults to None.
- Returns:
result
- Return type:
torch.Tensor
- intel_npu_acceleration_library.backend.runtime.run_matmul(x: Tensor, weights: Tensor, scale: Tensor | None = None, op_id: str | None = None) Tensor #
Run a matmul operation. Depending on the datatype of the weights it runs a float or quantized operation.
- Parameters:
x (torch.Tensor) – Activation tensor. Its dtype must be torch.float16
weights (torch.Tensor) – Weights tensor. Its dtype can be torch.float16 or torch.int8
scale (Optional[torch.Tensor], optional) – Quantization scale; required if weights.dtype == torch.int8. Defaults to None.
op_id (Optional[str], optional) – Operation ID. Defaults to None.
- Raises:
RuntimeError – Unsupported weights datatype. Supported types: [torch.float16, torch.int8]
- Returns:
result
- Return type:
torch.Tensor
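The dtype-based dispatch rule above can be sketched in plain numpy (`dispatch_matmul` is a hypothetical name for illustration; the real `run_matmul` operates on torch tensors and runs on the NPU):

```python
import numpy as np

def dispatch_matmul(x, weights, scale=None):
    """Illustrative dispatch: pick the float or quantized path by weights dtype."""
    if weights.dtype == np.float16:
        return x @ weights.T
    if weights.dtype == np.int8:
        if scale is None:
            raise RuntimeError("scale must be set for int8 weights")
        # Quantized path: dequantize, then multiply
        return x @ (weights.astype(np.float16) * scale).T
    raise RuntimeError(
        f"Unsupported weights datatype {weights.dtype}. "
        "Supported types: [float16, int8]"
    )

x = np.ones((2, 8), dtype=np.float16)
w8 = np.ones((4, 8), dtype=np.int8)
s = np.full((4, 1), 0.5, dtype=np.float16)
print(dispatch_matmul(x, w8, s).shape)  # (2, 4)
```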
- intel_npu_acceleration_library.backend.runtime.set_contiguous(tensor: Tensor) Tensor #
Set tensor to be contiguous in memory.
- Parameters:
tensor (torch.Tensor) – input tensor
- Returns:
output, contiguous tensor
- Return type:
torch.Tensor
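set_contiguous matters because views such as transposes share the original buffer with strided access and are not contiguous in memory. The same idea in numpy (the real function uses torch's contiguous()):

```python
import numpy as np

x = np.arange(16, dtype=np.float16).reshape(4, 4)
xt = x.T                          # transposed view: same data, strided access
print(xt.flags["C_CONTIGUOUS"])   # False

xc = np.ascontiguousarray(xt)     # copy into a contiguous memory layout
print(xc.flags["C_CONTIGUOUS"])   # True
```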
Module contents#
- class intel_npu_acceleration_library.backend.Linear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
Linear class, computing a matrix-matrix multiplication with weights prefetching.
- run(X: ndarray, W: ndarray, op_id: str) ndarray #
Run the layer: X * W^T.
- Parameters:
X (np.ndarray) – lhs operator
W (np.ndarray) – rhs operator
op_id (str) – operation id
- Raises:
RuntimeError – Input or weight tensor shape mismatch
- Returns:
result
- Return type:
np.ndarray
- class intel_npu_acceleration_library.backend.MLP(hidden_size: int, intermediate_size: int, batch: int, activation: str = 'swiglu', bias: bool | None = False, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
MLP class, computing a multilayer perceptron with weights prefetching.
- class intel_npu_acceleration_library.backend.MatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
MatMul class, computing a matrix-matrix multiplication.
- run(X: ndarray, W: ndarray) ndarray #
Run the layer: X * W^T.
- Parameters:
X (np.ndarray) – lhs operator
W (np.ndarray) – rhs operator
- Raises:
RuntimeError – Input or weight tensor shape mismatch
- Returns:
result
- Return type:
np.ndarray
- class intel_npu_acceleration_library.backend.NNFactory(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#
Bases:
BaseNPUBackendWithPrefetch
Network factory class, building and compiling matrix-matrix multiplication models with weights prefetching.
- compile(output_node: _Pointer)#
Finalize and compile a model.
- Parameters:
output_node (ctypes._Pointer) – Model output node
- linear(input_node: _Pointer, output_channels: int, input_channels: int, bias: bool | None = False, quantize: bool = False) _Pointer #
Generate a linear layer.
- Parameters:
input_node (ctypes._Pointer) – layer input node
output_channels (int) – number of output channels
input_channels (int) – number of input channels
bias (bool, optional) – enable/disable bias. Defaults to False.
quantize (bool, optional) – quantize linear model. Defaults to False.
- Returns:
a pointer to the layer output node
- Return type:
ctypes._Pointer
- parameter(shape: ~typing.Tuple[int, int], dtype: ~numpy.typing.DTypeLike = <class 'numpy.float16'>) _Pointer #
Generate a model input parameter.
- Parameters:
shape (Tuple[int, int]) – Parameter shape (only 2D tensors are supported at the moment)
dtype (np.dtype, optional) – parameter dtype; np.int8 and np.float16 are supported. Defaults to np.float16.
- Raises:
RuntimeError – Unsupported shape
- Returns:
a pointer to a parameter object
- Return type:
ctypes._Pointer
- run(X: ndarray, *weights: ndarray | Tuple[ndarray, ndarray], **kwargs: Any) ndarray #
Run the layer: X * W^T.
- Parameters:
X (np.ndarray) – lhs operator
weights (Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]) – rhs operators
kwargs (Any) – additional arguments
- Raises:
RuntimeError – Input tensor shape mismatch
- Returns:
result
- Return type:
np.ndarray
- class intel_npu_acceleration_library.backend.QLinear(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
Quantized Linear class, computing a matrix-matrix multiplication with weights prefetching.
- run(X: ndarray, W: ndarray, scale: ndarray, op_id: str) ndarray #
Run the layer: X * (W * S)^T.
- Parameters:
X (np.ndarray) – activation
W (np.ndarray) – quantized weights
scale (np.ndarray) – quantization scale
op_id (str) – operation id
- Raises:
RuntimeError – Input, weights or scale shape mismatch
- Returns:
result
- Return type:
np.ndarray
- class intel_npu_acceleration_library.backend.QMatMul(inC: int, outC: int, batch: int, profile: bool = False, device: str = 'NPU')#
Bases:
NNFactory
Quantized MatMul class, computing a matrix-matrix multiplication.
- run(X: ndarray, W: ndarray, scale: ndarray) ndarray #
Run the layer: X * (W * S)^T.
- Parameters:
X (np.ndarray) – activation
W (np.ndarray) – quantized weights
scale (np.ndarray) – quantization scale
- Raises:
RuntimeError – Input, weights or scale shape mismatch
- Returns:
result
- Return type:
np.ndarray
- intel_npu_acceleration_library.backend.clear_cache()#
Clear the cache of models.
- intel_npu_acceleration_library.backend.get_driver_version() str #
Get the driver version for the Intel® NPU Acceleration Library.
- Raises:
RuntimeError – an error is raised if the platform is not supported. Currently supported platforms are Windows and Linux
- Returns:
the driver version string
- Return type:
str
- intel_npu_acceleration_library.backend.npu_available() bool #
Return whether the NPU is available.
- Returns:
Return True if the NPU is available in the system
- Return type:
bool
- intel_npu_acceleration_library.backend.run_factory(x: Tensor, weights: List[Tensor], backend_cls: Any, op_id: str | None = None) Tensor #
Run a factory operation. Depending on the datatype of the weights it runs a float or quantized operation.
- Parameters:
x (torch.Tensor) – Activation tensor. Its dtype must be torch.float16
weights (List[torch.Tensor]) – List of weights tensors. Their dtype can be torch.float16 or torch.int8
backend_cls (Any) – Backend class to run
op_id (Optional[str], optional) – Operation ID. Defaults to None.
- Returns:
result
- Return type:
torch.Tensor
- intel_npu_acceleration_library.backend.run_matmul(x: Tensor, weights: Tensor, scale: Tensor | None = None, op_id: str | None = None) Tensor #
Run a matmul operation. Depending on the datatype of the weights it runs a float or quantized operation.
- Parameters:
x (torch.Tensor) – Activation tensor. Its dtype must be torch.float16
weights (torch.Tensor) – Weights tensor. Its dtype can be torch.float16 or torch.int8
scale (Optional[torch.Tensor], optional) – Quantization scale; required if weights.dtype == torch.int8. Defaults to None.
op_id (Optional[str], optional) – Operation ID. Defaults to None.
- Raises:
RuntimeError – Unsupported weights datatype. Supported types: [torch.float16, torch.int8]
- Returns:
result
- Return type:
torch.Tensor