intel_npu_acceleration_library package#
Subpackages#
- intel_npu_acceleration_library.backend package
- Submodules
- intel_npu_acceleration_library.backend.base module
- intel_npu_acceleration_library.backend.factory module
NNFactory
NNFactory.avg_pooling()
NNFactory.compile()
NNFactory.concat()
NNFactory.constant()
NNFactory.convolution()
NNFactory.get_backend_dtype()
NNFactory.get_tensor_dtype()
NNFactory.get_tensor_recursively()
NNFactory.get_tensor_shape()
NNFactory.linear()
NNFactory.matmul()
NNFactory.max_pooling()
NNFactory.normL2()
NNFactory.parameter()
NNFactory.power()
NNFactory.reduce_max()
NNFactory.reduce_mean()
NNFactory.reduce_min()
NNFactory.reduce_prod()
NNFactory.reduce_sum()
NNFactory.reshape()
NNFactory.return_tensor()
NNFactory.run()
NNFactory.set_input_tensor()
NNFactory.slice()
NNFactory.to()
NNFactory.transpose()
NNFactory.unsqueeze()
- intel_npu_acceleration_library.backend.linear module
- intel_npu_acceleration_library.backend.matmul module
- intel_npu_acceleration_library.backend.mlp module
- intel_npu_acceleration_library.backend.qlinear module
- intel_npu_acceleration_library.backend.qmatmul module
- intel_npu_acceleration_library.backend.runtime module
- Module contents
Convolution
Linear
MLP
MatMul
NNFactory
NNFactory.avg_pooling()
NNFactory.compile()
NNFactory.concat()
NNFactory.constant()
NNFactory.convolution()
NNFactory.get_backend_dtype()
NNFactory.get_tensor_dtype()
NNFactory.get_tensor_recursively()
NNFactory.get_tensor_shape()
NNFactory.linear()
NNFactory.matmul()
NNFactory.max_pooling()
NNFactory.normL2()
NNFactory.parameter()
NNFactory.power()
NNFactory.reduce_max()
NNFactory.reduce_mean()
NNFactory.reduce_min()
NNFactory.reduce_prod()
NNFactory.reduce_sum()
NNFactory.reshape()
NNFactory.return_tensor()
NNFactory.run()
NNFactory.set_input_tensor()
NNFactory.slice()
NNFactory.to()
NNFactory.transpose()
NNFactory.unsqueeze()
QLinear
QMatMul
SDPA
SimpleSDPA
Tensor
Tensor.__add__()
Tensor.__len__()
Tensor.__matmul__()
Tensor.__mul__()
Tensor.__neg__()
Tensor.__repr__()
Tensor.__str__()
Tensor.__sub__()
Tensor.__truediv__()
Tensor.T
Tensor.acos()
Tensor.acosh()
Tensor.asin()
Tensor.asinh()
Tensor.atan()
Tensor.atanh()
Tensor.ceiling()
Tensor.chunk()
Tensor.clamp()
Tensor.cos()
Tensor.cosh()
Tensor.dim()
Tensor.dtype
Tensor.elu()
Tensor.erf()
Tensor.exp()
Tensor.factory
Tensor.flatten()
Tensor.floor()
Tensor.grn()
Tensor.hsigmoid()
Tensor.hswish()
Tensor.log()
Tensor.max()
Tensor.mean()
Tensor.min()
Tensor.mish()
Tensor.node
Tensor.permute()
Tensor.prod()
Tensor.relu()
Tensor.reshape()
Tensor.round()
Tensor.shape
Tensor.sigmoid()
Tensor.sign()
Tensor.sin()
Tensor.sinh()
Tensor.size()
Tensor.softmax()
Tensor.softplus()
Tensor.sqrt()
Tensor.squeeze()
Tensor.sum()
Tensor.tan()
Tensor.tanh()
Tensor.to()
Tensor.transpose()
Tensor.type()
Tensor.unsqueeze()
Tensor.view()
clear_cache()
get_driver_version()
npu_available()
run_factory()
run_matmul()
- intel_npu_acceleration_library.nn package
- intel_npu_acceleration_library.functional package
Submodules#
intel_npu_acceleration_library.bindings module#
intel_npu_acceleration_library.compiler module#
- class intel_npu_acceleration_library.compiler.CompilerConfig(use_to: bool = False, dtype: dtype | NPUDtype = torch.float16, training: bool = False)#
Bases:
object
Configuration class to store the compilation configuration of a model for the NPU.
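A minimal usage sketch, using only the fields from the signature above:

    import torch
    from intel_npu_acceleration_library.compiler import CompilerConfig

    # Configuration for float16 inference (use_to and training keep their defaults)
    config = CompilerConfig(dtype=torch.float16)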
- intel_npu_acceleration_library.compiler.apply_general_optimizations(model: Module)#
Apply general optimizations to a torch.nn.Module.
- Parameters:
model (torch.nn.Module) – a PyTorch nn.Module to compile and optimize for the NPU
- intel_npu_acceleration_library.compiler.apply_horizontal_fusion(model: Module, *args: Any, **kwargs: Any)#
Recursively apply the optimization function.
- Parameters:
model (torch.nn.Module) – original module
args (Any) – positional arguments
kwargs (Any) – keyword arguments
- intel_npu_acceleration_library.compiler.compile(model: Module, config: CompilerConfig) → Module#
Compile a model for the NPU.
- Parameters:
model (torch.nn.Module) – a PyTorch nn.Module to compile and optimize for the NPU
config (CompilerConfig) – the compiler configuration
- Raises:
RuntimeError – invalid datatypes
- Returns:
compiled NPU nn.Module
- Return type:
torch.nn.Module
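A minimal usage sketch; the two-layer model is illustrative, and any torch.nn.Module follows the same pattern:

    import torch
    from intel_npu_acceleration_library.compiler import CompilerConfig, compile

    model = torch.nn.Sequential(
        torch.nn.Linear(128, 256),
        torch.nn.ReLU(),
        torch.nn.Linear(256, 10),
    )
    config = CompilerConfig(dtype=torch.float16)
    npu_model = compile(model, config)  # returns a torch.nn.Module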
- intel_npu_acceleration_library.compiler.create_npu_kernels(model: Module)#
Create NPU kernels.
- Parameters:
model (torch.nn.Module) – a PyTorch nn.Module to compile and optimize for the NPU
- intel_npu_acceleration_library.compiler.forward(self, input)#
Override the forward method of the WeightOnlyLinear class.
- Parameters:
input – The input tensor.
- Returns:
The output tensor.
- Return type:
torch.Tensor
- intel_npu_acceleration_library.compiler.lower_linear(model: Module, *args: Any, **kwargs: Any)#
Recursively apply the optimization function.
- Parameters:
model (torch.nn.Module) – original module
args (Any) – positional arguments
kwargs (Any) – keyword arguments
- intel_npu_acceleration_library.compiler.module_optimization(func: Callable) → Module#
Recursively optimize a torch.nn.Module with a specific function.
The function func is called recursively on every module in the network.
- Parameters:
func (Callable) – optimization function
- Returns:
optimized module
- Return type:
torch.nn.Module
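A hedged sketch of defining a custom pass with this decorator. The callback convention assumed here (the decorated function receives each submodule as (name, layer) and returns a replacement module, or None to leave it unchanged) is inferred from the optimization passes above, not confirmed by this page:

    import torch
    from intel_npu_acceleration_library.compiler import module_optimization

    @module_optimization
    def replace_gelu(name: str, layer: torch.nn.Module):
        # Hypothetical pass: swap every GELU activation for a ReLU
        if isinstance(layer, torch.nn.GELU):
            return torch.nn.ReLU()  # replacement module
        return None  # assumed to leave the layer unchanged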
- intel_npu_acceleration_library.compiler.npu(gm: Module | GraphModule, example_inputs: List[Tensor]) → Module | GraphModule#
Implement the custom torch 2.0 compile backend for the NPU.
- Parameters:
gm (Union[torch.nn.Module, torch.fx.GraphModule]) – The torch fx Module
example_inputs (List[torch.Tensor]) – A list of example inputs
- Returns:
The compiled model
- Return type:
Union[torch.nn.Module, torch.fx.GraphModule]
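Because torch.compile accepts a backend callable, this function can be passed directly as the backend; a minimal sketch:

    import torch
    from intel_npu_acceleration_library.compiler import npu

    model = torch.nn.Linear(128, 128)
    compiled = torch.compile(model, backend=npu)
    out = compiled(torch.randn(1, 128))  # first call triggers compilation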
- intel_npu_acceleration_library.compiler.optimize_llama_attention(model: Module, *args: Any, **kwargs: Any)#
Recursively apply the optimization function.
- Parameters:
model (torch.nn.Module) – original module
args (Any) – positional arguments
kwargs (Any) – keyword arguments
- intel_npu_acceleration_library.compiler.weights_quantization(model: Module, *args: Any, **kwargs: Any)#
Recursively apply the optimization function.
- Parameters:
model (torch.nn.Module) – original module
args (Any) – positional arguments
kwargs (Any) – keyword arguments
intel_npu_acceleration_library.optimizations module#
- intel_npu_acceleration_library.optimizations.delattr_recursively(module: Module, target: str)#
Delete attribute recursively by name in a torch.nn.Module.
- Parameters:
module (nn.Module) – the nn.Module
target (str) – the attribute you want to delete
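A small sketch; the cached_scale attribute is purely illustrative:

    import torch
    from intel_npu_acceleration_library.optimizations import delattr_recursively

    model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 8))
    for layer in model.children():
        layer.cached_scale = 1.0  # hypothetical attribute on every layer
    delattr_recursively(model, "cached_scale")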
- intel_npu_acceleration_library.optimizations.fuse_linear_layers(model: Module, modules: Dict[str, Linear], targets: List[str], fused_layer_name: str) → None#
Fuse two linear layers and append the fused layer to the nn.Module.
- Parameters:
model (torch.nn.Module) – the nn.Module containing the layers
modules (Dict[str, nn.Linear]) – the named linear layers to fuse
targets (List[str]) – the names of the linear layers to fuse
fused_layer_name (str) – the name under which the fused layer is registered
- Raises:
ValueError – All linear layers must be of type nn.Linear and must have the same input dimension
- intel_npu_acceleration_library.optimizations.horizontal_fusion_linear(model: Module) → Module#
Horizontally fuse two or more linear layers that share the same origin. This increases NPU hardware utilization.
- Parameters:
model (torch.nn.Module) – The original nn.Module
- Returns:
optimized nn.Module where parallel linear operations have been fused into a single larger one
- Return type:
torch.nn.Module
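A minimal sketch: q, k and v below consume the same input, so they are candidates for fusion into one wider linear layer (that the pass traces the module internally is an assumption of this sketch):

    import torch
    from intel_npu_acceleration_library.optimizations import horizontal_fusion_linear

    class Projections(torch.nn.Module):
        def __init__(self, dim: int = 64):
            super().__init__()
            # Three parallel projections sharing the same input
            self.q = torch.nn.Linear(dim, dim)
            self.k = torch.nn.Linear(dim, dim)
            self.v = torch.nn.Linear(dim, dim)

        def forward(self, x):
            return self.q(x) + self.k(x) + self.v(x)

    fused = horizontal_fusion_linear(Projections())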
intel_npu_acceleration_library.quantization module#
- intel_npu_acceleration_library.quantization.compress_to_i4(weights: Tensor) → Tensor#
Compress a given tensor to a 4-bit representation.
- Parameters:
weights (torch.Tensor) – The input tensor to be compressed.
- Returns:
The compressed tensor with 4-bit representation.
- Return type:
torch.Tensor
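A hedged sketch; that the input must already hold values in the signed 4-bit range [-8, 7] is an assumption based on the function's purpose, not stated on this page:

    import torch
    from intel_npu_acceleration_library.quantization import compress_to_i4

    # int8 values assumed to lie in the signed 4-bit range [-8, 7]
    weights = torch.randint(-8, 8, (64, 64), dtype=torch.int8)
    packed = compress_to_i4(weights)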
- intel_npu_acceleration_library.quantization.quantize_fit(model: Module, weights_dtype: str, algorithm: str = 'RTN') → Module#
Quantize a model with a given configuration.
- Parameters:
model (torch.nn.Module) – The model to quantize
weights_dtype (str) – The datatype for the weights
algorithm (str, optional) – The quantization algorithm. Defaults to “RTN”.
- Raises:
RuntimeError – Quantization error: unsupported datatype
- Returns:
The quantized model
- Return type:
torch.nn.Module
- intel_npu_acceleration_library.quantization.quantize_i4_model(model: Module, algorithm: str = 'RTN') → Module#
Quantize a model to 4-bit representation.
- Parameters:
model (torch.nn.Module) – The model to quantize
algorithm (str, optional) – The quantization algorithm. Defaults to “RTN”.
- Returns:
The quantized model
- Return type:
torch.nn.Module
- intel_npu_acceleration_library.quantization.quantize_i8_model(model: Module, algorithm: str = 'RTN') → Module#
Quantize a model to 8-bit representation.
- Parameters:
model (torch.nn.Module) – The model to quantize
algorithm (str, optional) – The quantization algorithm. Defaults to “RTN”.
- Returns:
The quantized model
- Return type:
torch.nn.Module
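A minimal usage sketch with the default RTN algorithm:

    import torch
    from intel_npu_acceleration_library.quantization import quantize_i8_model

    model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.ReLU())
    quantized = quantize_i8_model(model)  # algorithm defaults to "RTN"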
- intel_npu_acceleration_library.quantization.quantize_model(model: Module, dtype: NPUDtype) → Module#
Quantize a model.
- Parameters:
model (torch.nn.Module) – The model to quantize
dtype (NPUDtype) – The desired datatype
- Raises:
RuntimeError – Quantization error: unsupported datatype
- Returns:
The quantized model
- Return type:
torch.nn.Module
- intel_npu_acceleration_library.quantization.quantize_tensor(weight: Tensor, min_max_range: Tuple[int, int] = (-128, 127)) → Tuple[Tensor, Tensor]#
Quantize an fp16 tensor symmetrically.
Produces a quantized tensor (same shape, dtype == torch.int8) and a scale tensor (dtype == torch.float16). The quantization equation is: W = S * W_q.
- Parameters:
weight (torch.Tensor) – The tensor to quantize
min_max_range (Tuple[int, int]) – The min and max range for the quantized tensor. Defaults to (-128, 127).
- Raises:
RuntimeError – Error in the quantization step
- Returns:
Quantized tensor and scale
- Return type:
Tuple[torch.Tensor, torch.Tensor]
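A minimal sketch of the symmetric scheme above: quantize a weight, then reconstruct it as W ≈ S * W_q (whether the scale is per-tensor or per-channel is left open here; either broadcasts in the last line):

    import torch
    from intel_npu_acceleration_library.quantization import quantize_tensor

    weight = torch.randn(256, 256, dtype=torch.float16)
    weight_q, scale = quantize_tensor(weight)  # int8 values + fp16 scale
    reconstructed = scale * weight_q.to(torch.float16)  # W ≈ S * W_q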
Module contents#
- class intel_npu_acceleration_library.NPUAutoModel#
Bases:
object
NPU wrapper for AutoModel.
- Attrs:
from_pretrained: Load a pretrained model
- from_pretrained(config: ~intel_npu_acceleration_library.compiler.CompilerConfig, *, transformers_class: ~typing.Type | None = <class 'transformers.models.auto.modeling_auto.AutoModel'>, export=True, **kwargs: ~typing.Any) → Module#
- class intel_npu_acceleration_library.NPUModel#
Bases:
object
Base NPU model class.
- static from_pretrained(model_name_or_path: str, config: CompilerConfig, transformers_class: Type | None = None, export=True, *args: Any, **kwargs: Any) → Module#
Template for the from_pretrained static method.
- Parameters:
model_name_or_path (str) – model name or path
config (CompilerConfig) – compiler configuration
transformers_class (Optional[Type], optional) – base class to use. Must have a from_pretrained method. Defaults to None.
export (bool, optional) – enable the caching of the model. Defaults to True.
args (Any) – positional arguments
kwargs (Any) – keyword arguments
- Raises:
RuntimeError – Invalid class
AttributeError – Cannot export model with trust_remote_code=True
- Returns:
compiled model
- Return type:
torch.nn.Module
- class intel_npu_acceleration_library.NPUModelForCausalLM#
Bases:
object
NPU wrapper for AutoModelForCausalLM.
- Attrs:
from_pretrained: Load a pretrained model
- from_pretrained(config: ~intel_npu_acceleration_library.compiler.CompilerConfig, *, transformers_class: ~typing.Type | None = <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, export=True, **kwargs: ~typing.Any) → Module#
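A hedged usage sketch; the model id is illustrative, and any Hugging Face causal-LM checkpoint follows the same pattern:

    import torch
    from intel_npu_acceleration_library import NPUModelForCausalLM
    from intel_npu_acceleration_library.compiler import CompilerConfig

    config = CompilerConfig(dtype=torch.float16)
    model = NPUModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative model id
        config=config,
    ).eval()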
- class intel_npu_acceleration_library.NPUModelForSeq2SeqLM#
Bases:
object
NPU wrapper for AutoModelForSeq2SeqLM.
- Attrs:
from_pretrained: Load a pretrained model
- from_pretrained(config: ~intel_npu_acceleration_library.compiler.CompilerConfig, *, transformers_class: ~typing.Type | None = <class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>, export=True, **kwargs: ~typing.Any) → Module#
- intel_npu_acceleration_library.compile(model: Module, config: CompilerConfig) → Module#
Compile a model for the NPU.
- Parameters:
model (torch.nn.Module) – a PyTorch nn.Module to compile and optimize for the NPU
config (CompilerConfig) – the compiler configuration
- Raises:
RuntimeError – invalid datatypes
- Returns:
compiled NPU nn.Module
- Return type:
torch.nn.Module