intel_npu_acceleration_library package#
Subpackages#
- intel_npu_acceleration_library.backend package
- Submodules
- intel_npu_acceleration_library.backend.base module
- intel_npu_acceleration_library.backend.factory module
NNFactory: avg_pooling(), compile(), concat(), constant(), convolution(), get_backend_dtype(), get_tensor_dtype(), get_tensor_recursively(), get_tensor_shape(), linear(), matmul(), max_pooling(), normL2(), parameter(), power(), reduce_max(), reduce_mean(), reduce_min(), reduce_prod(), reduce_sum(), reshape(), return_tensor(), run(), set_input_tensor(), slice(), to(), transpose(), unsqueeze()
- intel_npu_acceleration_library.backend.linear module
- intel_npu_acceleration_library.backend.matmul module
- intel_npu_acceleration_library.backend.mlp module
- intel_npu_acceleration_library.backend.qlinear module
- intel_npu_acceleration_library.backend.qmatmul module
- intel_npu_acceleration_library.backend.runtime module
- Module contents
Convolution, Linear, MLP, MatMul
NNFactory (same methods as listed under the factory module above)
QLinear, QMatMul, SDPA, SimpleSDPA
Tensor: attributes T, dtype, factory, node, shape; operators __add__(), __sub__(), __mul__(), __truediv__(), __matmul__(), __neg__(), __len__(), __repr__(), __str__(); methods acos(), acosh(), asin(), asinh(), atan(), atanh(), ceiling(), chunk(), clamp(), cos(), cosh(), dim(), elu(), erf(), exp(), flatten(), floor(), grn(), hsigmoid(), hswish(), log(), max(), mean(), min(), mish(), permute(), prod(), relu(), reshape(), round(), sigmoid(), sign(), sin(), sinh(), size(), softmax(), softplus(), sqrt(), squeeze(), sum(), tan(), tanh(), to(), transpose(), type(), unsqueeze(), view()
Functions: clear_cache(), get_driver_version(), npu_available(), run_factory(), run_matmul()
- intel_npu_acceleration_library.nn package
- intel_npu_acceleration_library.functional package
Submodules#
intel_npu_acceleration_library.bindings module#
intel_npu_acceleration_library.compiler module#
- class intel_npu_acceleration_library.compiler.CompilerConfig(use_to: bool = False, dtype: dtype | NPUDtype = torch.float16, training: bool = False)#
Bases: object
Configuration class to store the compilation configuration of a model for the NPU.
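For reference, a minimal construction sketch (the field values follow the signature defaults above; the exact effect of each flag is library-internal):

    import torch
    from intel_npu_acceleration_library.compiler import CompilerConfig

    # Default inference configuration: compute in float16
    config = CompilerConfig(dtype=torch.float16)

    # Configuration with the training flag enabled
    train_config = CompilerConfig(dtype=torch.float16, training=True)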
- intel_npu_acceleration_library.compiler.apply_general_optimizations(model: Module)#
Apply general optimizations to a torch.nn.Module.
- Parameters:
model (torch.nn.Module) – a PyTorch nn.Module to compile and optimize for the NPU
- intel_npu_acceleration_library.compiler.apply_horizontal_fusion(model: Module, *args: Any, **kwargs: Any)#
Recursively apply the optimization function.
- Parameters:
model (torch.nn.Module) – original module
args (Any) – positional arguments
kwargs (Any) – keyword arguments
- intel_npu_acceleration_library.compiler.compile(model: Module, config: CompilerConfig) → Module#
Compile a model for the NPU.
- Parameters:
model (torch.nn.Module) – a PyTorch nn.Module to compile and optimize for the NPU
config (CompilerConfig) – the compiler configuration
- Raises:
RuntimeError – invalid datatypes
- Returns:
compiled NPU nn.Module
- Return type:
torch.nn.Module
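A minimal usage sketch (the toy model is illustrative):

    import torch
    from torch import nn
    from intel_npu_acceleration_library.compiler import CompilerConfig, compile as npu_compile

    model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 10)).eval()
    config = CompilerConfig(dtype=torch.float16)

    npu_model = npu_compile(model, config)  # an NPU-optimized torch.nn.Module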
- intel_npu_acceleration_library.compiler.create_npu_kernels(model: Module)#
Create NPU kernels.
- Parameters:
model (torch.nn.Module) – a PyTorch nn.Module to compile and optimize for the NPU
- intel_npu_acceleration_library.compiler.forward(self, input)#
Override forward method for WeightOnlyLinear class.
- Parameters:
input – The input tensor.
- Returns:
The output tensor.
- Return type:
torch.Tensor
- intel_npu_acceleration_library.compiler.lower_linear(model: Module, *args: Any, **kwargs: Any)#
Recursively apply the optimization function.
- Parameters:
model (torch.nn.Module) – original module
args (Any) – positional arguments
kwargs (Any) – keyword arguments
- intel_npu_acceleration_library.compiler.module_optimization(func: Callable) → Module#
Optimize recursively a torch.nn.Module with a specific function.
The function func is called recursively on every module in the network.
- Parameters:
func (Callable) – optimization function
- Returns:
optimized module
- Return type:
torch.nn.Module
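This decorator is why apply_horizontal_fusion, lower_linear, optimize_llama_attention, and weights_quantization above share the same generic docstring. A minimal sketch of the recursion pattern, not the library's actual implementation (the per-layer callback signature is an assumption):

    from typing import Any, Callable
    from torch import nn

    def recurse_modules(func: Callable) -> Callable:
        """Hypothetical stand-in for module_optimization."""
        def wrapper(model: nn.Module, *args: Any, **kwargs: Any) -> None:
            for name, layer in model.named_children():
                func(name, layer, *args, **kwargs)  # assumed per-layer callback
                wrapper(layer, *args, **kwargs)     # descend into children
        return wrapper

    @recurse_modules
    def print_linears(name: str, layer: nn.Module) -> None:
        if isinstance(layer, nn.Linear):
            print(f"{name}: {layer.in_features} -> {layer.out_features}")

    print_linears(nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4)))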
- intel_npu_acceleration_library.compiler.npu(gm: Module | GraphModule, example_inputs: List[Tensor]) → Module | GraphModule#
Implement the custom torch 2.0 compile backend for the NPU.
- Parameters:
gm (Union[torch.nn.Module, torch.fx.GraphModule]) – The torch fx Module
example_inputs (List[torch.Tensor]) – A list of example inputs
- Returns:
The compiled model
- Return type:
Union[torch.nn.Module, torch.fx.GraphModule]
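Since torch.compile accepts a backend callable directly, one way to use this entry point is (a sketch, assuming torch >= 2.0):

    import torch
    from torch import nn
    from intel_npu_acceleration_library.compiler import npu

    model = nn.Sequential(nn.Linear(128, 128), nn.GELU()).eval()
    compiled = torch.compile(model, backend=npu)  # pass the backend callable

    with torch.no_grad():
        out = compiled(torch.rand(1, 128))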
- intel_npu_acceleration_library.compiler.optimize_llama_attention(model: Module, *args: Any, **kwargs: Any)#
Recursively apply the optimization function.
- Parameters:
model (torch.nn.Module) – original module
args (Any) – positional arguments
kwargs (Any) – keyword arguments
- intel_npu_acceleration_library.compiler.weights_quantization(model: Module, *args: Any, **kwargs: Any)#
Recursively apply the optimization function.
- Parameters:
model (torch.nn.Module) – original module
args (Any) – positional arguments
kwargs (Any) – keyword arguments
intel_npu_acceleration_library.optimizations module#
- intel_npu_acceleration_library.optimizations.delattr_recursively(module: Module, target: str)#
Delete attribute recursively by name in a torch.nn.Module.
- Parameters:
module (nn.Module) – the nn.Module
target (str) – the attribute you want to delete
- intel_npu_acceleration_library.optimizations.fuse_linear_layers(model: Module, modules: Dict[str, Linear], targets: List[str], fused_layer_name: str) → None#
Fuse two or more linear layers and append the fused layer to the nn.Module.
- Parameters:
model (torch.nn.Module) – the model that contains the layers
modules (Dict[str, torch.nn.Linear]) – the named linear layers to fuse
targets (List[str]) – the names of the layers to fuse
fused_layer_name (str) – the name to give the fused layer
- Raises:
ValueError – All linear layers must be of type nn.Linear and must have the same input dimension
- intel_npu_acceleration_library.optimizations.horizontal_fusion_linear(model: Module) → Module#
Horizontally fuse two or more linear layers that share the same input. This increases NPU hardware utilization.
- Parameters:
model (torch.nn.Module) – The original nn.Module
- Returns:
optimized nn.Module where parallel linear operations have been fused into a single larger one
- Return type:
torch.nn.Module
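As an illustration, a module with two linear layers that read the same input (the gate/up projection pattern found in LLaMA-style MLPs) is a fusion candidate; the module below is a made-up example:

    import torch
    from torch import nn
    from intel_npu_acceleration_library.optimizations import horizontal_fusion_linear

    class GatedMLP(nn.Module):
        """Two parallel projections of the same input."""
        def __init__(self):
            super().__init__()
            self.gate_proj = nn.Linear(64, 128, bias=False)
            self.up_proj = nn.Linear(64, 128, bias=False)

        def forward(self, x):
            return self.gate_proj(x) * self.up_proj(x)

    fused = horizontal_fusion_linear(GatedMLP())  # two 64->128 layers become one wider matmul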
intel_npu_acceleration_library.quantization module#
- intel_npu_acceleration_library.quantization.compress_to_i4(weights: Tensor) → Tensor#
Compress a given tensor to a 4-bit representation.
- Parameters:
weights (torch.Tensor) – The input tensor to be compressed.
- Returns:
The compressed tensor with 4-bit representation.
- Return type:
torch.Tensor
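The idea is to pack two signed 4-bit values into each int8 byte, halving the weight footprint. An illustrative packing routine (not the library's implementation; the nibble layout is an assumption):

    import torch

    def pack_int4_pairs(values: torch.Tensor) -> torch.Tensor:
        """Pack pairs of int4-range values (-8..7) into single int8 bytes."""
        v = values.flatten().to(torch.int8)
        assert v.numel() % 2 == 0, "need an even number of elements"
        bits = v.view(torch.uint8)            # reinterpret bits to avoid signed overflow
        low = bits[0::2] & 0x0F               # element 2k -> low nibble
        high = (bits[1::2] & 0x0F) << 4       # element 2k+1 -> high nibble
        return (low | high).view(torch.int8)  # half as many bytes as inputs

    packed = pack_int4_pairs(torch.tensor([-3, 7, 1, -8]))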
- intel_npu_acceleration_library.quantization.quantize_fit(model: Module, weights_dtype: str, algorithm: str = 'RTN') → Module#
Quantize a model with a given configuration.
- Parameters:
model (torch.nn.Module) – The model to quantize
weights_dtype (str) – The datatype for the weights
algorithm (str, optional) – The quantization algorithm. Defaults to “RTN”.
- Raises:
RuntimeError – Quantization error: unsupported datatype
- Returns:
The quantized model
- Return type:
torch.nn.Module
- intel_npu_acceleration_library.quantization.quantize_i4_model(model: Module, algorithm: str = 'RTN') → Module#
Quantize a model to 4-bit representation.
- Parameters:
model (torch.nn.Module) – The model to quantize
algorithm (str, optional) – The quantization algorithm. Defaults to “RTN”.
- Returns:
The quantized model
- Return type:
torch.nn.Module
- intel_npu_acceleration_library.quantization.quantize_i8_model(model: Module, algorithm: str = 'RTN') → Module#
Quantize a model to 8-bit representation.
- Parameters:
model (torch.nn.Module) – The model to quantize
algorithm (str, optional) – The quantization algorithm. Defaults to “RTN”.
- Returns:
The quantized model
- Return type:
torch.nn.Module
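Usage is a one-liner (toy model; the default "RTN" round-to-nearest algorithm comes from the signatures above):

    from torch import nn
    from intel_npu_acceleration_library.quantization import quantize_i8_model

    model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 10))
    model_i8 = quantize_i8_model(model)  # 8-bit weights, "RTN" by default
    # quantize_i4_model(model) is the 4-bit analogue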
- intel_npu_acceleration_library.quantization.quantize_model(model: Module, dtype: NPUDtype) → Module#
Quantize a model.
- Parameters:
model (torch.nn.Module) – The model to quantize
dtype (NPUDtype) – The desired datatype
- Raises:
RuntimeError – Quantization error: unsupported datatype
- Returns:
The quantized model
- Return type:
torch.nn.Module
- intel_npu_acceleration_library.quantization.quantize_tensor(weight: Tensor, min_max_range: Tuple[int, int] = (-128, 127)) → Tuple[Tensor, Tensor]#
Quantize an fp16 tensor symmetrically.
Produces a quantized tensor (same shape, dtype == torch.int8) and a scale tensor (dtype == torch.float16). The quantization equation is W = S * W_q.
- Parameters:
weight (torch.Tensor) – The tensor to quantize
min_max_range (Tuple[int, int]) – The min and max range for the quantized tensor. Defaults to (-128, 127).
- Raises:
RuntimeError – Error in the quantization step
- Returns:
Quantized tensor and scale
- Return type:
Tuple[torch.Tensor, torch.Tensor]
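The relation W = S * W_q pins down the scheme: for the default (-128, 127) range a standard symmetric scale is max(|W|) / 127. A worked sketch (standard symmetric quantization; the library's exact rounding and clamping may differ):

    import torch

    w = torch.randn(4, 8).to(torch.float16)          # fp16 weight tensor

    scale = (w.abs().max() / 127).to(torch.float16)  # S
    w_q = torch.clamp(torch.round(w.float() / scale.float()),
                      -128, 127).to(torch.int8)      # W_q
    w_hat = scale.float() * w_q.float()              # reconstruction: W ≈ S * W_q
    print((w.float() - w_hat).abs().max())           # small quantization error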
Module contents#
- class intel_npu_acceleration_library.NPUAutoModel#
Bases: object
NPU wrapper for AutoModel.
- Attrs:
from_pretrained: Load a pretrained model
- from_pretrained(config: CompilerConfig, *, transformers_class: Type | None = AutoModel, export=True, **kwargs: Any) → Module#
- class intel_npu_acceleration_library.NPUModel#
Bases: object
Base NPU model class.
- static from_pretrained(model_name_or_path: str, config: CompilerConfig, transformers_class: Type | None = None, export=True, *args: Any, **kwargs: Any) → Module#
Template for the from_pretrained static method.
- Parameters:
model_name_or_path (str) – model name or path
config (CompilerConfig) – compiler configuration
transformers_class (Optional[Type], optional) – base class to use. Must have a from_pretrained method. Defaults to None.
export (bool, optional) – enable the caching of the model. Defaults to True.
args (Any) – positional arguments
kwargs (Any) – keyword arguments
- Raises:
RuntimeError – Invalid class
AttributeError – Cannot export model with trust_remote_code=True
- Returns:
compiled model
- Return type:
torch.nn.Module
- class intel_npu_acceleration_library.NPUModelForCausalLM#
Bases: object
NPU wrapper for AutoModelForCausalLM.
- Attrs:
from_pretrained: Load a pretrained model
- from_pretrained(config: CompilerConfig, *, transformers_class: Type | None = AutoModelForCausalLM, export=True, **kwargs: Any) → Module#
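A usage sketch (the checkpoint name is only an example; model_name_or_path and extra kwargs are forwarded to the wrapped transformers class):

    import torch
    from intel_npu_acceleration_library import NPUModelForCausalLM
    from intel_npu_acceleration_library.compiler import CompilerConfig

    config = CompilerConfig(dtype=torch.float16)
    model = NPUModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example checkpoint
        config=config,
    ).eval()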
- class intel_npu_acceleration_library.NPUModelForSeq2SeqLM#
Bases: object
NPU wrapper for AutoModelForSeq2SeqLM.
- Attrs:
from_pretrained: Load a pretrained model
- from_pretrained(config: CompilerConfig, *, transformers_class: Type | None = AutoModelForSeq2SeqLM, export=True, **kwargs: Any) → Module#
- intel_npu_acceleration_library.compile(model: Module, config: CompilerConfig) → Module#
Compile a model for the NPU.
- Parameters:
model (torch.nn.Module) – a PyTorch nn.Module to compile and optimize for the NPU
config (CompilerConfig) – the compiler configuration
- Raises:
RuntimeError – invalid datatypes
- Returns:
compiled NPU nn.Module
- Return type:
torch.nn.Module
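This package-level compile is the same entry point as intel_npu_acceleration_library.compiler.compile documented above; an end-to-end sketch including inference:

    import torch
    import intel_npu_acceleration_library
    from intel_npu_acceleration_library.compiler import CompilerConfig

    model = torch.nn.Linear(32, 32).eval()
    config = CompilerConfig(dtype=torch.float16)
    npu_model = intel_npu_acceleration_library.compile(model, config)

    with torch.no_grad():
        out = npu_model(torch.rand(1, 32))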