intel_npu_acceleration_library package#
Subpackages#
- intel_npu_acceleration_library.backend package
- Submodules
- intel_npu_acceleration_library.backend.base module
- intel_npu_acceleration_library.backend.factory module
- intel_npu_acceleration_library.backend.linear module
- intel_npu_acceleration_library.backend.matmul module
- intel_npu_acceleration_library.backend.mlp module
- intel_npu_acceleration_library.backend.qlinear module
- intel_npu_acceleration_library.backend.qmatmul module
- intel_npu_acceleration_library.backend.runtime module
- Module contents
- intel_npu_acceleration_library.nn package
Submodules#
intel_npu_acceleration_library.bindings module#
intel_npu_acceleration_library.compiler module#
- intel_npu_acceleration_library.compiler.apply_horizontal_fusion(model: Module, *args: Any, **kwargs: Any)#
Recursively apply the optimization function.
- Parameters:
model (torch.nn.Module) – original module
args (Any) – positional arguments
kwargs (Any) – keyword arguments
- intel_npu_acceleration_library.compiler.compile(model: Module, dtype: dtype = torch.float16, training: bool = False) Module #
Compile a model for the NPU.
- Parameters:
model (torch.nn.Module) – a PyTorch nn.Module to compile and optimize for the NPU
dtype (torch.dtype) – the model target datatype, defaults to torch.float16
training (bool) – enable training. Disabled by default
- Raises:
RuntimeError – invalid datatypes
- Returns:
compiled NPU nn.Module
- Return type:
torch.nn.Module
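A hypothetical usage sketch of compile (assumes the library is installed and an Intel NPU is available; the model and shapes below are illustrative, not from the library's documentation):

```python
import torch
import intel_npu_acceleration_library

# A small example network to offload to the NPU.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# Compile for the NPU; per the signature above, dtype defaults to
# torch.float16 and training is disabled by default.
npu_model = intel_npu_acceleration_library.compile(model, dtype=torch.float16)

with torch.no_grad():
    out = npu_model(torch.rand(1, 128))
```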
- intel_npu_acceleration_library.compiler.lower_linear(model: Module, *args: Any, **kwargs: Any)#
Recursively apply the optimization function.
- Parameters:
model (torch.nn.Module) – original module
args (Any) – positional arguments
kwargs (Any) – keyword arguments
- intel_npu_acceleration_library.compiler.module_optimization(func: Callable) Module #
Optimize recursively a torch.nn.Module with a specific function.
The function func is called recursively on every module in the network.
- Parameters:
func (Callable) – optimization function
- Returns:
optimized module
- Return type:
torch.nn.Module
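The recursive decorator pattern can be illustrated with a minimal stand-in. The Module class and the func signature below are hypothetical simplifications; the real decorator walks the children of a torch.nn.Module:

```python
# Minimal stand-in for a module tree (illustration only; the library
# operates on torch.nn.Module and its named children).
class Module:
    def __init__(self, **children):
        self.children = children

def module_optimization(func):
    """Sketch: turn func into a pass applied to every submodule."""
    def wrapper(model, *args, **kwargs):
        for name, child in model.children.items():
            func(name, child, *args, **kwargs)   # optimize this child
            wrapper(child, *args, **kwargs)      # then recurse into it
        return model
    return wrapper

visited = []

@module_optimization
def record(name, layer):
    # A trivial "optimization" that just records the traversal order.
    visited.append(name)

net = Module(fc1=Module(), block=Module(fc2=Module()))
record(net)
```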
- intel_npu_acceleration_library.compiler.npu(gm: Module | GraphModule, example_inputs: List[Tensor]) Module | GraphModule #
Implement the custom torch 2.0 compile backend for the NPU.
- Parameters:
gm (Union[torch.nn.Module, torch.fx.GraphModule]) – The torch fx Module
example_inputs (List[torch.Tensor]) – A list of example inputs
- Returns:
The compiled model
- Return type:
Union[torch.nn.Module, torch.fx.GraphModule]
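Since npu follows the torch 2.0 backend calling convention (a callable taking a module and example inputs), it can plausibly be passed to torch.compile directly. This sketch is untested here, as it requires NPU hardware:

```python
import torch
from intel_npu_acceleration_library.compiler import npu

model = torch.nn.Linear(64, 64)

# torch.compile accepts a callable backend with the
# (gm, example_inputs) signature documented above.
compiled = torch.compile(model, backend=npu)
```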
- intel_npu_acceleration_library.compiler.optimize_llama_attention(model: Module, *args: Any, **kwargs: Any)#
Recursively apply the optimization function.
- Parameters:
model (torch.nn.Module) – original module
args (Any) – positional arguments
kwargs (Any) – keyword arguments
intel_npu_acceleration_library.optimizations module#
- intel_npu_acceleration_library.optimizations.delattr_recursively(module: Module, target: str)#
Delete attribute recursively by name in a torch.nn.Module.
- Parameters:
module (nn.Module) – the nn.Module
target (str) – the attribute you want to delete
- intel_npu_acceleration_library.optimizations.fuse_linear_layers(model: Module, modules: Dict[str, Linear], targets: List[str], fused_layer_name: str) None #
Fuse two or more linear layers and append the fused layer to the nn.Module.
- Parameters:
model (nn.Module) – original nn.Module object
modules (Dict[str, nn.Linear]) – a dictionary mapping node names to linear layers
targets (List[str]) – list of layer node names
fused_layer_name (str) – fused layer name
- Raises:
ValueError – All linear layers must be of type nn.Linear and must have the same input dimension
- intel_npu_acceleration_library.optimizations.horizontal_fusion_linear(model: Module) Module #
Fuse horizontally two or more linear layers that share the same input. This increases NPU hardware utilization.
- Parameters:
model (torch.nn.Module) – The original nn.Module
- Returns:
optimized nn.Module where parallel linear operations have been fused into a single larger one
- Return type:
torch.nn.Module
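The algebra behind horizontal fusion can be shown with a small pure-Python sketch (illustration only; the real pass rewrites torch.nn.Linear layers). Two linear layers y1 = W1·x and y2 = W2·x that consume the same input x are equivalent to one layer whose weight is the row-concatenation W = [W1; W2], with y1 and y2 recovered as slices of the fused output:

```python
def matvec(W, x):
    """Plain-Python matrix-vector product (stand-in for a linear layer)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

W1 = [[1.0, 2.0]]                  # 1x2 linear layer
W2 = [[3.0, 4.0], [5.0, 6.0]]      # 2x2 linear layer, same input dimension
x = [1.0, -1.0]

fused = W1 + W2                    # row-concatenate the weights
y = matvec(fused, x)               # one larger matmul instead of two

# The per-layer outputs are just slices of the fused output.
y1, y2 = y[:len(W1)], y[len(W1):]
```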
intel_npu_acceleration_library.quantization module#
- intel_npu_acceleration_library.quantization.quantize_tensor(weight: Tensor) Tuple[Tensor, Tensor] #
Quantize an fp16 tensor symmetrically.
Produces a quantized tensor (same shape, dtype == torch.int8) and a scale tensor (dtype == torch.float16). The quantization equation is W = S * W_q.
- Parameters:
weight (torch.Tensor) – The tensor to quantize
- Raises:
RuntimeError – Error in the quantization step
- Returns:
Quantized tensor and scale
- Return type:
Tuple[torch.Tensor, torch.Tensor]
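The symmetric scheme W = S * W_q can be sketched in plain Python. This illustrates the equation only, not the library's torch implementation; the rounding and clamping details below are assumptions:

```python
def quantize_tensor(weight):
    """Symmetric per-tensor int8 quantization: W ≈ S * W_q, W_q in [-127, 127]."""
    scale = max(abs(w) for w in weight) / 127.0   # S, chosen so the largest
                                                  # magnitude maps to ±127
    w_q = [max(-127, min(127, round(w / scale))) for w in weight]
    return w_q, scale

def dequantize(w_q, scale):
    """Reconstruct the approximate weights: W = S * W_q."""
    return [scale * q for q in w_q]

weights = [0.5, -1.0, 0.25]
w_q, s = quantize_tensor(weights)
recon = dequantize(w_q, s)        # close to weights, up to rounding error
```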
Module contents#
- intel_npu_acceleration_library.compile(model: Module, dtype: dtype = torch.float16, training: bool = False) Module #
Compile a model for the NPU.
- Parameters:
model (torch.nn.Module) – a PyTorch nn.Module to compile and optimize for the NPU
dtype (torch.dtype) – the model target datatype, defaults to torch.float16
training (bool) – enable training. Disabled by default
- Raises:
RuntimeError – invalid datatypes
- Returns:
compiled NPU nn.Module
- Return type:
torch.nn.Module