intel_npu_acceleration_library package#

Subpackages#

Submodules#

intel_npu_acceleration_library.bindings module#

intel_npu_acceleration_library.compiler module#

intel_npu_acceleration_library.compiler.apply_horizontal_fusion(model: Module, *args: Any, **kwargs: Any)#

Recursively apply the optimization function.

Parameters:
  • model (torch.nn.Module) – original module

  • args (Any) – positional arguments

  • kwargs (Any) – keyword arguments

intel_npu_acceleration_library.compiler.compile(model: Module, dtype: dtype = torch.float16, training: bool = False) Module#

Compile a model for the NPU.

Parameters:
  • model (torch.nn.Module) – a PyTorch nn.Module to compile and optimize for the NPU

  • dtype (torch.dtype) – the target datatype for the model; defaults to torch.float16

  • training (bool) – enable training mode. Disabled by default

Raises:

RuntimeError – invalid datatypes

Returns:

compiled NPU nn.Module

Return type:

torch.nn.Module

intel_npu_acceleration_library.compiler.lower_linear(model: Module, *args: Any, **kwargs: Any)#

Recursively apply the optimization function.

Parameters:
  • model (torch.nn.Module) – original module

  • args (Any) – positional arguments

  • kwargs (Any) – keyword arguments

intel_npu_acceleration_library.compiler.module_optimization(func: Callable) Module#

Recursively optimize a torch.nn.Module with a specific function.

The function func is called recursively on every module in the network.

Parameters:

func (Callable) – optimization function

Returns:

optimized module

Return type:

torch.nn.Module
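
The recursive-application pattern behind this decorator can be sketched in plain Python. Everything below (the ToyModule class, the collect_names function, and the exact arguments passed to func) is hypothetical and only illustrates the traversal; the real decorator operates on torch.nn.Module and its named_children():

```python
from typing import Any, Callable


class ToyModule:
    """A minimal stand-in for torch.nn.Module with named children."""

    def __init__(self, **children: "ToyModule") -> None:
        self._children = dict(children)

    def named_children(self):
        return self._children.items()


def module_optimization(func: Callable) -> Callable:
    """Wrap func so it is applied recursively to every submodule."""

    def wrapper(model: ToyModule, *args: Any, **kwargs: Any) -> None:
        for name, layer in model.named_children():
            func(model, name, layer, *args, **kwargs)  # optimize this child
            wrapper(layer, *args, **kwargs)            # then recurse into it

    return wrapper


@module_optimization
def collect_names(model: ToyModule, name: str, layer: ToyModule, seen: list) -> None:
    # A trivial "optimization": record every submodule name visited.
    seen.append(name)


tree = ToyModule(a=ToyModule(b=ToyModule()), c=ToyModule())
names: list = []
collect_names(tree, names)
# Visits children depth-first: names == ["a", "b", "c"]
```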

intel_npu_acceleration_library.compiler.npu(gm: Module | GraphModule, example_inputs: List[Tensor]) Module | GraphModule#

Implement the custom torch 2.0 compile backend for the NPU.

Parameters:
  • gm (Union[torch.nn.Module, torch.fx.GraphModule]) – The torch fx Module

  • example_inputs (List[torch.Tensor]) – A list of example inputs

Returns:

The compiled model

Return type:

Union[torch.nn.Module, torch.fx.GraphModule]

intel_npu_acceleration_library.compiler.optimize_llama_attention(model: Module, *args: Any, **kwargs: Any)#

Recursively apply the optimization function.

Parameters:
  • model (torch.nn.Module) – original module

  • args (Any) – positional arguments

  • kwargs (Any) – keyword arguments

intel_npu_acceleration_library.optimizations module#

intel_npu_acceleration_library.optimizations.delattr_recursively(module: Module, target: str)#

Recursively delete an attribute by name in a torch.nn.Module.

Parameters:
  • module (nn.Module) – the nn.Module

  • target (str) – the attribute you want to delete
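
One plausible reading of the behavior can be sketched in plain Python. The ToyModule class and the delete_recursively helper below are hypothetical stand-ins, not the library's implementation:

```python
class ToyModule:
    """A minimal stand-in for torch.nn.Module with named children."""

    def __init__(self, **children):
        self._children = dict(children)
        for name, child in children.items():
            setattr(self, name, child)

    def named_children(self):
        return self._children.items()


def delete_recursively(module, target: str):
    # Delete `target` from this module and every submodule, where present.
    if hasattr(module, target):
        delattr(module, target)
    for _, child in module.named_children():
        delete_recursively(child, target)


leaf = ToyModule()
leaf.bias = [0.0]
root = ToyModule(child=leaf)
root.bias = [1.0]
delete_recursively(root, "bias")
# Neither root nor leaf carries `bias` any more.
```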

intel_npu_acceleration_library.optimizations.fuse_linear_layers(model: Module, modules: Dict[str, Linear], targets: List[str], fused_layer_name: str) None#

Fuse two linear layers and append the fused layer to the nn.Module.

Parameters:
  • model (nn.Module) – Original nn.Module object

  • modules (Dict[str, nn.Linear]) – a dictionary mapping node names to linear layers

  • targets (List[str]) – list of layer node names

  • fused_layer_name (str) – fused layer name

Raises:

ValueError – All linear layers must be of type nn.Linear and must have the same input dimension

intel_npu_acceleration_library.optimizations.horizontal_fusion_linear(model: Module) Module#

Horizontally fuse two or more linear layers that share the same origin. This increases NPU hardware utilization.

Parameters:

model (torch.nn.Module) – The original nn.Module

Returns:

optimized nn.Module in which parallel linear operations have been fused into a single larger one

Return type:

torch.nn.Module
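
The arithmetic behind horizontal fusion can be shown with plain lists (illustrative only, no torch): two linear layers that read the same input vector can have their weight rows stacked into one larger matrix, and the single larger product yields the concatenation of the individual outputs.

```python
def matvec(W, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


W1 = [[1.0, 2.0], [3.0, 4.0]]  # linear layer, 2 outputs, input dim 2
W2 = [[5.0, 6.0]]              # linear layer, 1 output, same input dim
x = [1.0, -1.0]

# Separate execution: two small matrix-vector products.
y1, y2 = matvec(W1, x), matvec(W2, x)

# Horizontal fusion: stack the rows and run one larger product.
W_fused = W1 + W2
y_fused = matvec(W_fused, x)

assert y_fused == y1 + y2  # the fused output is just the concatenation
```

One large matrix multiply keeps the NPU busier than several small ones, which is the hardware-utilization gain mentioned above.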

intel_npu_acceleration_library.quantization module#

intel_npu_acceleration_library.quantization.quantize_tensor(weight: Tensor) Tuple[Tensor, Tensor]#

Quantize an fp16 tensor symmetrically.

Produces a quantized tensor (same shape, dtype == torch.int8) and a scale tensor (dtype == torch.float16). The quantization equation is W = S * W_q.

Parameters:

weight (torch.Tensor) – The tensor to quantize

Raises:

RuntimeError – Error in the quantization step

Returns:

Quantized tensor and scale

Return type:

Tuple[torch.Tensor, torch.Tensor]
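
The equation W = S * W_q can be illustrated in plain Python. quantize_symmetric below is a hypothetical sketch over a flat list, not the library function (which operates on torch tensors):

```python
def quantize_symmetric(weights):
    """Symmetric int8 quantization: W ≈ S * W_q with W_q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    w_q = [max(-127, min(127, round(w / scale))) for w in weights]
    return w_q, scale


w = [0.4, -1.0, 0.25]
w_q, s = quantize_symmetric(w)
w_hat = [s * q for q in w_q]  # dequantize: W ≈ S * W_q

# Per-element reconstruction error is at most scale / 2.
assert all(abs(a - b) <= s / 2 for a, b in zip(w, w_hat))
```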

Module contents#

intel_npu_acceleration_library.compile(model: Module, dtype: dtype = torch.float16, training: bool = False) Module#

Compile a model for the NPU.

Parameters:
  • model (torch.nn.Module) – a PyTorch nn.Module to compile and optimize for the NPU

  • dtype (torch.dtype) – the target datatype for the model; defaults to torch.float16

  • training (bool) – enable training mode. Disabled by default

Raises:

RuntimeError – invalid datatypes

Returns:

compiled NPU nn.Module

Return type:

torch.nn.Module