intel_npu_acceleration_library.functional package#

Submodules#

intel_npu_acceleration_library.functional.scaled_dot_product_attention module#

intel_npu_acceleration_library.functional.scaled_dot_product_attention.scaled_dot_product_attention(query: Tensor, key: Tensor, value: Tensor, attn_mask: Tensor | None = None, dropout_p: float = 0.0, is_causal: bool = False, scale: float | None = None) → Tensor#

Execute the scaled dot-product attention (SDPA) kernel.

Parameters:
  • query (torch.Tensor) – query tensor

  • key (torch.Tensor) – key tensor

  • value (torch.Tensor) – value tensor

  • attn_mask (torch.Tensor, optional) – attention mask tensor. Defaults to None.

  • dropout_p (float, optional) – dropout probability applied to the attention weights. Defaults to 0.0.

  • is_causal (bool, optional) – enable causal mask. Defaults to False.

  • scale (Optional[float], optional) – custom scaling factor for the attention scores. Defaults to None.

Raises:

RuntimeError – if the underlying kernel execution fails

Returns:

attention output tensor

Return type:

torch.Tensor
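The kernel above computes the standard scaled dot-product attention, softmax(QKᵀ · scale)V, with an optional additive mask or causal mask. The sketch below is an illustrative NumPy reference of that math, not the library's NPU implementation; the function name `sdpa_reference` and the default scale of 1/sqrt(head_dim) (mirroring the usual SDPA convention) are assumptions for illustration.

```python
import numpy as np


def sdpa_reference(query, key, value, attn_mask=None, is_causal=False, scale=None):
    """NumPy sketch of scaled dot-product attention (illustrative, not the NPU kernel)."""
    head_dim = query.shape[-1]
    # Assumed default: scale attention scores by 1/sqrt(head_dim) when scale is None.
    s = scale if scale is not None else 1.0 / np.sqrt(head_dim)
    # Attention scores: Q @ K^T, scaled.
    scores = query @ key.swapaxes(-1, -2) * s
    q_len, k_len = query.shape[-2], key.shape[-2]
    if is_causal:
        # Lower-triangular mask: each query position attends only to itself and earlier keys.
        causal = np.tril(np.ones((q_len, k_len), dtype=bool))
        scores = np.where(causal, scores, -np.inf)
    if attn_mask is not None:
        # Additive mask (e.g. -inf at disallowed positions).
        scores = scores + attn_mask
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of values.
    return weights @ value
```

With `is_causal=True`, the first query position can only attend to the first key, so its output row equals the first value row; this is a quick sanity check for any SDPA implementation.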

Module contents#

intel_npu_acceleration_library.functional.scaled_dot_product_attention(query: Tensor, key: Tensor, value: Tensor, attn_mask: Tensor | None = None, dropout_p: float = 0.0, is_causal: bool = False, scale: float | None = None) → Tensor#

Execute the scaled dot-product attention (SDPA) kernel.

Parameters:
  • query (torch.Tensor) – query tensor

  • key (torch.Tensor) – key tensor

  • value (torch.Tensor) – value tensor

  • attn_mask (torch.Tensor, optional) – attention mask tensor. Defaults to None.

  • dropout_p (float, optional) – dropout probability applied to the attention weights. Defaults to 0.0.

  • is_causal (bool, optional) – enable causal mask. Defaults to False.

  • scale (Optional[float], optional) – custom scaling factor for the attention scores. Defaults to None.

Raises:

RuntimeError – if the underlying kernel execution fails

Returns:

attention output tensor

Return type:

torch.Tensor