:orphan:

:py:mod:`llama_flash_attn_monkey_patch`
=======================================

.. py:module:: llama_flash_attn_monkey_patch


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   llama_flash_attn_monkey_patch.forward


.. py:function:: forward(self, hidden_states: torch.Tensor, attention_mask: Optional[torch.Tensor] = None, position_ids: Optional[torch.Tensor] = None, past_key_value: Optional[Tuple[torch.Tensor]] = None, output_attentions: bool = False, use_cache: bool = False) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]

   Input shape: Batch x Time x Channel

   attention_mask: [bsz, q_len]
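
The sketch below shows how a monkey patch like this is typically installed; it is a usage illustration, not part of the module itself. The signature of ``forward`` matches ``LlamaAttention.forward`` in Hugging Face ``transformers``, so the patch is assumed to replace that method before the model is constructed. The checkpoint name is a placeholder.

.. code-block:: python

   import transformers

   from llama_flash_attn_monkey_patch import forward

   # Replace the stock LLaMA attention forward with the flash-attention
   # version. This must happen before from_pretrained() builds the
   # attention layers, since instances bind the method at construction.
   transformers.models.llama.modeling_llama.LlamaAttention.forward = forward

   # Placeholder checkpoint; any LLaMA-architecture model applies.
   model = transformers.AutoModelForCausalLM.from_pretrained(
       "meta-llama/Llama-2-7b-hf"
   )

Patching at the class level rather than on a model instance keeps the change in one place and covers every attention layer the model creates.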