intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic
PyTorch RoBERTa model.
Classes
- RobertaEmbeddings: Same as BertEmbeddings with a tiny tweak for positional embeddings indexing.
- RobertaSelfAttention: Roberta self attention.
- RobertaSelfOutput: Roberta self output.
- RobertaAttention: Roberta attention.
- RobertaIntermediate: Roberta intermediate.
- RobertaOutput: Roberta output.
- RobertaLayer: Basic layer of roberta.
- RobertaEncoder: The encoder for Roberta.
- RobertaPooler: Roberta pooler.
- RobertaPreTrainedModel: Roberta pretrained model.
- RobertaModel: Basic roberta model.
- RobertaForCausalLM: Roberta for causal language model task.
- RobertaForMaskedLM: Roberta for masked language model task.
- RobertaLMHead: Roberta Head for masked language modeling.
- RobertaForSequenceClassification: Roberta for sequence classification task.
- RobertaForMultipleChoice: Roberta for multiple choice task.
- RobertaForTokenClassification: Roberta for token classification task.
- RobertaClassificationHead: Head for sentence-level classification tasks.
- RobertaForQuestionAnswering: Roberta model for question answering task.
Functions
- create_position_ids_from_input_ids: Replace non-padding symbols with their position numbers.
- Expand gather.
Module Contents
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaEmbeddings(config)[source]
Same as BertEmbeddings with a tiny tweak for positional embeddings indexing.
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaSelfAttention(config, position_embedding_type=None)[source]
Roberta self attention.
- forward(hidden_states: torch.Tensor, attention_mask: torch.FloatTensor | None = None, head_mask: torch.FloatTensor | None = None, encoder_hidden_states: torch.FloatTensor | None = None, encoder_attention_mask: torch.FloatTensor | None = None, past_key_value: Tuple[Tuple[torch.FloatTensor]] | None = None, output_attentions: bool | None = False) Tuple[torch.Tensor][source]
The main entry point for the class.
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaSelfOutput(config)[source]
Roberta self output.
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaAttention(config, position_embedding_type=None)[source]
Roberta attention.
- forward(hidden_states: torch.Tensor, attention_mask: torch.FloatTensor | None = None, head_mask: torch.FloatTensor | None = None, encoder_hidden_states: torch.FloatTensor | None = None, encoder_attention_mask: torch.FloatTensor | None = None, past_key_value: Tuple[Tuple[torch.FloatTensor]] | None = None, output_attentions: bool | None = False) Tuple[torch.Tensor][source]
The main entry point for the class.
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaIntermediate(config)[source]
Roberta intermediate.
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaOutput(config)[source]
Roberta output.
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaLayer(config)[source]
Basic layer of roberta.
- forward(hidden_states: torch.Tensor, attention_mask: torch.FloatTensor | None = None, head_mask: torch.FloatTensor | None = None, encoder_hidden_states: torch.FloatTensor | None = None, encoder_attention_mask: torch.FloatTensor | None = None, past_key_value: Tuple[Tuple[torch.FloatTensor]] | None = None, output_attentions: bool | None = False, output_length=None, always_keep_cls_token: bool | None = True) Tuple[torch.Tensor][source]
The main entry point for the class.
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaEncoder(config)[source]
The encoder for Roberta.
- forward(hidden_states: torch.Tensor, attention_mask: torch.FloatTensor | None = None, head_mask: torch.FloatTensor | None = None, encoder_hidden_states: torch.FloatTensor | None = None, encoder_attention_mask: torch.FloatTensor | None = None, past_key_values: Tuple[Tuple[torch.FloatTensor]] | None = None, use_cache: bool | None = None, output_attentions: bool | None = False, output_hidden_states: bool | None = False, return_dict: bool | None = True, layer_config=None, length_config=None, always_keep_cls_token=True) Tuple[torch.Tensor] | transformers.modeling_outputs.BaseModelOutputWithPastAndCrossAttentions[source]
The main entry point for the class.
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaPooler(config)[source]
Roberta pooler.
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaPreTrainedModel[source]
Roberta pretrained model.
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.
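A minimal loading sketch, assuming the standard Hugging Face from_pretrained interface inherited by RobertaPreTrainedModel and the public roberta-base checkpoint (not a prescribed recipe from this module):
```python
# Sketch: load upstream RoBERTa weights into the dynamic model class.
# Assumes the inherited transformers from_pretrained API and access to the
# "roberta-base" checkpoint.
from intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic import RobertaModel

model = RobertaModel.from_pretrained("roberta-base")
model.eval()  # inference mode
```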
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaModel(config, add_pooling_layer=True)[source]
Basic roberta model.
The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of cross-attention is added between the self-attention layers, following the architecture described in *Attention Is All You Need* by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.
To behave as a decoder, the model needs to be initialized with the is_decoder argument of the configuration set to True. To be used in a Seq2Seq model, the model needs to be initialized with both the is_decoder and add_cross_attention arguments set to True; an encoder_hidden_states input is then expected in the forward pass.
- forward(input_ids: torch.Tensor | None = None, attention_mask: torch.Tensor | None = None, token_type_ids: torch.Tensor | None = None, position_ids: torch.Tensor | None = None, head_mask: torch.Tensor | None = None, inputs_embeds: torch.Tensor | None = None, encoder_hidden_states: torch.Tensor | None = None, encoder_attention_mask: torch.Tensor | None = None, past_key_values: List[torch.FloatTensor] | None = None, use_cache: bool | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, layer_config=None, length_config=None, always_keep_cls_token=True) Tuple[torch.Tensor] | transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions[source]
The main entry point for the class.
- encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional):
Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.
- encoder_attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional):
Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
- past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers, with each tuple having 4 tensors of shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)):
Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.
If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don’t have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
- use_cache (bool, optional):
If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
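To illustrate the past_key_values and use_cache arguments above, here is a minimal incremental-decoding sketch. It assumes this class mirrors the upstream transformers RobertaModel API (from_pretrained, cached outputs) and that the roberta-base checkpoint is available; the prompt and next-token choice are illustrative.
```python
import torch
from transformers import RobertaConfig, RobertaTokenizer
from intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic import RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
config = RobertaConfig.from_pretrained("roberta-base")
config.is_decoder = True  # caching is only enabled for decoder-style usage
model = RobertaModel.from_pretrained("roberta-base", config=config)

# First pass: run the full prefix and ask for the key/value cache.
inputs = tokenizer("Hello, my dog is", return_tensors="pt")
out = model(**inputs, use_cache=True)
past = out.past_key_values

# Second pass: feed only the newest token (shape (batch_size, 1)) together with
# the cached states instead of re-running the whole sequence.
next_token = torch.tensor([[tokenizer.eos_token_id]])
out = model(input_ids=next_token, past_key_values=past, use_cache=True)
print(out.last_hidden_state.shape)  # (batch_size, 1, hidden_size)
```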
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaForCausalLM(config)[source]
Roberta for causal language model task.
- forward(input_ids: torch.LongTensor | None = None, attention_mask: torch.FloatTensor | None = None, token_type_ids: torch.LongTensor | None = None, position_ids: torch.LongTensor | None = None, head_mask: torch.FloatTensor | None = None, inputs_embeds: torch.FloatTensor | None = None, encoder_hidden_states: torch.FloatTensor | None = None, encoder_attention_mask: torch.FloatTensor | None = None, labels: torch.LongTensor | None = None, past_key_values: Tuple[Tuple[torch.FloatTensor]] = None, use_cache: bool | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None) Tuple[torch.Tensor] | transformers.modeling_outputs.CausalLMOutputWithCrossAttentions[source]
The main entry point for the class.
- encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional):
Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.
- encoder_attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional):
Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
- labels (torch.LongTensor of shape (batch_size, sequence_length), optional):
Labels for computing the left-to-right language modeling loss (next word prediction). Indices should be in [-100, 0, …, config.vocab_size] (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, …, config.vocab_size].
- past_key_values (tuple(tuple(torch.FloatTensor)) of length config.n_layers, with each tuple having 4 tensors of shape (batch_size, num_heads, sequence_length - 1, embed_size_per_head)):
Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding. If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don’t have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).
- use_cache (bool, optional):
If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).
Example
```python
>>> from transformers import RobertaTokenizer, RobertaForCausalLM, RobertaConfig
>>> import torch

>>> tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
>>> config = RobertaConfig.from_pretrained("roberta-base")
>>> config.is_decoder = True
>>> model = RobertaForCausalLM.from_pretrained("roberta-base", config=config)

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)

>>> prediction_logits = outputs.logits
```
- Returns:
CausalLMOutputWithCrossAttentions.
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaForMaskedLM(config)[source]
Roberta for masked language model task.
- forward(input_ids: torch.LongTensor | None = None, attention_mask: torch.FloatTensor | None = None, token_type_ids: torch.LongTensor | None = None, position_ids: torch.LongTensor | None = None, head_mask: torch.FloatTensor | None = None, inputs_embeds: torch.FloatTensor | None = None, encoder_hidden_states: torch.FloatTensor | None = None, encoder_attention_mask: torch.FloatTensor | None = None, labels: torch.LongTensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None) Tuple[torch.Tensor] | transformers.modeling_outputs.MaskedLMOutput[source]
The main entry point for the class.
- labels (torch.LongTensor of shape (batch_size, sequence_length), optional):
Labels for computing the masked language modeling loss. Indices should be in [-100, 0, …, config.vocab_size] (see the input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, …, config.vocab_size].
- kwargs (Dict[str, any], optional, defaults to {}):
Used to hide legacy arguments that have been deprecated.
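A minimal masked-language-modeling sketch built on the labels convention above; it assumes the upstream transformers tokenizer/from_pretrained API and the roberta-base checkpoint, and the example sentence is illustrative.
```python
import torch
from transformers import RobertaTokenizer
from intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic import RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

inputs = tokenizer("The capital of France is <mask>.", return_tensors="pt")

# Copy the input ids and set every non-masked position to -100 so the loss is
# computed only for the <mask> token, per the labels description above.
labels = inputs["input_ids"].clone()
labels[inputs["input_ids"] != tokenizer.mask_token_id] = -100

outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.shape)  # logits: (batch_size, seq_len, vocab_size)
```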
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaLMHead(config)[source]
Roberta Head for masked language modeling.
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaForSequenceClassification(config)[source]
Roberta for sequence classification task.
- forward(input_ids: torch.LongTensor | None = None, attention_mask: torch.FloatTensor | None = None, token_type_ids: torch.LongTensor | None = None, position_ids: torch.LongTensor | None = None, head_mask: torch.FloatTensor | None = None, inputs_embeds: torch.FloatTensor | None = None, labels: torch.LongTensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, layer_config=None, length_config=None, always_keep_cls_token=True) Tuple[torch.Tensor] | transformers.modeling_outputs.SequenceClassifierOutput[source]
The main entry point for the class.
- labels (torch.LongTensor of shape (batch_size,), optional):
Labels for computing the sequence classification/regression loss. Indices should be in [0, …, config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (mean-squared loss); if config.num_labels > 1, a classification loss is computed (cross-entropy).
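A minimal classification sketch following the labels rule above; the checkpoint name, num_labels value, and input text are illustrative assumptions.
```python
import torch
from transformers import RobertaTokenizer
from intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic import RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
# num_labels > 1 -> cross-entropy classification; num_labels == 1 -> MSE regression.
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

inputs = tokenizer("This movie was great!", return_tensors="pt")
labels = torch.tensor([1])  # class index in [0, num_labels - 1]

outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.shape)  # logits: (batch_size, num_labels)
```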
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaForMultipleChoice(config)[source]
Roberta for multiple choice task.
- forward(input_ids: torch.LongTensor | None = None, token_type_ids: torch.LongTensor | None = None, attention_mask: torch.FloatTensor | None = None, labels: torch.LongTensor | None = None, position_ids: torch.LongTensor | None = None, head_mask: torch.FloatTensor | None = None, inputs_embeds: torch.FloatTensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None) Tuple[torch.Tensor] | transformers.modeling_outputs.MultipleChoiceModelOutput[source]
The main entry point for the class.
- labels (torch.LongTensor of shape (batch_size,), optional):
Labels for computing the multiple choice classification loss. Indices should be in [0, …, num_choices - 1], where num_choices is the size of the second dimension of the input tensors (see input_ids above).
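A minimal multiple-choice sketch showing the (batch_size, num_choices, sequence_length) input layout implied above; the prompt, choices, and checkpoint are illustrative assumptions.
```python
import torch
from transformers import RobertaTokenizer
from intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic import RobertaForMultipleChoice

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMultipleChoice.from_pretrained("roberta-base")

prompt = "The weather today is"
choices = ["sunny and warm.", "a database index."]

# Encode each (prompt, choice) pair, then add the num_choices dimension so that
# input_ids has shape (batch_size, num_choices, sequence_length).
encoding = tokenizer([prompt] * len(choices), choices, return_tensors="pt", padding=True)
inputs = {k: v.unsqueeze(0) for k, v in encoding.items()}
labels = torch.tensor([0])  # index of the correct choice in [0, num_choices - 1]

outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.shape)  # logits: (batch_size, num_choices)
```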
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaForTokenClassification(config)[source]
Roberta for token classification task.
- forward(input_ids: torch.LongTensor | None = None, attention_mask: torch.FloatTensor | None = None, token_type_ids: torch.LongTensor | None = None, position_ids: torch.LongTensor | None = None, head_mask: torch.FloatTensor | None = None, inputs_embeds: torch.FloatTensor | None = None, labels: torch.LongTensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None) Tuple[torch.Tensor] | transformers.modeling_outputs.TokenClassifierOutput[source]
The main entry point for the class.
- labels (torch.LongTensor of shape (batch_size, sequence_length), optional):
Labels for computing the token classification loss. Indices should be in [0, …, config.num_labels - 1].
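A minimal token-classification sketch with one label per token, following the index range above; the label values, num_labels, and checkpoint are illustrative assumptions.
```python
import torch
from transformers import RobertaTokenizer
from intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic import RobertaForTokenClassification

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForTokenClassification.from_pretrained("roberta-base", num_labels=3)

inputs = tokenizer("RoBERTa was trained on a large English corpus", return_tensors="pt")
seq_len = inputs["input_ids"].shape[1]

# One label per token, each in [0, num_labels - 1]; all zeros here for brevity.
labels = torch.zeros((1, seq_len), dtype=torch.long)

outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits.shape)  # logits: (batch_size, seq_len, num_labels)
```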
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaClassificationHead(config)[source]
Head for sentence-level classification tasks.
- class intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.RobertaForQuestionAnswering(config)[source]
Roberta model for question answering task.
- forward(input_ids: torch.LongTensor | None = None, attention_mask: torch.FloatTensor | None = None, token_type_ids: torch.LongTensor | None = None, position_ids: torch.LongTensor | None = None, head_mask: torch.FloatTensor | None = None, inputs_embeds: torch.FloatTensor | None = None, start_positions: torch.LongTensor | None = None, end_positions: torch.LongTensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, layer_config=None, length_config=None, always_keep_cls_token=False) Tuple[torch.Tensor] | transformers.modeling_outputs.QuestionAnsweringModelOutput[source]
The main entry point for the class.
- start_positions (torch.LongTensor of shape (batch_size,), optional):
Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
- end_positions (torch.LongTensor of shape (batch_size,), optional):
Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside of the sequence are not taken into account for computing the loss.
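A minimal extractive question answering sketch using start_positions and end_positions as described above; the answer-span indices, question/context pair, and checkpoint name are illustrative assumptions.
```python
import torch
from transformers import RobertaTokenizer
from intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic import RobertaForQuestionAnswering

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForQuestionAnswering.from_pretrained("roberta-base")

question = "Where is the Eiffel Tower?"
context = "The Eiffel Tower is located in Paris, France."
inputs = tokenizer(question, context, return_tensors="pt")

# Token indices of the gold answer span (illustrative values); out-of-range
# positions are clamped to the sequence length, as noted above.
start_positions = torch.tensor([12])
end_positions = torch.tensor([13])

outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
print(outputs.loss, outputs.start_logits.shape, outputs.end_logits.shape)
```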
- intel_extension_for_transformers.transformers.modeling.modeling_roberta_dynamic.create_position_ids_from_input_ids(input_ids, padding_idx, past_key_values_length=0)[source]
Replace non-padding symbols with their position numbers.
Position numbers begin at padding_idx+1. Padding symbols are ignored. This is modified from fairseq’s utils.make_positions.
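A minimal re-implementation sketch of the behaviour described above (padding tokens keep padding_idx as their position id, real tokens are numbered from padding_idx + 1); the helper name is hypothetical and only mirrors the documented semantics.
```python
import torch

def create_position_ids_sketch(input_ids, padding_idx, past_key_values_length=0):
    # 1 where the token is not padding, 0 where it is padding.
    mask = input_ids.ne(padding_idx).int()
    # Cumulative count of non-padding tokens gives 1-based positions; multiplying
    # by the mask zeroes out the padding slots again.
    incremental_indices = (torch.cumsum(mask, dim=1) + past_key_values_length) * mask
    # Shift by padding_idx so real positions start at padding_idx + 1 and padding
    # positions stay at padding_idx.
    return incremental_indices.long() + padding_idx

input_ids = torch.tensor([[5, 6, 7, 1, 1]])  # 1 is RoBERTa's padding index
print(create_position_ids_sketch(input_ids, padding_idx=1))
# tensor([[2, 3, 4, 1, 1]])
```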