:py:mod:`neural_compressor.experimental.metric.f1`
==================================================

.. py:module:: neural_compressor.experimental.metric.f1

.. autoapi-nested-parse::

   Official evaluation script for v1.1 of the SQuAD dataset.

   From https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py



Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   neural_compressor.experimental.metric.f1.normalize_answer
   neural_compressor.experimental.metric.f1.f1_score
   neural_compressor.experimental.metric.f1.metric_max_over_ground_truths
   neural_compressor.experimental.metric.f1.evaluate



.. py:function:: normalize_answer(text: str) -> str

   Normalize the answer text.

   Lowercase the text, remove punctuation, articles and extra whitespace,
   and replace any other whitespace (newline, tab, etc.) with a space.

   :param text: The text to be normalized.

   :returns: The normalized text.
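
As an illustration of the normalization steps listed for ``normalize_answer``, the sketch below reimplements them in plain Python. The name ``normalize_answer_sketch`` is hypothetical, and the order of the steps is an assumption modelled on the SQuAD v1.1 evaluation script; it is not the module's own code.

```python
import re
import string


def normalize_answer_sketch(text: str) -> str:
    """Hypothetical re-creation of the documented normalization steps."""
    # 1. Lowercase the text.
    text = text.lower()
    # 2. Remove ASCII punctuation characters.
    text = "".join(ch for ch in text if ch not in string.punctuation)
    # 3. Remove the English articles "a", "an" and "the".
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    # 4. Collapse all whitespace runs (spaces, tabs, newlines) into single spaces.
    return " ".join(text.split())


print(normalize_answer_sketch("The  Denver\tBroncos!"))
```

With this input the article, the punctuation mark, and the tab are all removed, leaving two lowercase words separated by one space.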


.. py:function:: f1_score(prediction: collections.abc.Sequence, ground_truth: collections.abc.Sequence)

   Calculate the F1 score of the prediction and the ground_truth.

   :param prediction: the predicted answer.
   :param ground_truth: the correct answer.

   :returns: The F1 score of the prediction, as a floating-point number.
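
A token-level F1 score of the kind described for ``f1_score`` can be sketched as follows. This assumes that both arguments are sequences of already normalized tokens, which the ``Sequence`` annotations suggest but do not confirm; ``f1_score_sketch`` is a hypothetical name, not the module's function.

```python
from collections import Counter
from typing import Sequence


def f1_score_sketch(prediction: Sequence[str], ground_truth: Sequence[str]) -> float:
    """Hypothetical token-overlap F1 between two token sequences."""
    # Multiset intersection: each shared token counts as often as it occurs in both.
    common = Counter(prediction) & Counter(ground_truth)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(prediction)   # share of predicted tokens that are correct
    recall = num_same / len(ground_truth)    # share of reference tokens that were found
    return 2 * precision * recall / (precision + recall)


score = f1_score_sketch(["denver", "broncos", "team"], ["denver", "broncos"])
print(round(score, 3))
```

Here two of the three predicted tokens match, so precision is 2/3 and recall is 1, giving an F1 score of 0.8.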


.. py:function:: metric_max_over_ground_truths(metric_fn: Callable[[T, T], float], prediction: str, ground_truths: List[str]) -> float

   Calculate the maximum metric over all ground truths.

   For each answer in ground_truths, evaluate the metric of prediction with
   this answer, and return the max metric.

   :param metric_fn: the function to calculate the metric.
   :param prediction: the prediction result.
   :param ground_truths: the list of correct answers.

   :returns: The maximum metric, as a floating-point number.
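
The "best score over several references" idea behind ``metric_max_over_ground_truths`` can be sketched in a few lines. The helper names ``max_over_ground_truths_sketch`` and ``exact_match`` are hypothetical; the module's own function may preprocess its inputs before calling ``metric_fn``, which this sketch does not attempt to reproduce.

```python
from typing import Callable, List


def max_over_ground_truths_sketch(
    metric_fn: Callable[[str, str], float],
    prediction: str,
    ground_truths: List[str],
) -> float:
    """Score the prediction against every reference and keep the best score."""
    scores = [metric_fn(prediction, truth) for truth in ground_truths]
    return max(scores)


def exact_match(prediction: str, truth: str) -> float:
    """Toy metric for the demonstration: 1.0 on an exact string match, else 0.0."""
    return float(prediction == truth)


best = max_over_ground_truths_sketch(
    exact_match, "Denver Broncos", ["Broncos", "Denver Broncos"]
)
print(best)
```

Taking the maximum means a prediction is not penalized when a question has several acceptable answers and it matches only one of them.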


.. py:function:: evaluate(predictions: Dict[str, str], dataset: List[Dict[str, Any]]) -> float

   Evaluate the average F1 score of Question-Answering results.

   The F1 score is the harmonic mean of the precision and recall. It can be computed
   with the equation: F1 = 2 * (precision * recall) / (precision + recall).
   For every question-and-answer in the dataset, it evaluates the F1 score of the
   prediction and returns the average over all questions.

   :param predictions: The result of predictions to be evaluated. A dict mapping the id of
                       a question to the predicted answer of the question.
   :param dataset: The dataset to evaluate the predictions against. A list of articles.
                   An article contains a list of paragraphs, a paragraph contains a list of
                   question-and-answers (qas), and a question-and-answer contains an id, a question,
                   and a list of correct answers. For example::

                       [{'paragraphs':
                             [{'qas':[{'answers': [{'answer_start': 177, 'text': 'Denver Broncos'}, ...],
                                       'question': 'Which NFL team represented the AFC at Super Bowl 50?',
                                       'id': '56be4db0acb8001400a502ec'}]}]}]

   :returns: The average F1 score of the predictions, as a floating-point number expressed as a percentage.
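
To show how the nested ``dataset`` structure and the ``predictions`` mapping fit together, the following self-contained sketch walks the articles, paragraphs and qas and averages a token-level F1 score. All names (``evaluate_sketch``, ``_tokens``, ``_f1``) are hypothetical, and two behaviours are assumptions rather than documented facts: a question with no prediction contributes a score of 0, and each prediction is scored against its best-matching reference answer.

```python
import re
import string
from collections import Counter
from typing import Any, Dict, List


def _tokens(text: str) -> List[str]:
    """Lowercase, drop punctuation and articles, then split on whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return re.sub(r"\b(a|an|the)\b", " ", text).split()


def _f1(pred: List[str], truth: List[str]) -> float:
    """Token-overlap F1 = 2 * (precision * recall) / (precision + recall)."""
    same = sum((Counter(pred) & Counter(truth)).values())
    if same == 0:
        return 0.0
    precision, recall = same / len(pred), same / len(truth)
    return 2 * precision * recall / (precision + recall)


def evaluate_sketch(predictions: Dict[str, str], dataset: List[Dict[str, Any]]) -> float:
    """Average best-reference F1 over all questions, as a percentage."""
    total, f1_sum = 0, 0.0
    for article in dataset:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                total += 1
                if qa["id"] not in predictions:
                    continue  # assumption: an unanswered question scores 0
                pred = _tokens(predictions[qa["id"]])
                f1_sum += max(_f1(pred, _tokens(ans["text"])) for ans in qa["answers"])
    return 100.0 * f1_sum / total if total else 0.0


# Made-up two-question dataset in the documented layout.
demo_dataset = [{"paragraphs": [{"qas": [
    {"id": "q1", "question": "Who won?",
     "answers": [{"answer_start": 177, "text": "Denver Broncos"}]},
    {"id": "q2", "question": "Who lost?",
     "answers": [{"answer_start": 249, "text": "Carolina Panthers"}]},
]}]}]
demo_predictions = {"q1": "the Denver Broncos", "q2": "Carolina"}

print(round(evaluate_sketch(demo_predictions, demo_dataset), 2))
```

In this made-up example the first prediction matches its reference exactly after normalization (F1 of 1), while the second recovers one of two reference tokens (F1 of 2/3), so the average is about 83.33 percent.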