neural_compressor.metric.evaluate_squad
Official evaluation script for v1.1 of the SQuAD dataset.
From https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py
Functions
- f1_score: Calculate the F1 score between the prediction and the ground truth.
- metric_max_over_ground_truths: Calculate the maximum of the metric over all ground truths.
- exact_match_score: Compute the exact match score between prediction and ground truth.
- evaluate: Evaluate the average F1 score and the exact match score for Question-Answering results.
Module Contents
- neural_compressor.metric.evaluate_squad.f1_score(prediction, ground_truth)[source]
Calculate the F1 score between the prediction and the ground truth.
- Parameters:
prediction – The predicted result.
ground_truth – The ground truth.
- Returns:
The F1 score of the prediction, as a floating-point number.
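A minimal usage sketch; the example strings are illustrative only:

    from neural_compressor.metric.evaluate_squad import f1_score

    # Token-overlap F1 between a predicted span and a single reference answer,
    # following the SQuAD v1.1 evaluation script this module is based on.
    score = f1_score(prediction="in the garden", ground_truth="the garden")
    print(score)  # a float in [0, 1]; 1.0 when the normalized tokens match exactly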
- neural_compressor.metric.evaluate_squad.metric_max_over_ground_truths(metric_fn, prediction, ground_truths)[source]
Calculate the maximum of the metric over all ground truths.
For each answer in ground_truths, compute the metric between the prediction and that answer, and return the maximum value.
- Parameters:
metric_fn – The function to calculate the metric.
prediction – The prediction result.
ground_truths – A list of correct answers.
- Returns:
The maximum metric value, as a floating-point number.
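For illustration, a sketch of scoring one prediction against several reference answers (the answer strings below are made up for demonstration):

    from neural_compressor.metric.evaluate_squad import (
        exact_match_score,
        f1_score,
        metric_max_over_ground_truths,
    )

    prediction = "Denver Broncos"
    ground_truths = ["Denver Broncos", "The Denver Broncos"]

    # Evaluate the metric against every reference answer and keep the best value.
    best_em = metric_max_over_ground_truths(exact_match_score, prediction, ground_truths)
    best_f1 = metric_max_over_ground_truths(f1_score, prediction, ground_truths)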
- neural_compressor.metric.evaluate_squad.exact_match_score(prediction, ground_truth)[source]
Compute the exact match score between prediction and ground truth.
- Parameters:
prediction – The predicted result to be evaluated.
ground_truth – The ground truth.
- Returns:
The exact match score.
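A minimal sketch, assuming the answer normalization of the official v1.1 script (lowercasing, removal of punctuation and articles) before comparison:

    from neural_compressor.metric.evaluate_squad import exact_match_score

    # Exact match is all-or-nothing: the score is positive only when the
    # normalized prediction and ground truth are identical strings.
    print(exact_match_score("The Normans", "the Normans"))
    print(exact_match_score("Normans", "the French"))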
- neural_compressor.metric.evaluate_squad.evaluate(dataset, predictions)[source]
Evaluate the average F1 score and the exact match score for Question-Answering results.
- Parameters:
dataset – The dataset to evaluate the predictions against. A list of articles; each article contains a list of paragraphs, each paragraph contains a list of question-and-answer entries (qas), and each entry contains an id, a question, and a list of correct answers (see the sketch below).
predictions – The predictions to be evaluated. A dict mapping each question id to the predicted answer string.
- Returns:
The F1 score and the exact match score.
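A usage sketch with an illustrative single-article dataset; the id, question, and answers below are made up, and the keys follow the SQuAD v1.1 layout described above:

    from neural_compressor.metric.evaluate_squad import evaluate

    dataset = [
        {
            "title": "Example_Article",
            "paragraphs": [
                {
                    "context": "Super Bowl 50 was an American football game ...",
                    "qas": [
                        {
                            "id": "q-001",
                            "question": "Which NFL team represented the AFC at Super Bowl 50?",
                            "answers": [
                                {"text": "Denver Broncos"},
                                {"text": "The Denver Broncos"},
                            ],
                        }
                    ],
                }
            ],
        }
    ]

    # Map each question id to the predicted answer string.
    predictions = {"q-001": "Denver Broncos"}

    # Returns the average exact match and F1 scores over all questions.
    print(evaluate(dataset, predictions))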