neural_compressor.metric.evaluate_squad

Official evaluation script for v1.1 of the SQuAD dataset.

From https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py

Module Contents

Functions

f1_score(prediction, ground_truth)

Calculate the F1 score between the prediction and the ground truth.

metric_max_over_ground_truths(metric_fn, prediction, ...)

Calculate the maximum metric value over all ground truths.

exact_match_score(prediction, ground_truth)

Compute the exact match score between the prediction and the ground truth.

evaluate(dataset, predictions)

Evaluate the average F1 score and the exact match score for Question-Answering results.

neural_compressor.metric.evaluate_squad.f1_score(prediction, ground_truth)[source]

Calculate the F1 score between the prediction and the ground truth.

Parameters:
  • prediction – The predicted result.

  • ground_truth – The ground truth.

Returns:

The F1 score of the prediction. Floating-point number.
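
A minimal, self-contained sketch of the token-overlap F1 used by the SQuAD v1.1 script; the normalize_answer and token_f1 names below are illustrative helpers, not part of this module's API:

    import re
    import string
    from collections import Counter

    def normalize_answer(text):
        # Lowercase, drop punctuation and the articles a/an/the,
        # and collapse whitespace (as in the SQuAD v1.1 script).
        text = "".join(ch for ch in text.lower() if ch not in set(string.punctuation))
        text = re.sub(r"\b(a|an|the)\b", " ", text)
        return " ".join(text.split())

    def token_f1(prediction, ground_truth):
        # Token-overlap F1 between a predicted answer and one reference answer.
        pred_tokens = normalize_answer(prediction).split()
        truth_tokens = normalize_answer(ground_truth).split()
        common = Counter(pred_tokens) & Counter(truth_tokens)
        num_same = sum(common.values())
        if num_same == 0:
            return 0.0
        precision = num_same / len(pred_tokens)
        recall = num_same / len(truth_tokens)
        return 2 * precision * recall / (precision + recall)

    print(token_f1("the cat sat", "cat sat on the mat"))  # ~0.667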

neural_compressor.metric.evaluate_squad.metric_max_over_ground_truths(metric_fn, prediction, ground_truths)[source]

Calculate the maximum metric value over all ground truths.

For each answer in ground_truths, evaluate the metric between the prediction and that answer, and return the maximum value.

Parameters:
  • metric_fn – The function to calculate the metric.

  • prediction – The prediction result.

  • ground_truths – A list of correct answers.

Returns:

The maximum metric value. Floating-point number.
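
The behavior is simple enough to show directly. This sketch assumes a metric_fn such as the token_f1 helper shown above; the function name here is illustrative:

    def max_over_ground_truths(metric_fn, prediction, ground_truths):
        # Score the prediction against every reference answer and keep the best value.
        return max(metric_fn(prediction, answer) for answer in ground_truths)

    # e.g. max_over_ground_truths(token_f1, "in Paris", ["Paris", "the city of Paris"])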

neural_compressor.metric.evaluate_squad.exact_match_score(prediction, ground_truth)[source]

Compute the exact match score between the prediction and the ground truth.

Parameters:
  • prediction – The predicted result to be evaluated.

  • ground_truth – The ground truth.

Returns:

The exact match score.
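
A sketch of exact match after normalization, reusing the same illustrative normalize_answer helper as the f1_score sketch above:

    import re
    import string

    def normalize_answer(text):
        # Same illustrative normalization as in the f1_score sketch.
        text = "".join(ch for ch in text.lower() if ch not in set(string.punctuation))
        text = re.sub(r"\b(a|an|the)\b", " ", text)
        return " ".join(text.split())

    def exact_match(prediction, ground_truth):
        # 1.0 when the normalized strings are identical, otherwise 0.0.
        return float(normalize_answer(prediction) == normalize_answer(ground_truth))

    print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0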

neural_compressor.metric.evaluate_squad.evaluate(dataset, predictions)[source]

Evaluate the average F1 score and the exact match score for Question-Answering results.

Parameters:
  • dataset – The dataset to evaluate the prediction. A list instance of articles. An article contains a list of paragraphs, a paragraph contains a list of question-and-answers (qas), and a question-and-answer contains an id, a question, and a list of correct answers, following the SQuAD v1.1 JSON layout (see the usage sketch at the end of this entry).

  • predictions – The result of predictions to be evaluated. A dict mapping the id of a question to the predicted answer of the question.

Returns:

The F1 score and the exact match score.
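
A hedged usage sketch: the dataset below follows the SQuAD v1.1 layout described above (articles containing paragraphs containing qas), with made-up ids, texts, and answer offsets; the return value is printed as-is, since its exact shape (a single F1 value versus both scores) should be checked against the implementation:

    from neural_compressor.metric.evaluate_squad import evaluate

    # One article with one paragraph and one question, in SQuAD v1.1 layout.
    dataset = [
        {
            "title": "Example article",
            "paragraphs": [
                {
                    "context": "The Eiffel Tower is located in Paris.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "Where is the Eiffel Tower located?",
                            "answers": [{"text": "Paris", "answer_start": 31}],
                        }
                    ],
                }
            ],
        }
    ]

    # Predictions map each question id to the predicted answer string.
    predictions = {"q1": "Paris"}

    print(evaluate(dataset, predictions))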