utils.eval_utils

Response Parsing and Evaluation for various models.

Functions

parse_multi_choice_response(response, all_choices, ...)

Parse the prediction from the generated response.

check_is_number(string)

Check if the given string is a number.

normalize_str(string)

Normalize a string to lower case, converting it to a float when possible.

extract_numbers(string)

Extract all forms of numbers from a string with regex.

parse_open_response(response)

Parse the prediction from the generated response.

eval_multi_choice(gold_i, pred_i)

Evaluate a multiple choice instance.

eval_open(gold_i, pred_i)

Evaluate an open question instance.

evaluate(samples)

Batch evaluation for multiple choice and open questions.

calculate_ins_level_acc(results)

Calculate the instance-level accuracy for the given subject results.

Module Contents

utils.eval_utils.parse_multi_choice_response(response, all_choices, index2ans)[source]

Parse the prediction from the generated response.

Return the predicted index, e.g., A, B, C, or D.
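A minimal sketch of what such a parser might do, not the actual implementation: the regex pattern, the answer-text fallback, and the random last resort are all assumptions.

```python
import random
import re

def parse_multi_choice_response(response, all_choices, index2ans):
    """Sketch: pick the predicted choice letter (e.g. 'A') out of a free-form response.

    `all_choices` is a list like ['A', 'B', 'C', 'D'] and `index2ans`
    maps each letter to its answer text.
    """
    # Prefer an explicitly punctuated or bracketed letter, e.g. "(B)" or "B."
    for choice in all_choices:
        if re.search(rf"\(?{choice}\)?[.,:\s]", response + " "):
            return choice
    # Fall back to matching the answer text itself.
    for index, ans in index2ans.items():
        if ans.lower() in response.lower():
            return index
    # Last resort: a random choice so the caller always gets a valid letter.
    return random.choice(all_choices)
```

For example, `parse_multi_choice_response("The answer is (B).", ["A", "B", "C", "D"], index2ans)` returns `"B"`.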

utils.eval_utils.check_is_number(string)[source]

Check if the given string is a number.
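One simple way to implement this check, shown here as a sketch rather than the library's actual code, is to attempt a float conversion; stripping commas to tolerate thousands separators is an assumption.

```python
def check_is_number(string):
    """Sketch: return True if the string parses as a number (int or float)."""
    try:
        # Strip thousands separators such as "1,234" before converting.
        float(string.replace(",", ""))
        return True
    except ValueError:
        return False
```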

utils.eval_utils.normalize_str(string)[source]

Normalize a string to lower case, converting it to a float when possible.
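A sketch of one possible normalization, under the assumption that the function returns a list of candidate forms so a matcher can try any of them; the exact variants produced are assumptions.

```python
def normalize_str(string):
    """Sketch: lower-case a string, or convert it to a float when it is numeric.

    Returns a list of candidate forms so the caller can match any of them.
    """
    string = string.strip()
    try:
        # Numeric strings become a single float candidate.
        return [float(string.replace(",", ""))]
    except ValueError:
        # Non-numeric: compare case-insensitively, with and without spaces.
        lowered = string.lower()
        return [lowered, lowered.replace(" ", "")]
```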

utils.eval_utils.extract_numbers(string)[source]

Extract all forms of numbers from a string with regex.
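A hedged sketch of such an extractor; the particular regex below (integers, decimals, thousands separators, scientific notation) is an assumption about which "forms of numbers" are covered.

```python
import re

def extract_numbers(string):
    """Sketch: pull all number-like substrings out of a string with regex."""
    # First alternative: numbers with thousands separators, e.g. "1,234".
    # Second: plain integers/decimals with optional scientific notation.
    pattern = r"-?\d{1,3}(?:,\d{3})+|-?\d+\.?\d*(?:[eE][-+]?\d+)?"
    return re.findall(pattern, string)
```

For example, `extract_numbers("It costs 1,234 dollars and 5.6 percent")` returns `["1,234", "5.6"]`.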

utils.eval_utils.parse_open_response(response)[source]

Parse the prediction from the generated response.

Return a list of predicted strings or numbers.
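A minimal sketch of how an open-ended answer might be parsed into candidates; the "answer is" cue and the numeric-candidate step are assumptions, not the library's actual heuristics.

```python
import re

def parse_open_response(response):
    """Sketch: extract candidate answers (strings or numbers) from a response."""
    # Focus on the text after an answer cue, if one is present.
    match = re.search(r"answer (?:is|:)\s*(.+)", response, re.IGNORECASE)
    text = match.group(1) if match else response
    text = text.strip().rstrip(".")
    candidates = [text.lower()]
    # Also surface any numbers so numeric gold answers can match.
    candidates += [float(n.replace(",", "")) for n in re.findall(r"-?\d+\.?\d*", text)]
    return candidates
```

For example, `parse_open_response("The answer is 42.")` returns `["42", 42.0]`.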

utils.eval_utils.eval_multi_choice(gold_i, pred_i)[source]

Evaluate a multiple choice instance.
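The comparison for a multiple-choice instance is a direct match of predicted index against gold index; handling a list of acceptable gold answers is an assumption in this sketch.

```python
def eval_multi_choice(gold_i, pred_i):
    """Sketch: correct iff the predicted index matches the gold index.

    `gold_i` may be a single letter or a list of acceptable letters.
    """
    if isinstance(gold_i, list):
        return pred_i in gold_i
    return pred_i == gold_i
```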

utils.eval_utils.eval_open(gold_i, pred_i)[source]

Evaluate an open question instance.
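A sketch of open-question scoring under the assumption that `pred_i` is a list of parsed candidates (strings or floats) and a hit on any normalized gold answer counts as correct; the substring match and float tolerance are assumptions.

```python
def eval_open(gold_i, pred_i):
    """Sketch: correct if any normalized gold answer matches a parsed prediction.

    `gold_i` is a gold answer (or list of them); `pred_i` is a list of
    candidate strings and/or floats.
    """
    golds = gold_i if isinstance(gold_i, list) else [gold_i]
    norm_golds = []
    for g in golds:
        try:
            norm_golds.append(float(str(g).replace(",", "")))
        except ValueError:
            norm_golds.append(str(g).strip().lower())
    for pred in pred_i:
        for gold in norm_golds:
            if isinstance(gold, float) and isinstance(pred, float):
                # Tolerate floating-point noise in numeric answers.
                if abs(gold - pred) < 1e-6:
                    return True
            elif isinstance(gold, str) and isinstance(pred, str):
                # String gold counts if it appears inside a candidate.
                if gold in pred:
                    return True
    return False
```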

utils.eval_utils.evaluate(samples)[source]

Batch evaluation for multiple choice and open questions.
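A sketch of the batch loop; the sample schema (`id`, `question_type`, `answer`, `parsed_pred`), the simplified open-answer match, and the return shape are all assumptions for illustration.

```python
def evaluate(samples):
    """Sketch: batch-score a mix of multiple-choice and open samples.

    Each sample is assumed to be a dict with 'id', 'question_type'
    ('multiple-choice' or 'open'), 'answer' (gold), and 'parsed_pred'.
    Returns per-sample judgements and an overall accuracy.
    """
    judge_dict = {}
    correct = 0
    for sample in samples:
        gold = sample["answer"]
        pred = sample["parsed_pred"]
        if sample["question_type"] == "multiple-choice":
            is_correct = pred == gold if not isinstance(gold, list) else pred in gold
        else:
            # Simplified open matching: any candidate equals the gold answer.
            is_correct = any(p == gold for p in pred)
        judge_dict[sample["id"]] = "Correct" if is_correct else "Wrong"
        correct += is_correct
    acc = correct / len(samples) if samples else 0.0
    return judge_dict, {"acc": acc}
```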

utils.eval_utils.calculate_ins_level_acc(results: Dict)[source]

Calculate the instance-level accuracy for the given subject results.
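A sketch of an instance-weighted aggregate, under the assumption that `results` maps each subject to a dict with `'acc'` and `'num_example'` keys; those key names are assumptions.

```python
def calculate_ins_level_acc(results):
    """Sketch: instance-weighted accuracy across per-subject results.

    `results` is assumed to map subject name -> {'acc': float, 'num_example': int};
    each subject contributes in proportion to how many instances it holds.
    """
    total_acc = 0.0
    total_num = 0
    for subject_result in results.values():
        total_acc += subject_result["acc"] * subject_result["num_example"]
        total_num += subject_result["num_example"]
    return total_acc / total_num if total_num else 0.0
```

This differs from averaging per-subject accuracies directly: a subject with more instances pulls the aggregate toward its own accuracy.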