utils.eval_utils

Response Parsing and Evaluation for various models.

Functions

parse_multi_choice_response(response, all_choices, ...)

Parse the prediction from the generated response.

check_is_number(string)

Check if the given string is a number.

normalize_str(string)

Normalize a string to lower case, converting it to a float when possible.

extract_numbers(string)

Extract all forms of numbers from a string with regex.

parse_open_response(response)

Parse the prediction from the generated response.

eval_multi_choice(gold_i, pred_i)

Evaluate a multiple choice instance.

eval_open(gold_i, pred_i)

Evaluate an open question instance.

evaluate(samples)

Batch evaluation for multiple choice and open questions.

calculate_ins_level_acc(results)

Calculate the instance-level accuracy for the given subject results.

Module Contents

utils.eval_utils.parse_multi_choice_response(response, all_choices, index2ans)[source]

Parse the prediction from the generated response.

Return the predicted index, e.g., A, B, C, or D.
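A minimal sketch of what such a parser might do, not the actual implementation: the regex pattern, the answer-text fallback, and the random last resort are all assumptions.

```python
import random
import re

def parse_multi_choice_response(response, all_choices, index2ans):
    """Sketch: pick the predicted choice letter (e.g. 'A') out of a free-form response.

    `all_choices` is a list like ['A', 'B', 'C', 'D'] and `index2ans`
    maps each letter to its answer text.
    """
    # Prefer an explicitly punctuated or bracketed letter, e.g. "(B)" or "B."
    for choice in all_choices:
        if re.search(rf"\(?{choice}\)?[.,:\s]", response + " "):
            return choice
    # Fall back to matching the answer text itself.
    for index, ans in index2ans.items():
        if ans.lower() in response.lower():
            return index
    # Last resort: a random choice so the caller always gets a valid letter.
    return random.choice(all_choices)
```

For example, `parse_multi_choice_response("The answer is (B).", ["A", "B", "C", "D"], index2ans)` returns `"B"`.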

utils.eval_utils.check_is_number(string)[source]

Check if the given string is a number.
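One simple way to implement this check, shown here as a sketch rather than the library's actual code, is to attempt a float conversion; stripping commas to tolerate thousands separators is an assumption.

```python
def check_is_number(string):
    """Sketch: return True if the string parses as a number (int or float)."""
    try:
        # Strip thousands separators such as "1,234" before converting.
        float(string.replace(",", ""))
        return True
    except ValueError:
        return False
```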

utils.eval_utils.normalize_str(string)[source]

Normalize a string to lower case, converting it to a float when possible.
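A sketch of one possible normalization, under the assumption that the function returns a list of candidate forms so a matcher can try any of them; the exact variants produced are assumptions.

```python
def normalize_str(string):
    """Sketch: lower-case a string, or convert it to a float when it is numeric.

    Returns a list of candidate forms so the caller can match any of them.
    """
    string = string.strip()
    try:
        # Numeric strings become a single float candidate.
        return [float(string.replace(",", ""))]
    except ValueError:
        # Non-numeric: compare case-insensitively, with and without spaces.
        lowered = string.lower()
        return [lowered, lowered.replace(" ", "")]
```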

utils.eval_utils.extract_numbers(string)[source]

Extract all forms of numbers from a string with regex.
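A hedged sketch of such an extractor; the particular regex below (integers, decimals, thousands separators, scientific notation) is an assumption about which "forms of numbers" are covered.

```python
import re

def extract_numbers(string):
    """Sketch: pull all number-like substrings out of a string with regex."""
    # First alternative: numbers with thousands separators, e.g. "1,234".
    # Second: plain integers/decimals with optional scientific notation.
    pattern = r"-?\d{1,3}(?:,\d{3})+|-?\d+\.?\d*(?:[eE][-+]?\d+)?"
    return re.findall(pattern, string)
```

For example, `extract_numbers("It costs 1,234 dollars and 5.6 percent")` returns `["1,234", "5.6"]`.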

utils.eval_utils.parse_open_response(response)[source]

Parse the prediction from the generated response.

Return a list of predicted strings or numbers.
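A minimal sketch of how an open-ended answer might be parsed into candidates; the "answer is" cue and the numeric-candidate step are assumptions, not the library's actual heuristics.

```python
import re

def parse_open_response(response):
    """Sketch: extract candidate answers (strings or numbers) from a response."""
    # Focus on the text after an answer cue, if one is present.
    match = re.search(r"answer (?:is|:)\s*(.+)", response, re.IGNORECASE)
    text = match.group(1) if match else response
    text = text.strip().rstrip(".")
    candidates = [text.lower()]
    # Also surface any numbers so numeric gold answers can match.
    candidates += [float(n.replace(",", "")) for n in re.findall(r"-?\d+\.?\d*", text)]
    return candidates
```

For example, `parse_open_response("The answer is 42.")` returns `["42", 42.0]`.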

utils.eval_utils.eval_multi_choice(gold_i, pred_i)[source]

Evaluate a multiple choice instance.
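The comparison for a multiple-choice instance is a direct match of predicted index against gold index; handling a list of acceptable gold answers is an assumption in this sketch.

```python
def eval_multi_choice(gold_i, pred_i):
    """Sketch: correct iff the predicted index matches the gold index.

    `gold_i` may be a single letter or a list of acceptable letters.
    """
    if isinstance(gold_i, list):
        return pred_i in gold_i
    return pred_i == gold_i
```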

utils.eval_utils.eval_open(gold_i, pred_i)[source]

Evaluate an open question instance.
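A sketch of open-question scoring under the assumption that `pred_i` is a list of parsed candidates (strings or floats) and a hit on any normalized gold answer counts as correct; the substring match and float tolerance are assumptions.

```python
def eval_open(gold_i, pred_i):
    """Sketch: correct if any normalized gold answer matches a parsed prediction.

    `gold_i` is a gold answer (or list of them); `pred_i` is a list of
    candidate strings and/or floats.
    """
    golds = gold_i if isinstance(gold_i, list) else [gold_i]
    norm_golds = []
    for g in golds:
        try:
            norm_golds.append(float(str(g).replace(",", "")))
        except ValueError:
            norm_golds.append(str(g).strip().lower())
    for pred in pred_i:
        for gold in norm_golds:
            if isinstance(gold, float) and isinstance(pred, float):
                # Tolerate floating-point noise in numeric answers.
                if abs(gold - pred) < 1e-6:
                    return True
            elif isinstance(gold, str) and isinstance(pred, str):
                # String gold counts if it appears inside a candidate.
                if gold in pred:
                    return True
    return False
```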

utils.eval_utils.evaluate(samples)[source]

Batch evaluation for multiple choice and open questions.
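A sketch of the batch loop; the sample schema (`id`, `question_type`, `answer`, `parsed_pred`), the simplified open-answer match, and the return shape are all assumptions for illustration.

```python
def evaluate(samples):
    """Sketch: batch-score a mix of multiple-choice and open samples.

    Each sample is assumed to be a dict with 'id', 'question_type'
    ('multiple-choice' or 'open'), 'answer' (gold), and 'parsed_pred'.
    Returns per-sample judgements and an overall accuracy.
    """
    judge_dict = {}
    correct = 0
    for sample in samples:
        gold = sample["answer"]
        pred = sample["parsed_pred"]
        if sample["question_type"] == "multiple-choice":
            is_correct = pred == gold if not isinstance(gold, list) else pred in gold
        else:
            # Simplified open matching: any candidate equals the gold answer.
            is_correct = any(p == gold for p in pred)
        judge_dict[sample["id"]] = "Correct" if is_correct else "Wrong"
        correct += is_correct
    acc = correct / len(samples) if samples else 0.0
    return judge_dict, {"acc": acc}
```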

utils.eval_utils.calculate_ins_level_acc(results: Dict)[source]

Calculate the instance-level accuracy for the given subject results.
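A sketch of an instance-weighted aggregate, under the assumption that `results` maps each subject to a dict with `'acc'` and `'num_example'` keys; those key names are assumptions.

```python
def calculate_ins_level_acc(results):
    """Sketch: instance-weighted accuracy across per-subject results.

    `results` is assumed to map subject name -> {'acc': float, 'num_example': int};
    each subject contributes in proportion to how many instances it holds.
    """
    total_acc = 0.0
    total_num = 0
    for subject_result in results.values():
        total_acc += subject_result["acc"] * subject_result["num_example"]
        total_num += subject_result["num_example"]
    return total_acc / total_num if total_num else 0.0
```

This differs from averaging per-subject accuracies directly: a subject with more instances pulls the aggregate toward its own accuracy.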