In this notebook, we will download a model, dataset, and metric from the Hugging Face Hub and generate an interactive HTML model card using the Intel AI Safety Model Card Generator tool.
1. Download and Import Dependencies
[ ]:
!pip install evaluate datasets transformers[torch] scikit-learn
[ ]:
from intel_ai_safety.model_card_gen.model_card_gen import ModelCardGen
from datasets import load_dataset
import evaluate
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer
import pandas as pd
from collections import Counter
from functools import reduce
import json
import numpy as np
2. Download Dataset from Hugging Face Datasets
[3]:
raw_dataset = load_dataset("hatexplain")
he_dataset = raw_dataset.map(lambda e: {'text': " ".join(e['post_tokens'])})
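To confirm the join produced what we expect, we can spot-check a single record; this check is illustrative and not required for the rest of the notebook.
[ ]:
# Spot-check: 'text' should be the whitespace-joined post_tokens
sample = he_dataset['test'][0]
assert sample['text'] == " ".join(sample['post_tokens'])
print(sample['text'][:80])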
3. Transform Dataset
[4]:
def get_common_targets(elm):
    """
    Merge the annotated targets from each annotator into a single
    list, keeping a target when at least two annotators agree
    """
    targets = elm['annotators']['target']
    counts = reduce(lambda x, y: Counter(x) + Counter(y), targets)
    return {'target': [target for target, count in counts.items() if count > 1]}
he_dataset = he_dataset.map(get_common_targets)
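To see how the Counter-based merge behaves, here is a minimal sketch on a made-up annotators record (the values are hypothetical, not taken from the dataset):
[ ]:
# Hypothetical record: two of three annotators agree on 'Women'
example_targets = [['Women', 'Other'], ['Women'], ['None']]
counts = reduce(lambda x, y: Counter(x) + Counter(y), example_targets)
print(counts)                                   # Counter({'Women': 2, 'Other': 1, 'None': 1})
print([t for t, c in counts.items() if c > 1])  # ['Women'] survives the agreement filter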
[5]:
def get_top_communities(targets, top=10):
    target_counts = reduce(lambda x, y: Counter(x) + Counter(y), targets)
    top_targets, _ = zip(*target_counts.most_common(top))
    return set(top_targets)
TOP = get_top_communities(he_dataset['test']['target'])
def filter_top_target(elm):
    """
    This function restricts the identity groups targeted
    in each item to the 10 most common identity groups
    """
    targets = set(elm['target']) & TOP
    return {'target': targets}
he_dataset = he_dataset.map(filter_top_target)
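The intersection with TOP keeps only the frequently targeted groups; a quick illustration with made-up group names and a made-up TOP set:
[ ]:
# Hypothetical: only groups present in the top set survive the intersection
fake_top = {'Women', 'Refugee', 'Islam'}
item = {'target': ['Women', 'Refugee', 'SomeRareGroup']}
print(set(item['target']) & fake_top)  # {'Women', 'Refugee'}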
[6]:
def get_label(elm):
    """
    This function derives a ground-truth label by majority vote
    over the annotator labels
    """
    label_map = {0: 1,  # hatespeech -> 1
                 1: 0,  # normal -> 0
                 2: 1}  # offensive -> 1
    labels = elm['annotators']['label']
    max_label = max(labels, key=labels.count)
    return {'label': label_map[max_label]}
he_dataset = he_dataset.map(get_label)
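A quick sketch of the majority vote with hypothetical annotator labels:
[ ]:
# Two annotators vote hatespeech (0), one votes normal (1)
votes = [0, 0, 1]
majority = max(votes, key=votes.count)  # 0, the most frequent vote
print({0: 1, 1: 0, 2: 1}[majority])     # 1 -> mapped to the positive (hate) class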
4. Download Model and Process Outputs
[7]:
import torch
from torch.nn.functional import softmax
he_dataset.set_format("pt", columns=["post_tokens"], output_all_columns=True)
model = AutoModelForSequenceClassification.from_pretrained("Hate-speech-CNERG/dehatebert-mono-english")
tokenizer = AutoTokenizer.from_pretrained("Hate-speech-CNERG/dehatebert-mono-english")
def process(examples):
    # Tokenize one example and score it, truncating to the model's maximum input length
    bert_tokens = tokenizer(examples['text'], return_tensors="pt", truncation=True)
    with torch.no_grad():  # inference only, no gradients needed
        output = model(**bert_tokens)
    return {"output": softmax(output['logits'], dim=-1).flatten()}
test_ds = he_dataset['test'].map(process)
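Before scoring the whole split, it can be worth sanity-checking the model on a single made-up sentence (the text below is an assumption, not a dataset item); the two class probabilities should sum to 1:
[ ]:
# Sanity check on one hypothetical sentence
tokens = tokenizer("this is a perfectly ordinary sentence", return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = softmax(model(**tokens)['logits'], dim=-1).flatten()
print(probs, probs.sum())  # two class probabilities summing to ~1.0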
5. Get Bias Metric from Hugging Face
[8]:
metric = evaluate.load('Intel/bias_auc')
print(metric)
EvaluationModule(name: "bias_auc", module_type: "metric", features: {'target': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None), 'label': Value(dtype='int64', id=None), 'output': Sequence(feature=Value(dtype='float32', id=None), length=-1, id=None)}, usage: """Args:
target list[list[str]]: list containing list of group targeted for each item
label list[int]: list containing label index for each item
output list[list[float]]: list of model output values for each
subgroup list[str] (optional): list of subgroups that appear in target to compute metric over
Returns (for each subgroup in target):
'Subgroup' : Subgroup AUC score,
'BPSN' : BPSN (Background Positive, Subgroup Negative) AUC,
'BNSP' : BNSP (Background Negative, Subgroup Positive) AUC score,
Example:
>>> from evaluate import load
>>> target = [['Islam'],
... ['Sexuality'],
... ['Sexuality'],
... ['Islam']]
>>> label = [0, 0, 1, 1]
>>> output = [[0.44452348351478577, 0.5554765462875366],
... [0.4341845214366913, 0.5658154487609863],
... [0.400595098733902, 0.5994048714637756],
... [0.3840397894382477, 0.6159601807594299]]
>>> metric = load('Intel/bias_auc')
>>> metric.add_batch(target=target,
label=label,
output=output)
>>> metric.compute(subgroups = None)
""", stored examples: 0)
6. Run Bias Metric
[9]:
metric.add_batch(target=test_ds['target'],
label=test_ds['label'],
output=test_ds['output'])
# Collect every group that appears as a target, excluding 'Disability'
subgroups = set(group for group_list in test_ds['target'] for group in group_list) - {'Disability'}
metric_output = metric.compute(subgroups=subgroups)
[10]:
metric_output
[10]:
{'Refugee': {'Subgroup': 0.7311320754716981,
'BPSN': 0.678562242798354,
'BNSP': 0.7630785937287905},
'Jewish': {'Subgroup': 0.5336263736263737,
'BPSN': 0.6644343891402714,
'BNSP': 0.6367426970788316},
'Homosexual': {'Subgroup': 0.7495069033530571,
'BPSN': 0.5786049393644331,
'BNSP': 0.8356651669775779},
'Islam': {'Subgroup': 0.6413432424993118,
'BPSN': 0.7368656382740889,
'BNSP': 0.6499369565573367},
'Women': {'Subgroup': 0.7131519274376417,
'BPSN': 0.7320302982149478,
'BNSP': 0.6984689059759881},
'None': {'Subgroup': 0.6752210816335599,
'BPSN': 0.7373480790382473,
'BNSP': 0.6139316239316239},
'Arab': {'Subgroup': 0.6023809523809524,
'BPSN': 0.541080680977054,
'BNSP': 0.7807407407407407},
'Caucasian': {'Subgroup': 0.7462406015037595,
'BPSN': 0.8013460703430308,
'BNSP': 0.6682660367609065},
'African': {'Subgroup': 0.7112082928409459,
'BPSN': 0.45742948342127554,
'BNSP': 0.8486346379911739},
'Other': {'Subgroup': 0.7295238095238096,
'BPSN': 0.7852631578947368,
'BNSP': 0.6521296296296296},
'Overall generalized mean': {'Subgroup': 0.657871089806175,
'BPSN': 0.606444668229416,
'BNSP': 0.690561798916551}}
7. Transform Output for Model Card
Model Card Generator takes two pandas DataFrames as input. We will create a metrics_by_group DataFrame from the Bias AUC metric above, as well as a metrics_by_threshold DataFrame containing performance metrics at each decision threshold.
[11]:
metrics_by_group = (pd.DataFrame.from_dict(metric_output)
                    .T
                    .reset_index()
                    .rename({'index': 'group'}, axis=1))
metrics_by_group['feature'] = 'target'
metrics_by_group
[11]:
| | group | Subgroup | BPSN | BNSP | feature |
|---|---|---|---|---|---|
| 0 | Refugee | 0.731132 | 0.678562 | 0.763079 | target |
| 1 | Jewish | 0.533626 | 0.664434 | 0.636743 | target |
| 2 | Homosexual | 0.749507 | 0.578605 | 0.835665 | target |
| 3 | Islam | 0.641343 | 0.736866 | 0.649937 | target |
| 4 | Women | 0.713152 | 0.732030 | 0.698469 | target |
| 5 | None | 0.675221 | 0.737348 | 0.613932 | target |
| 6 | Arab | 0.602381 | 0.541081 | 0.780741 | target |
| 7 | Caucasian | 0.746241 | 0.801346 | 0.668266 | target |
| 8 | African | 0.711208 | 0.457429 | 0.848635 | target |
| 9 | Other | 0.729524 | 0.785263 | 0.652130 | target |
| 10 | Overall generalized mean | 0.657871 | 0.606445 | 0.690562 | target |
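Before handing the table to the generator, it can be useful to surface the groups with the weakest scores; for example, a low BPSN AUC suggests the model confuses non-hateful mentions of a group with hateful background text:
[ ]:
# Three groups with the lowest BPSN AUC (excluding the overall row)
by_group = metrics_by_group[metrics_by_group['group'] != 'Overall generalized mean']
print(by_group.sort_values('BPSN')[['group', 'BPSN']].head(3))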
[ ]:
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score
import numpy as np
thetas = np.linspace(0, 1, 1001)
# Probability of the positive (hate) class for each test item
y_pred_prob = np.array(test_ds['output'])[:, 1]
metrics_dict = {
    'threshold': thetas,
    'precision': [precision_score(test_ds['label'], y_pred_prob > theta, zero_division=0) for theta in thetas],
    'recall': [recall_score(test_ds['label'], y_pred_prob > theta) for theta in thetas],
    'f1': [f1_score(test_ds['label'], y_pred_prob > theta, zero_division=0) for theta in thetas],
    'accuracy': [accuracy_score(test_ds['label'], y_pred_prob > theta) for theta in thetas]
}
[13]:
metrics_by_threshold = pd.DataFrame.from_dict(metrics_dict)
[14]:
metrics_by_threshold
[14]:
| | threshold | precision | recall | f1 | accuracy |
|---|---|---|---|---|---|
| 0 | 0.000 | 0.593555 | 1.0 | 0.744945 | 0.593555 |
| 1 | 0.001 | 0.593555 | 1.0 | 0.744945 | 0.593555 |
| 2 | 0.002 | 0.593555 | 1.0 | 0.744945 | 0.593555 |
| 3 | 0.003 | 0.593555 | 1.0 | 0.744945 | 0.593555 |
| 4 | 0.004 | 0.593555 | 1.0 | 0.744945 | 0.593555 |
| ... | ... | ... | ... | ... | ... |
| 996 | 0.996 | 0.000000 | 0.0 | 0.000000 | 0.406445 |
| 997 | 0.997 | 0.000000 | 0.0 | 0.000000 | 0.406445 |
| 998 | 0.998 | 0.000000 | 0.0 | 0.000000 | 0.406445 |
| 999 | 0.999 | 0.000000 | 0.0 | 0.000000 | 0.406445 |
| 1000 | 1.000 | 0.000000 | 0.0 | 0.000000 | 0.406445 |
1001 rows × 5 columns
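As a quick check on the threshold sweep (not needed for the model card itself), we can pull out the F1-optimal operating point:
[ ]:
# Decision threshold that maximizes F1 on the test split
best = metrics_by_threshold.loc[metrics_by_threshold['f1'].idxmax()]
print(f"best threshold: {best['threshold']:.3f}, F1: {best['f1']:.3f}")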
8. Build Model Card
Simply pass the DataFrames to the ModelCardGen.generate class method to build a model card.
[15]:
mc = {
"schema_version": "0.0.1",
"model_details": {
"name": "Deep Learning Models for Multilingual Hate Speech Detection",
"version": {
"name": "25d0e4d9122d2a5c283e07405a325e3dfd4a73b3",
"date": "2020"
},
"graphics": {},
"citations": [
{
"citation": '''@article{aluru2020deep,
title={Deep Learning Models for Multilingual Hate Speech Detection},
author={Aluru, Sai Saket and Mathew, Binny and Saha, Punyajoy and Mukherjee, Animesh},
journal={arXiv preprint arXiv:2004.06465},
year={2020}
}'''
},
],
"overview": 'This model is used detecting hatespeech in English language. The mono in the name refers to the monolingual setting, where the model is trained using only English language data. It is finetuned on multilingual bert model. The model is trained with different learning rates and the best validation score achieved is 0.726030 for a learning rate of 2e-5. Training code can be found here https://github.com/punyajoy/DE-LIMIT',
}
}
[16]:
mcg = ModelCardGen.generate(metrics_by_group=metrics_by_group, metrics_by_threshold=metrics_by_threshold, model_card=mc)
mcg
[16]:
Model Details
Overview
This model detects hate speech in English-language text. "Mono" in the name refers to the monolingual setting, where the model is trained using only English-language data. It is fine-tuned from a multilingual BERT model. The model was trained with different learning rates, and the best validation score achieved was 0.726030 for a learning rate of 2e-5. Training code can be found at https://github.com/punyajoy/DE-LIMIT
Model Performance
Overall Accuracy/Precision/Recall/F1
Version
name: 25d0e4d9122d2a5c283e07405a325e3dfd4a73b3
date: 2020
Citations
- @article{aluru2020deep, title={Deep Learning Models for Multilingual Hate Speech Detection}, author={Aluru, Sai Saket and Mathew, Binny and Saha, Punyajoy and Mukherjee, Animesh}, journal={arXiv preprint arXiv:2004.06465}, year={2020} }
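To share the card outside the notebook, the generator object can be written to a standalone HTML file; export_html is an assumption here, so check the API of your installed intel_ai_safety version:
[ ]:
# Assumption: ModelCardGen exposes an export_html helper in this version
mcg.export_html('hatexplain_model_card.html')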