In this notebook, we will download a model, a dataset, and a metric from the Hugging Face Hub and generate an interactive HTML Model Card using the Intel AI Safety Model Card Generator tool.

1. Download and Import Dependencies

[ ]:
!pip install evaluate datasets transformers[torch] scikit-learn
[ ]:
from intel_ai_safety.model_card_gen.model_card_gen import ModelCardGen
from datasets import load_dataset
import evaluate
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer
import pandas as pd

from collections import Counter
from functools import reduce
import json
import numpy as np

2. Download Dataset from Hugging Face Datasets

[ ]:
raw_dataset = load_dataset("hatexplain")
he_dataset = raw_dataset.map(lambda e: {'text': " ".join(e['post_tokens'])})
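
As a quick sanity check before transforming anything, you can inspect the available splits and one raw example. This is only an optional inspection step and assumes the standard hatexplain schema (post_tokens, annotators) used in the rest of the notebook.

[ ]:
# Show the dataset splits and peek at a single test example
print(he_dataset)
print(he_dataset['test'][0]['text'])
print(he_dataset['test'][0]['annotators'])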

3. Transform Dataset

[3]:
def get_common_targets(elm, ignore=['Other', 'None']):
    """
    Merge the target annotations from all annotators into a single
    list, keeping only groups named by at least two annotators.
    Note: the `ignore` argument is not applied in this notebook.
    """
    targets = elm['annotators']['target']
    counts = reduce(lambda x, y: Counter(x) + Counter(y), targets)
    result = [target for target, count in counts.items() if count > 1]
    if result:
        return {'target': result}
    else:
        return {'target': []}

he_dataset = he_dataset.map(get_common_targets)
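
To see what get_common_targets does, here is a minimal, self-contained example with made-up annotator targets: two of the three hypothetical annotators tag 'Women', so only that group survives the count > 1 filter.

[ ]:
from collections import Counter
from functools import reduce

# Hypothetical annotations from three annotators for one post
toy_targets = [['Women', 'Refugee'], ['Women'], ['None']]
counts = reduce(lambda x, y: Counter(x) + Counter(y), toy_targets)
print(counts)                                   # Counter({'Women': 2, 'Refugee': 1, 'None': 1})
print([t for t, c in counts.items() if c > 1])  # ['Women']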
[4]:
def get_top_communites(targets, top=10):
    """
    Return the `top` most frequently targeted identity groups
    """
    target_counts = reduce(lambda x, y: Counter(x) + Counter(y), targets)
    top_targets, _ = zip(*target_counts.most_common(top))
    return set(top_targets)

TOP = get_top_communites(he_dataset['test']['target'])

def filter_top_target(elm):
    """
    This function filters the identity groups targeted
    in each item with the top 10 most common identity groups
    """
    targets = set(elm['target']) & TOP
    return {'target': targets}

he_dataset = he_dataset.map(filter_top_target)
[5]:
def get_label(elm):
    """
    This function derives a ground-truth label by majority vote
    over the annotators' labels
    """
    labels = elm['annotators']['label']
    max_label = max(labels, key=labels.count)
    return {'label': max_label}

he_dataset = he_dataset.map(get_label)
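
get_label is a simple majority vote; note that with max(..., key=labels.count) a three-way tie falls back to the first annotator's label. A toy example with hypothetical label ids:

[ ]:
# Majority vote over made-up annotator label ids
labels = [0, 2, 0]
print(max(labels, key=labels.count))   # 0 -- two of three annotators agree
labels = [0, 1, 2]
print(max(labels, key=labels.count))   # 0 -- all counts equal, so the first label wins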

4. Download Model and Process Outputs

[ ]:
from torch.nn.functional import softmax

he_dataset.set_format("pt", columns=["post_tokens"], output_all_columns=True)
tokenizer = AutoTokenizer.from_pretrained("Hate-speech-CNERG/bert-base-uncased-hatexplain")
model = AutoModelForSequenceClassification.from_pretrained("Hate-speech-CNERG/bert-base-uncased-hatexplain")
def process(examples):
    bert_tokens = tokenizer(examples['text'], return_tensors="pt")
    output = model(**bert_tokens)
    # Convert the logits for one example into class probabilities
    return {"output": softmax(output['logits'], dim=-1).flatten()}

test_ds = he_dataset['test'].map(process)
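
Before aggregating metrics per class, it is worth confirming which output index corresponds to which label. A small, optional check (the exact id2label strings depend on the model's config on the Hub):

[ ]:
# Map from logit index to label name as declared in the model's config
print(model.config.id2label)
# One example's predicted probabilities next to its ground-truth label
print(test_ds[0]['output'], test_ds[0]['label'])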

5. Get Bias Metric from Hugging Face

[ ]:
metric = evaluate.load('Intel/bias_auc')
print(metric)
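
To get a feel for the metric's output structure before running it on the full test split, you can call it on a handful of hand-made rows. This sketch only reuses the add_batch/compute call pattern used below; the rows, groups, and scores are invented, and with so few examples the AUC values are not meaningful.

[ ]:
# Tiny illustrative batch: per-example group lists, binary labels, and [p, 1 - p] scores
toy_metric = evaluate.load('Intel/bias_auc')
toy_metric.add_batch(target=[['Women'], ['Women'], ['Women'], ['Refugee'], ['Refugee'], ['Refugee']],
                     label=[1, 0, 1, 0, 1, 0],
                     output=[[0.9, 0.1], [0.2, 0.8], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.4, 0.6]])
print(toy_metric.compute(subgroups={'Women', 'Refugee'}))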

6. Run Bias Metric and Transform Output to Get Metrics by Group for Model Card

The Model Card Generator takes two pandas DataFrames as input. We will first create the metrics_by_group DataFrame from the Bias AUC metric loaded above.

[10]:
# All identity groups that appear in the test split (the 'Disability' group is excluded)
unique_subgroups = set(group for group_list in test_ds['target'] for group in group_list) - {'Disability'}
target_groups = test_ds['target']
y_pred_prob = test_ds['output']
true_labels = test_ds['label']
class_label_map = {0: "Hate",
                   1: "Offensive",
                   2: "Normal"}
num_of_classes = len(y_pred_prob[0])
metrics_by_group = pd.DataFrame()

# Compute the Bias AUC metric one-vs-rest for each class and stack the results
for class_label in range(num_of_classes):
    class_metric = metric
    binary_class_labels = [1 if label == class_label else 0 for label in true_labels]
    class_probs = [[prob[class_label], 1 - prob[class_label]] for prob in y_pred_prob]
    class_metric.add_batch(target=target_groups,
                           label=binary_class_labels,
                           output=class_probs)
    metric_output = class_metric.compute(subgroups=unique_subgroups)
    metrics_by_group_per_class = (pd.DataFrame.from_dict(metric_output)
                                    .T
                                    .reset_index()
                                    .rename({'index': 'group'}, axis=1))
    metrics_by_group_per_class['feature'] = ['target'] * len(metrics_by_group_per_class)
    metrics_by_group_per_class['label'] = [class_label_map[class_label]] * len(metrics_by_group_per_class)
    metrics_by_group = pd.concat([metrics_by_group, metrics_by_group_per_class], ignore_index=True)

[11]:
metrics_by_group
[11]:
group Subgroup BPSN BNSP feature label
0 None 0.663978 0.051755 0.797641 target Hate
1 Islam 0.252081 0.225308 0.068953 target Hate
2 Homosexual 0.166520 0.104956 0.114978 target Hate
3 Other 0.064327 0.036619 0.117924 target Hate
4 African 0.179228 0.276113 0.049488 target Hate
5 Caucasian 0.119734 0.035024 0.201707 target Hate
6 Women 0.091592 0.052775 0.134179 target Hate
7 Jewish 0.158551 0.284709 0.042195 target Hate
8 Arab 0.193939 0.204050 0.072467 target Hate
9 Refugee 0.217247 0.050079 0.265590 target Hate
10 Overall generalized mean 0.097524 0.047229 0.060728 target Hate
11 None 0.195018 0.278376 0.107284 target Offensive
12 Islam 0.134600 0.093231 0.177565 target Offensive
13 Homosexual 0.154586 0.129348 0.145039 target Offensive
14 Other 0.169524 0.136259 0.171880 target Offensive
15 African 0.201166 0.063841 0.406097 target Offensive
16 Caucasian 0.223684 0.337492 0.056882 target Offensive
17 Women 0.223734 0.199604 0.116902 target Offensive
18 Jewish 0.141978 0.049844 0.322896 target Offensive
19 Arab 0.273810 0.079282 0.335899 target Offensive
20 Refugee 0.175708 0.234858 0.083398 target Offensive
21 Overall generalized mean 0.167782 0.073353 0.086443 target Offensive
22 None 0.143346 0.273709 0.112244 target Normal
23 Islam 0.383023 0.170385 0.438840 target Normal
24 Homosexual 0.314105 0.345418 0.183157 target Normal
25 Other 0.130495 0.322379 0.087658 target Normal
26 African 0.233568 0.118453 0.403769 target Normal
27 Caucasian 0.251852 0.210714 0.265974 target Normal
28 Women 0.284795 0.335885 0.176257 target Normal
29 Jewish 0.236979 0.144365 0.362873 target Normal
30 Arab 0.220513 0.200502 0.256000 target Normal
31 Refugee 0.355429 0.354636 0.207917 target Normal
32 Overall generalized mean 0.182211 0.168860 0.130460 target Normal

7. Transform Output for Metrics by Threshold for Model Card

Now, we will create the metrics_by_threshold DataFrame containing performance metrics at each threshold.

[14]:
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score
import numpy as np
import pandas as pd

thetas = np.linspace(0, 1, 1001)
y_pred_prob = test_ds['output']
true_labels = test_ds['label']
num_of_classes = len(y_pred_prob[0])
metrics_by_threshold = pd.DataFrame()
class_label_index_map = {0: "Hate",
                         1: "Offensive",
                         2: "Normal"}

# Sweep the decision threshold for each class (one-vs-rest) and record the metrics
for class_label in range(num_of_classes):
    class_y_pred_prob = y_pred_prob[:, class_label]
    binary_class_labels = [1 if label == class_label else 0 for label in true_labels]
    metrics_dict_per_class = {
        'threshold': thetas,
        'precision': [precision_score(binary_class_labels, class_y_pred_prob > theta, zero_division=0) for theta in thetas],
        'recall': [recall_score(binary_class_labels, class_y_pred_prob > theta) for theta in thetas],
        'f1': [f1_score(binary_class_labels, class_y_pred_prob > theta) for theta in thetas],
        'accuracy': [accuracy_score(binary_class_labels, class_y_pred_prob > theta) for theta in thetas],
        'label': [class_label_index_map[class_label]] * len(thetas)
    }
    metrics_by_threshold = pd.concat([metrics_by_threshold, pd.DataFrame.from_dict(metrics_dict_per_class)], ignore_index=True)

[15]:
metrics_by_threshold
[15]:
threshold precision recall f1 accuracy label
0 0.000 0.308732 1.0 0.471803 0.308732 Hate
1 0.001 0.308732 1.0 0.471803 0.308732 Hate
2 0.002 0.308732 1.0 0.471803 0.308732 Hate
3 0.003 0.308732 1.0 0.471803 0.308732 Hate
4 0.004 0.308732 1.0 0.471803 0.308732 Hate
... ... ... ... ... ... ...
2998 0.996 0.000000 0.0 0.000000 0.715177 Normal
2999 0.997 0.000000 0.0 0.000000 0.715177 Normal
3000 0.998 0.000000 0.0 0.000000 0.715177 Normal
3001 0.999 0.000000 0.0 0.000000 0.715177 Normal
3002 1.000 0.000000 0.0 0.000000 0.715177 Normal

3003 rows × 6 columns
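
Since metrics_by_threshold is a regular DataFrame, it can also be queried directly before it goes into the model card, for example to pull out the threshold with the best F1 score for each label:

[ ]:
# Row with the highest F1 per label (idxmax keeps the first occurrence on ties)
best_f1 = metrics_by_threshold.loc[metrics_by_threshold.groupby('label')['f1'].idxmax()]
print(best_f1[['label', 'threshold', 'precision', 'recall', 'f1']])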

[20]:
metrics_by_threshold.to_csv('multiclass_metrics_by_threshold.csv', index=False)
metrics_by_group.to_csv('multiclass_metrics_by_group.csv', index=False)

8. Build Model Card

Simply pass the two DataFrames to the ModelCardGen.generate class method to build a model card.

[16]:
mc =  {
    "schema_version": "0.0.1",
    "model_details": {
        "name": "Explainable Hate Speech Detection",
        "version": {
            "name": "0.1",
            "date": "2020"
        },
        "graphics": {},

        "citations": [
             {
                "citation": '''@article{mathew2020hatexplain,
                          title={HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection},
                          author={Mathew, Binny and Saha, Punyajoy and Yimam, Seid Muhie and Biemann, Chris and Goyal, Pawan and Mukherjee, Animesh},
                          journal={arXiv preprint arXiv:2012.10289},
                          year={2020}}'''
             },
        ],
        "overview": 'The model is used for classifying a text as Hatespeech, Offensive, or Normal. The model is trained using data from Gab and Twitter and Human Rationales were included as part of the training data to boost the performance. The dataset and models are available here: https://github.com/punyajoy/HateXplain',
    }
}
[17]:
mcg = ModelCardGen.generate(metrics_by_group=metrics_by_group, metrics_by_threshold=metrics_by_threshold, model_card=mc)
mcg
[17]:
Model Card for Explainable Hate Speech Detection

Model Details

Overview

The model is used for classifying a text as Hatespeech, Offensive, or Normal. The model is trained using data from Gab and Twitter and Human Rationales were included as part of the training data to boost the performance. The dataset and models are available here: https://github.com/punyajoy/HateXplain

Model Performance

Overall Accuracy/Precision/Recall/F1 - Label : (Hate)

Overall Accuracy/Precision/Recall/F1 - Label : (Normal)

Overall Accuracy/Precision/Recall/F1 - Label : (Offensive)

Version

name: 0.1
date: 2020

Citations

  • @article{mathew2020hatexplain, title={HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection}, author={Mathew, Binny and Saha, Punyajoy and Yimam, Seid Muhie and Biemann, Chris and Goyal, Pawan and Mukherjee, Animesh}, journal={arXiv preprint arXiv:2012.10289}, year={2020}}

Quantitative Analysis

Metrics at Threshold - Label : (Hate)

Metrics at Threshold - Label : (Normal)

Metrics at Threshold - Label : (Offensive)

Metrics by Group - Label : (Hate)

Metrics by Group - Label : (Normal)

Metrics by Group - Label : (Offensive)

[18]:
mcg.export_html('ModelCard.html')
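
The exported file is a standalone HTML document. One way to view it from inside the notebook (assuming an IPython environment) is:

[ ]:
from IPython.display import HTML

# Render the generated model card inline in the notebook
HTML(filename='ModelCard.html')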