Creating a Model Card for Toxic Comments Classification in TensorFlow

Adapted from TensorFlow

Training Dependencies

[ ]:
import os
import tempfile
import numpy as np
import pandas as pd
from datetime import datetime

import tensorflow_hub as hub
import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_data_validation as tfdv

from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators
from tensorflow_model_analysis.addons.fairness.view import widget_view

Model Card Dependencies

[ ]:
from intel_ai_safety.model_card_gen.model_card_gen import ModelCardGen
from intel_ai_safety.model_card_gen.datasets import TensorflowDataset

Download Data

Data Description

This version of the CivilComments dataset provides access to the primary seven labels annotated by crowd workers; the toxicity and other tags are values between 0 and 1 indicating the fraction of annotators that assigned these attributes to the comment text.

The other tags are only available for a fraction of the input examples. They are currently ignored for the main dataset; the CivilCommentsIdentities set includes those labels, but only consists of the subset of the data with them. The other attributes that were part of the original CivilComments release are included only in the raw data. See the Kaggle documentation for more details about the available features.

The comments in this dataset come from an archive of the Civil Comments platform, a commenting plugin for independent news sites. These public comments were created from 2015 - 2017 and appeared on approximately 50 English-language news sites across the world. When Civil Comments shut down in 2017, they chose to make the public comments available in a lasting open archive to enable future research. The original data, published on figshare, includes the public comment text, some associated metadata such as article IDs, timestamps and commenter-generated “civility” labels, but does not include user ids. Jigsaw extended this dataset by adding additional labels for toxicity, identity mentions, as well as covert offensiveness. This data set is an exact replica of the data released for the Jigsaw Unintended Bias in Toxicity Classification Kaggle challenge. This dataset is released under CC0, as is the underlying comment text.

For comments that have a parent_id also in the civil comments data, the text of the previous comment is provided as the “parent_text” feature. Note that the splits were made without regard to this information, so using previous comments may leak some information. The annotators did not have access to the parent text when making the labels.

source: https://www.tensorflow.org/datasets/catalog/civil_comments

@misc{pavlopoulos2020toxicity,
    title={Toxicity Detection: Does Context Really Matter?},
    author={John Pavlopoulos and Jeffrey Sorensen and Lucas Dixon and Nithum Thain and Ion Androutsopoulos},
    year={2020}, eprint={2006.00998}, archivePrefix={arXiv}, primaryClass={cs.CL}
}

@article{DBLP:journals/corr/abs-1903-04561,
  author    = {Daniel Borkan and
               Lucas Dixon and
               Jeffrey Sorensen and
               Nithum Thain and
               Lucy Vasserman},
  title     = {Nuanced Metrics for Measuring Unintended Bias with Real Data for Text
               Classification},
  journal   = {CoRR},
  volume    = {abs/1903.04561},
  year      = {2019},
  url       = {http://arxiv.org/abs/1903.04561},
  archivePrefix = {arXiv},
  eprint    = {1903.04561},
  timestamp = {Sun, 31 Mar 2019 19:01:24 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/abs-1903-04561},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

@inproceedings{pavlopoulos-etal-2021-semeval,
    title = "{S}em{E}val-2021 Task 5: Toxic Spans Detection",
    author = "Pavlopoulos, John  and Sorensen, Jeffrey  and Laugier, L{'e}o and Androutsopoulos, Ion",
    booktitle = "Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.semeval-1.6",
    doi = "10.18653/v1/2021.semeval-1.6",
    pages = "59--69",
}

Feature documentation:

Feature            Class    Dtype
article_id         Tensor   tf.int32
id                 Tensor   tf.string
identity_attack    Tensor   tf.float32
insult             Tensor   tf.float32
obscene            Tensor   tf.float32
parent_id          Tensor   tf.int32
parent_text        Text     tf.string
severe_toxicity    Tensor   tf.float32
sexual_explicit    Tensor   tf.float32
text               Text     tf.string
threat             Tensor   tf.float32
toxicity           Tensor   tf.float32
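
To make these fields concrete, the catalog version of the dataset can be inspected directly with TensorFlow Datasets. This is an optional, illustrative sketch (it assumes the tensorflow_datasets package is installed and will download the raw dataset); the rest of this notebook uses preprocessed TFRecord files instead.

[ ]:
# Optional: peek at the raw CivilComments features via TFDS.
import tensorflow_datasets as tfds

raw_ds = tfds.load('civil_comments', split='train[:1%]')
for example in raw_ds.take(1):
    # 'text' is a tf.string; 'toxicity' is the fraction of annotators who labeled the comment toxic
    print(example['text'].numpy()[:80])
    print('toxicity:', float(example['toxicity']))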

[ ]:
dataset_url = 'https://storage.googleapis.com/civil_comments_dataset/'

train_tf_file = tf.keras.utils.get_file('train_tf_processed.tfrecord',
                                        dataset_url + 'train_tf_processed.tfrecord')

validate_tf_file = tf.keras.utils.get_file('validate_tf_processed.tfrecord',
                                           dataset_url + 'validate_tf_processed.tfrecord')
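
Since TensorFlow Data Validation is already imported above, the downloaded split can optionally be profiled before training. A minimal sketch:

[ ]:
# Optional: compute and visualize summary statistics for the training split with TFDV.
# Useful for spotting missing values, skew, and unexpected value ranges.
train_stats = tfdv.generate_statistics_from_tfrecord(data_location=train_tf_file)
tfdv.visualize_statistics(train_stats)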

Train Model

[ ]:
TEXT_FEATURE = 'comment_text'
LABEL = 'toxicity'

FEATURE_MAP = {
    LABEL: tf.io.FixedLenFeature([], tf.float32),
    TEXT_FEATURE: tf.io.FixedLenFeature([], tf.string),

    'sexual_orientation': tf.io.VarLenFeature(tf.string),
    'gender': tf.io.VarLenFeature(tf.string),
    'religion': tf.io.VarLenFeature(tf.string),
    'race': tf.io.VarLenFeature(tf.string),
    'disability': tf.io.VarLenFeature(tf.string)
}
[ ]:
def train_input_fn():
    def parse_function(serialized):
        # parse_single_example works on tf.train.Example type
        parsed_example = tf.io.parse_single_example(serialized=serialized, features=FEATURE_MAP)
        # Counter the ~92%/8% class imbalance by adding a `weight` feature
        # (not part of FEATURE_MAP): 0.1 for non-toxic examples, 1.1 for toxic ones.
        parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)
        return (parsed_example, parsed_example[LABEL])  # (features, label)

    train_dataset = tf.data.TFRecordDataset(filenames=[train_tf_file]).map(parse_function).batch(512)
    return train_dataset
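
As an optional sanity check (a small sketch using only the tf.data API), one batch can be pulled from the input function to confirm the parsed features and the derived `weight` column look as expected:

[ ]:
# Optional: inspect one batch from the input pipeline.
for features, labels in train_input_fn().take(1):
    print(sorted(features.keys()))
    print('labels: ', labels.numpy()[:5])
    print('weights:', features['weight'].numpy()[:5])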

Build Model

[ ]:
# vectorizing through TFHub
embedded_text_feature_column = hub.text_embedding_column(
    key=TEXT_FEATURE,
    module_spec='https://tfhub.dev/google/nnlm-en-dim128/1')

classifier = tf.estimator.DNNClassifier(
    hidden_units=[500, 100],
    weight_column='weight',
    feature_columns=[embedded_text_feature_column],
    optimizer=tf.keras.optimizers.legacy.Adagrad(learning_rate=0.003),
    loss_reduction=tf.losses.Reduction.SUM,
    n_classes=2)

Train Model

[ ]:
classifier.train(input_fn=train_input_fn, steps=1000)
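
Optionally, the classifier can be evaluated on the held-out split before exporting. This sketch reuses the training parse logic and assumes the validation TFRecord shares the same schema; eval_input_fn is a helper defined here for illustration, not part of the original tutorial.

[ ]:
# Optional: quick evaluation on the validation split.
def eval_input_fn():
    def parse_function(serialized):
        parsed_example = tf.io.parse_single_example(serialized=serialized, features=FEATURE_MAP)
        parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)
        return (parsed_example, parsed_example[LABEL])
    return tf.data.TFRecordDataset(filenames=[validate_tf_file]).map(parse_function).batch(512)

print(classifier.evaluate(input_fn=eval_input_fn, steps=100))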

Export in EvalSavedModel Format

[ ]:
MODEL_PATH = tempfile.gettempdir()

def eval_input_receiver_fn():
    serialized_tf_example = tf.compat.v1.placeholder(dtype=tf.string, shape=[None], name='input_example_placeholder')

    receiver_tensors = {'examples': serialized_tf_example}
    features = tf.io.parse_example(serialized_tf_example, FEATURE_MAP)
    features['weight'] = tf.ones_like(features[LABEL])

    return tfma.export.EvalInputReceiver(
        features=features,
        receiver_tensors=receiver_tensors,
        labels=features[LABEL]
    )

tfma_export_dir = tfma.export.export_eval_savedmodel(
    estimator = classifier,  # trained model
    export_dir_base = MODEL_PATH,
    eval_input_receiver_fn = eval_input_receiver_fn
)
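
A quick check that the EvalSavedModel landed on disk (a sketch; export_eval_savedmodel may return the export directory as bytes, so it is decoded defensively):

[ ]:
# Optional: confirm the exported EvalSavedModel directory contains the expected files.
export_dir = tfma_export_dir.decode() if isinstance(tfma_export_dir, bytes) else tfma_export_dir
print('EvalSavedModel written to:', export_dir)
print(tf.io.gfile.listdir(export_dir))  # expect saved_model.pb and variables/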

Making a Model Card

[ ]:
_model_path = tfma_export_dir
_data_paths = {'eval': TensorflowDataset(validate_tf_file),
               'train': TensorflowDataset(train_tf_file)}
[ ]:
_eval_config =  'eval_config.proto'
[ ]:
%%writefile {_eval_config}

model_specs {
  # To use EvalSavedModel set `signature_name` to "eval".
  signature_name: "eval"
}

## Post training metric information. These will be merged with any built-in
## metrics from training.
metrics_specs {
  metrics { class_name: "BinaryAccuracy" }
  metrics { class_name: "Precision" }
  metrics { class_name: "Recall" }
  metrics { class_name: "ConfusionMatrixPlot" }
  metrics { class_name: "FairnessIndicators" }
}

## Slicing information
slicing_specs {}  # overall slice
slicing_specs {
  feature_keys: ["gender"]
}
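
Before handing the config file to ModelCardGen, it can optionally be parsed back into a tfma.EvalConfig proto to catch syntax mistakes early (a small sanity-check sketch; the proto text-format parser accepts the # comments in the file):

[ ]:
from google.protobuf import text_format

# Optional: confirm the eval config we just wrote is a valid tfma.EvalConfig proto.
with open(_eval_config) as f:
    parsed_eval_config = text_format.Parse(f.read(), tfma.EvalConfig())
print(parsed_eval_config)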
[ ]:
mc = {
  "model_details": {
    "name": "Detecting Toxic Comments",
    "overview":  (
    'The Conversation AI team, a research initiative founded by Jigsaw and Google '
    '(both part of Alphabet), builds technology to protect voices in conversation. '
    'A main area of focus is machine learning models that can identify toxicity in '
    'online conversations, where toxicity is defined as anything *rude, disrespectful '
    'or otherwise likely to make someone leave a discussion*. '
    'This model attempts to recognize toxicity while minimizing unintended bias '
    'with respect to mentions of identities. Reducing unintended bias helps ensure that toxicity '
    'can be detected across a wide range of conversations. '),
    "owners": [
      {
        "name": "Intel XAI Team",
        "contact": "xai@intel.com"
      }
    ],

    "references": [
      {
        "reference": "https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data"
      },
      {
        "reference": "https://medium.com/jigsaw/unintended-bias-and-names-of-frequently-targeted-groups-8e0b81f80a23"
      }
    ],
    "graphics": {
      "description": " "
    }
  },
  "considerations": {
      "limitations": [
            {"description": ('Overrepresented Identities in Data:\n'
                    'Identity terms for more frequently targeted groups '
                   '(e.g. words like “black”, “muslim”, “feminist”, “woman”, “gay”, etc.)'
                   ' often have higher scores because comments about those groups are '
                   'over-represented in abusive and toxic comments.')
            },
           {"description": ('False Positive Rate:\n'
                    'The names of targeted groups appear far more often in abusive '
                    'comments. For example, in many forums it is unfortunately common '
                    'to use the word “gay” as an insult, or for someone to attack a '
                    'commenter for being gay, but it is much rarer for the word gay to '
                    'appear in positive, affirming statements (e.g. “I am a proud gay man”). '
                    'When the training data used to train machine learning models contain these '
                    'comments, ML models adopt the biases that exist in these underlying distributions, '
                    'picking up negative connotations as they go. When there’s insufficient diversity '
                    'in the data, the models can over-generalize and make these kinds of errors.')
            },
           {"description": ('Imbalenced Data:\n'
                     'We developed new ways to balance the training '
                     'data so that the model sees enough toxic and non-toxic examples '
                     'containing identity terms in such a way that it can more effectively '
                     'learn to distinguish toxic from non-toxic uses. You can learn more '
                     'about this in our paper published at the AI, Ethics, and Society Conference.')
            },
        ]
    },

  "quantitative_analysis": {
    "graphics": {
      "description": " "
    }
  },
  "schema_version": "0.0.1"
}
[ ]:
mcg = ModelCardGen.generate(data_sets=_data_paths,
                            eval_config=_eval_config,
                            model_path=_model_path,
                            model_card=mc)
[ ]:
mcg