Scorers

Accuracy scorers are implementations of dffml.accuracy.AccuracyScorer; they wrap individual scoring methods behind a common interface.

If you want to get started creating your own accuracy scorers, check out the Scorers tutorial.
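The abstraction is easy to picture outside of DFFML. The sketch below is framework-independent and is not DFFML's AccuracyScorer API. Scoring functions are registered under short names in a hypothetical SCORERS registry, and callers select one by name, roughly the way -scorer selects a scorer on the command line. SCORERS, register, and score_by_name are illustrative names only.

```python
from typing import Callable, Dict, Sequence

# Hypothetical registry: scorer name -> function(y_true, y_pred) -> float.
# It only mirrors the idea of picking a scorer by name; it is not DFFML code.
ScoreFn = Callable[[Sequence[float], Sequence[float]], float]
SCORERS: Dict[str, ScoreFn] = {}


def register(name: str) -> Callable[[ScoreFn], ScoreFn]:
    """Decorator that stores a scoring function under ``name``."""
    def wrapper(func: ScoreFn) -> ScoreFn:
        SCORERS[name] = func
        return func
    return wrapper


@register("mse")
def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


@register("mae")
def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def score_by_name(name: str, y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Look up a scorer by name and apply it; callers never import the metric."""
    if name not in SCORERS:
        raise KeyError(f"unknown scorer: {name!r}")
    return SCORERS[name](y_true, y_pred)


print(score_by_name("mse", [0, 2], [1, 4]))  # 2.5
print(score_by_name("mae", [0, 2], [1, 4]))  # 1.5
```

Adding a new scoring method then means registering one more function; code that selects scorers by name does not change.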

dffml

pip install dffml

clf

Official

No description

mse

Official

No description

dffml_model_scratch

pip install dffml-model-scratch

anomalyscore

Official

No description

dffml_model_scikit

pip install dffml-model-scikit

Machine Learning models implemented with scikit-learn. Models are saved under the model's location, in subdirectories named after the hash of their feature names.
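Naming subdirectories after a hash of the feature names means each distinct feature set gets its own saved model. Below is a minimal sketch of that idea. The choice of SHA-384, the sorting and joining of names, and the helper model_subdirectory are assumptions made for illustration; the plugin's actual scheme may differ.

```python
import hashlib
import pathlib
from typing import Iterable


def model_subdirectory(location: str, feature_names: Iterable[str]) -> pathlib.Path:
    """Hypothetical helper: map a feature set to a subdirectory of ``location``.

    Names are sorted first so that listing the same features in another
    order produces the same directory.
    """
    key = ",".join(sorted(feature_names)).encode("utf-8")
    return pathlib.Path(location) / hashlib.sha384(key).hexdigest()


a = model_subdirectory("tempdir", ["Years", "Expertise", "Trust"])
b = model_subdirectory("tempdir", ["Trust", "Years", "Expertise"])
c = model_subdirectory("tempdir", ["Years", "Expertise"])
print(a == b)  # True: same features, different order
print(a == c)  # False: a different feature set gets its own directory
```

Under this sketch, a model trained with one feature list would not be found when accuracy or predict is run with a different list.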

General Usage:

Training:

$ dffml train \
    -model SCIKIT_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_DIRECTORY \
    -model-SCIKIT_PARAMETER_NAME SCIKIT_PARAMETER_VALUE \
    -sources f=TRAINING_DATA_SOURCE_TYPE \
    -source-filename TRAINING_DATA_FILE_NAME \
    -log debug

Testing and Accuracy:

$ dffml accuracy \
    -model SCIKIT_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_DIRECTORY \
    -features TO_PREDICT \
    -sources f=TESTING_DATA_SOURCE_TYPE \
    -source-filename TESTING_DATA_FILE_NAME \
    -scorer ACCURACY_SCORER \
    -log debug

Predicting with trained model:

$ dffml predict all \
    -model SCIKIT_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_DIRECTORY \
    -sources f=PREDICT_DATA_SOURCE_TYPE \
    -source-filename PREDICT_DATA_FILE_NAME \
    -log debug

Models Available:

==============  =============================  ==============  ==============  ============
Type            Model                          Entrypoint      Parameters      Multi-Output
==============  =============================  ==============  ==============  ============
Regression      LinearRegression               scikitlr        scikitlr        Yes
Regression      ElasticNet                     scikiteln       scikiteln       Yes
Regression      RandomForestRegressor          scikitrfr       scikitrfr       Yes
Regression      BayesianRidge                  scikitbyr       scikitbyr       Yes
Regression      Lasso                          scikitlas       scikitlas       Yes
Regression      ARDRegression                  scikitard       scikitard       Yes
Regression      RANSACRegressor                scikitrsc       scikitrsc       Yes
Regression      DecisionTreeRegressor          scikitdtr       scikitdtr       Yes
Regression      GaussianProcessRegressor       scikitgpr       scikitgpr       Yes
Regression      OrthogonalMatchingPursuit      scikitomp       scikitomp       Yes
Regression      Lars                           scikitlars      scikitlars      Yes
Regression      Ridge                          scikitridge     scikitridge     Yes
Classification  KNeighborsClassifier           scikitknn       scikitknn       Yes
Classification  AdaBoostClassifier             scikitadaboost  scikitadaboost  Yes
Classification  GaussianProcessClassifier      scikitgpc       scikitgpc       Yes
Classification  DecisionTreeClassifier         scikitdtc       scikitdtc       Yes
Classification  RandomForestClassifier         scikitrfc       scikitrfc       Yes
Classification  QuadraticDiscriminantAnalysis  scikitqda       scikitqda       Yes
Classification  MLPClassifier                  scikitmlp       scikitmlp       Yes
Classification  GaussianNB                     scikitgnb       scikitgnb       Yes
Classification  SVC                            scikitsvc       scikitsvc       Yes
Classification  LogisticRegression             scikitlor       scikitlor       Yes
Classification  GradientBoostingClassifier     scikitgbc       scikitgbc       Yes
Classification  BernoulliNB                    scikitbnb       scikitbnb       Yes
Classification  ExtraTreesClassifier           scikitetc       scikitetc       Yes
Classification  BaggingClassifier              scikitbgc       scikitbgc       Yes
Classification  LinearDiscriminantAnalysis     scikitlda       scikitlda       Yes
Classification  MultinomialNB                  scikitmnb       scikitmnb       Yes
Clustering      KMeans                         scikitkmeans    scikitkmeans    No
Clustering      Birch                          scikitbirch     scikitbirch     No
Clustering      MiniBatchKMeans                scikitmbkmeans  scikitmbkmeans  No
Clustering      AffinityPropagation            scikitap        scikitap        No
Clustering      MeanShift                      scikitms        scikitms        No
Clustering      SpectralClustering             scikitsc        scikitsc        No
Clustering      AgglomerativeClustering        scikitac        scikitac        No
Clustering      OPTICS                         scikitoptics    scikitoptics    No
==============  =============================  ==============  ==============  ============

Scorers Available:

==============  ==============================  ================  ================  ============
Type            Scorer                          Entrypoint        Parameters        Multi-Output
==============  ==============================  ================  ================  ============
Regression      Explained Variance Score        exvscore          exvscore          Yes
Regression      Max Error                       maxerr            maxerr            No
Regression      Mean Absolute Error             meanabserr        meanabserr        Yes
Regression      Mean Squared Error              meansqrerr        meansqrerr        Yes
Regression      Mean Squared Log Error          meansqrlogerr     meansqrlogerr     Yes
Regression      Median Absolute Error           medabserr         medabserr         Yes
Regression      R2 Score                        r2score           r2score           Yes
Regression      Mean Poisson Deviance           meanpoidev        meanpoidev        No
Regression      Mean Gamma Deviance             meangammadev      meangammadev      No
Regression      Mean Absolute Percentage Error  meanabspererr     meanabspererr     Yes
Classification  Accuracy Score                  acscore           acscore           Yes
Classification  Balanced Accuracy Score         bacscore          bacscore          Yes
Classification  Top K Accuracy Score            topkscore         topkscore         Yes
Classification  Average Precision Score         avgprescore       avgprescore       Yes
Classification  Brier Score Loss                brierscore        brierscore        Yes
Classification  F1 Score                        f1score           f1score           Yes
Classification  Log Loss                        logloss           logloss           Yes
Classification  Precision Score                 prescore          prescore          Yes
Classification  Recall Score                    recallscore       recallscore       Yes
Classification  Jaccard Score                   jacscore          jacscore          Yes
Classification  ROC AUC Score                   rocaucscore       rocaucscore       Yes
Clustering      Adjusted Mutual Info Score      adjmutinfoscore   adjmutinfoscore   No
Clustering      Adjusted Rand Score             adjrandscore      adjrandscore      No
Clustering      Completeness Score              complscore        complscore        No
Clustering      Fowlkes Mallows Score           fowlmalscore      fowlmalscore      No
Clustering      Homogeneity Score               homoscore         homoscore         No
Clustering      Mutual Info Score               mutinfoscore      mutinfoscore      No
Clustering      Normalized Mutual Info Score    normmutinfoscore  normmutinfoscore  No
Clustering      Rand Score                      randscore         randscore         No
Clustering      V Measure Score                 vmscore           vmscore           No
Supervised      Model’s Default Score           skmodelscore      skmodelscore      Yes
==============  ==============================  ================  ================  ============
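Several of the regression scorers above differ in subtle ways. The stdlib-only sketch below implements R2, explained variance, and max error from their standard definitions (the same ones scikit-learn documents for r2_score, explained_variance_score, and max_error). It handles a single output only and is an illustration, not the plugin's code.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def _variance(values: Sequence[float]) -> float:
    # Population variance (divide by n), as in the usual metric definitions.
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = _mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def explained_variance(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - Var(y_true - y_pred) / Var(y_true); a constant bias does not count."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1 - _variance(residuals) / _variance(y_true)


def max_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Largest absolute residual (single output only)."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


# Predictions that are consistently off by one.
y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 3.0, 4.0]
print(r2(y_true, y_pred))                  # -0.5
print(explained_variance(y_true, y_pred))  # 1.0
print(max_error(y_true, y_pred))           # 1.0
```

With predictions that are uniformly off by one, explained variance stays at 1.0 because it ignores a constant bias, while R2 drops below zero.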

Usage Example:

The example below uses the LinearRegression model from the command line.

Let us take a simple example:

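===================  =========  ============  ======
Years of Experience  Expertise  Trust Factor  Salary
===================  =========  ============  ======
0                    1          0.1           10
1                    3          0.2           20
2                    5          0.3           30
3                    7          0.4           40
4                    9          0.5           50
5                    11         0.6           60
===================  =========  ============  ======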
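The data is deliberately linear: Salary rises by 10 for every additional year of experience, and Expertise and Trust also grow linearly with Years. As a sanity check on the predicted salary of 70.0 for six years of experience later in this walkthrough, here is an ordinary least-squares fit of Salary against Years alone, using only the standard library. It is a simplification of what the scikit-learn model does with all three features, not a reproduction of it.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Years and Salary from the four training rows (the ones written to train.csv).
years = [0, 1, 2, 3]
salary = [10, 20, 30, 40]

slope, intercept = fit_line(years, salary)
print(slope, intercept)       # 10.0 10.0
print(slope * 6 + intercept)  # 70.0
```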

First, create the training and test files

cat > train.csv << EOF
Years,Expertise,Trust,Salary
0,1,0.1,10
1,3,0.2,20
2,5,0.3,30
3,7,0.4,40
EOF
cat > test.csv << EOF
Years,Expertise,Trust,Salary
4,9,0.5,50
5,11,0.6,60
EOF

Train the model

dffml train \
  -model scikitlr \
  -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
  -model-predict Salary:float:1 \
  -model-location tempdir \
  -sources f=csv \
  -source-filename train.csv

Assess accuracy

dffml accuracy \
  -model scikitlr \
  -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
  -model-predict Salary:float:1 \
  -model-location tempdir \
  -features Salary:float:1 \
  -scorer mse \
  -sources f=csv \
  -source-filename test.csv

Output:

1.0

Make a prediction

echo -e 'Years,Expertise,Trust\n6,13,0.7\n' | \
dffml predict all \
  -model scikitlr \
  -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
  -model-predict Salary:float:1 \
  -model-location tempdir \
  -sources f=csv \
  -source-filename /dev/stdin

Output:

[
    {
        "extra": {},
        "features": {
            "Expertise": 13,
            "Trust": 0.7,
            "Years": 6
        },
        "key": "0",
        "last_updated": "2020-03-01T22:26:46Z",
        "prediction": {
            "Salary": {
                "confidence": 1.0,
                "value": 70.0
            }
        }
    }
]

Example usage of the Linear Regression model with the Python API:

from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_scikit import LinearRegressionModel
from dffml.accuracy import MeanSquaredErrorAccuracy

model = LinearRegressionModel(
    features=Features(
        Feature("Years", int, 1),
        Feature("Expertise", int, 1),
        Feature("Trust", float, 1),
    ),
    predict=Feature("Salary", int, 1),
    location="tempdir",
)

# Train the model
train(model, "train.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = MeanSquaredErrorAccuracy()
print(
    "Accuracy:",
    score(
        model,
        scorer,
        Feature("Salary", int, 1),
        CSVSource(filename="test.csv"),
    ),
)

# Make prediction
for i, features, prediction in predict(
    model,
    {"Years": 6, "Expertise": 13, "Trust": 0.7},
    {"Years": 7, "Expertise": 15, "Trust": 0.8},
):
    features["Salary"] = prediction["Salary"]["value"]
    print(features)

The example below uses the KMeans clustering model on a small, randomly generated dataset.

$ cat > train.csv << EOF
Col1,Col2,Col3,Col4
5.05776417,8.55128116,6.15193196,-8.67349666
3.48864265,-7.25952218,-4.89216256,4.69308946
-8.16207603,5.16792984,-2.66971993,0.2401882
6.09809669,8.36434181,6.70940915,-7.91491768
-9.39122566,5.39133807,-2.29760281,-1.69672981
0.48311336,8.19998973,7.78641979,7.8843821
2.22409135,-7.73598586,-4.02660224,2.82101794
2.8137247,8.36064298,7.66196849,3.12704676
EOF
$ cat > test.csv << EOF
Col1,Col2,Col3,Col4,cluster
-10.16770144,2.73057215,-1.49351481,2.43005691,6
3.59705381,-4.76520663,-3.34916068,5.72391486,1
4.01612313,-4.641852,-4.77333308,5.87551683,0
EOF
$ dffml train \
    -model scikitkmeans \
    -model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
    -model-location tempdir \
    -sources f=csv \
    -source-filename train.csv \
    -source-readonly \
    -log debug
$ dffml accuracy \
    -model scikitkmeans \
    -model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
    -model-predict cluster:int:1 \
    -model-location tempdir \
    -features cluster:int:1 \
    -sources f=csv \
    -source-filename test.csv \
    -source-readonly \
    -scorer skmodelscore \
    -log debug
0.6365141682948129
$ echo -e 'Col1,Col2,Col3,Col4\n6.09809669,8.36434181,6.70940915,-7.91491768\n' | \
  dffml predict all \
    -model scikitkmeans \
    -model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
    -model-location tempdir \
    -sources f=csv \
    -source-filename /dev/stdin \
    -source-readonly \
    -log debug
[
    {
        "extra": {},
        "features": {
            "Col1": 6.09809669,
            "Col2": 8.36434181,
            "Col3": 6.70940915,
            "Col4": -7.91491768
        },
        "last_updated": "2020-01-12T22:51:15Z",
        "prediction": {
            "confidence": 0.6365141682948129,
            "value": 2
        },
        "key": "0"
    }
]

Example usage of the KMeans clustering model with the Python API:

from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_scikit import KMeansModel
from dffml_model_scikit import MutualInfoScoreScorer

model = KMeansModel(
    features=Features(
        Feature("Col1", float, 1),
        Feature("Col2", float, 1),
        Feature("Col3", float, 1),
        Feature("Col4", float, 1),
    ),
    predict=Feature("cluster", int, 1),
    location="tempdir",
)

# Train the model
train(model, "train.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = MutualInfoScoreScorer()
print(
    "Accuracy:",
    score(
        model,
        scorer,
        Feature("cluster", int, 1),
        CSVSource(filename="test.csv"),
    ),
)

# Make prediction
for i, features, prediction in predict(
    model,
    {
        "Col1": 6.09809669,
        "Col2": 8.36434181,
        "Col3": 6.70940915,
        "Col4": -7.91491768,
    },
):
    features["cluster"] = prediction["cluster"]["value"]
    print(features)
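Clustering scorers such as mutual information compare how two labelings group the records, not what the cluster IDs are called. The sketch below computes mutual information in nats from contingency counts, following the standard definition scikit-learn documents for mutual_info_score. It is an illustration, not the plugin's implementation, and the function name mutual_info is ours.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_info(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Mutual information between two labelings, using the natural log."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    total = 0.0
    for (a, b), n_ab in joint.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts.
        total += (n_ab / n) * math.log(n * n_ab / (count_a[a] * count_b[b]))
    return total


truth = [0, 0, 1, 1]
renamed = [7, 7, 3, 3]  # same grouping, different cluster IDs
mixed = [0, 1, 0, 1]    # grouping unrelated to truth
print(mutual_info(truth, renamed))  # log(2), about 0.693
print(mutual_info(truth, mixed))    # 0.0
```

Renaming clusters leaves the score unchanged, which is what makes this family of scorers usable when cluster IDs are arbitrary.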

NOTE: Transductive clusterers (scikitsc, scikitac, scikitoptics) cannot handle unseen data. Make sure that predict and accuracy for these algorithms use the training data.

Args

  • predict: Feature

    • Label or the value to be predicted

    • Only used by classification and regression models

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

dffml_model_pytorch

pip install dffml-model-pytorch

Machine Learning models implemented with PyTorch. Models are saved to a file named model.pt under the model's location.

General Usage:

Training:

$ dffml train \
    -model PYTORCH_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_LOCATION \
    -model-CONFIGS CONFIG_VALUES \
    -sources f=TRAINING_DATA_SOURCE_TYPE \
    -source-CONFIGS TRAINING_DATA \
    -log debug

Testing and Accuracy:

$ dffml accuracy \
    -model PYTORCH_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_LOCATION \
    -model-CONFIGS CONFIG_VALUES \
    -features TO_PREDICT \
    -sources f=TESTING_DATA_SOURCE_TYPE \
    -source-CONFIGS TESTING_DATA \
    -scorer ACCURACY_SCORER \
    -log debug

Predicting with trained model:

$ dffml predict all \
    -model PYTORCH_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_LOCATION \
    -model-CONFIGS CONFIG_VALUES \
    -sources f=PREDICT_DATA_SOURCE_TYPE \
    -source-CONFIGS PREDICTION_DATA \
    -log debug

Pre-Trained Models Available:

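==============  ===============================  ==================  =====================================
Type            Model                            Entrypoint          Architecture
==============  ===============================  ==================  =====================================
Classification  AlexNet                          alexnet             AlexNet architecture
Classification  DenseNet-121                     densenet121         DenseNet architecture
Classification  DenseNet-161                     densenet161         DenseNet architecture
Classification  DenseNet-169                     densenet169         DenseNet architecture
Classification  DenseNet-201                     densenet201         DenseNet architecture
Classification  MnasNet 0.5                      mnasnet0_5          MnasNet architecture
Classification  MnasNet 1.0                      mnasnet1_0          MnasNet architecture
Classification  MobileNet V2                     mobilenet_v2        MobileNet V2 architecture
Classification  VGG-11                           vgg11               VGG-11 architecture Configuration “A”
Classification  VGG-11 with batch normalization  vgg11_bn            VGG-11 architecture Configuration “A”
Classification  VGG-13                           vgg13               VGG-13 architecture Configuration “B”
Classification  VGG-13 with batch normalization  vgg13_bn            VGG-13 architecture Configuration “B”
Classification  VGG-16                           vgg16               VGG-16 architecture Configuration “D”
Classification  VGG-16 with batch normalization  vgg16_bn            VGG-16 architecture Configuration “D”
Classification  VGG-19                           vgg19               VGG-19 architecture Configuration “E”
Classification  VGG-19 with batch normalization  vgg19_bn            VGG-19 architecture Configuration “E”
Classification  GoogLeNet                        googlenet           GoogLeNet architecture
Classification  Inception V3                     inception_v3        Inception V3 architecture
Classification  ResNet-18                        resnet18            ResNet architecture
Classification  ResNet-34                        resnet34            ResNet architecture
Classification  ResNet-50                        resnet50            ResNet architecture
Classification  ResNet-101                       resnet101           ResNet architecture
Classification  ResNet-152                       resnet152           ResNet architecture
Classification  Wide ResNet-101-2                wide_resnet101_2    Wide ResNet architecture
Classification  Wide ResNet-50-2                 wide_resnet50_2     Wide ResNet architecture
Classification  ShuffleNet V2 0.5                shufflenet_v2_x0_5  ShuffleNet V2 architecture
Classification  ShuffleNet V2 1.0                shufflenet_v2_x1_0  ShuffleNet V2 architecture
Classification  ResNeXt-101-32x8d                resnext101_32x8d    ResNeXt architecture
Classification  ResNeXt-50-32x4d                 resnext50_32x4d     ResNeXt architecture
==============  ===============================  ==================  =====================================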

Usage Example:

The example below uses the ResNet-18 model from the command line.

Let us take a simple example: classifying images of ants and bees.

First, download the dataset and verify it with sha384sum

curl -LO https://download.pytorch.org/tutorial/hymenoptera_data.zip
sha384sum -c - << EOF
491db45cfcab02d99843fbdcf0574ecf99aa4f056d52c660a39248b5524f9e6e8f896d9faabd27ffcfc2eaca0cec6f39  hymenoptera_data.zip
EOF
hymenoptera_data.zip: OK
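If sha384sum is unavailable (it is not installed by default on every platform), the same check can be done with Python's hashlib. This is a minimal sketch; EXPECTED is the digest from the check above, and sha384_of_file and verify are our own helper names.

```python
import hashlib

# Digest from the sha384sum check above.
EXPECTED = (
    "491db45cfcab02d99843fbdcf0574ecf99aa4f056d52c660a39248b5524f9e6e"
    "8f896d9faabd27ffcfc2eaca0cec6f39"
)


def sha384_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha384()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str = "hymenoptera_data.zip") -> bool:
    """Return True if the file's SHA-384 digest matches EXPECTED."""
    return sha384_of_file(path) == EXPECTED
```

For example, running print(verify()) in the directory that holds the download should print True for an intact archive.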

Unzip the file

unzip hymenoptera_data.zip

Next, create a YAML file that defines the last layer(s) to replace in the network architecture

layers.yaml

linear1:
  layer_type: Linear
  in_features: 512
  out_features: 256
relu:
  layer_type: ReLU
dropout:
  layer_type: Dropout
  p: 0.2
linear2:
  layer_type: Linear
  in_features: 256
  out_features: 2
logsoftmax:
  layer_type: LogSoftmax
  dim: 1
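The sizes in layers.yaml have to line up. ResNet-18's final fully connected layer receives 512 features, so the first replacement Linear layer takes in_features: 512; each later Linear layer must accept the previous one's out_features; and the last out_features must equal the number of classes (2, for ants and bees). The sketch below checks those constraints on the same spec, transcribed as a Python dict so no YAML library is needed. check_linear_chain is a hypothetical helper, not part of dffml-model-pytorch.

```python
from typing import Any, Dict

# The layers.yaml spec above, transcribed as a Python dict (insertion order is kept).
LAYERS: Dict[str, Dict[str, Any]] = {
    "linear1": {"layer_type": "Linear", "in_features": 512, "out_features": 256},
    "relu": {"layer_type": "ReLU"},
    "dropout": {"layer_type": "Dropout", "p": 0.2},
    "linear2": {"layer_type": "Linear", "in_features": 256, "out_features": 2},
    "logsoftmax": {"layer_type": "LogSoftmax", "dim": 1},
}


def check_linear_chain(
    layers: Dict[str, Dict[str, Any]], backbone_features: int, num_classes: int
) -> None:
    """Raise ValueError if the Linear layers do not connect end to end.

    Only Linear layers change the feature size here; ReLU, Dropout and
    LogSoftmax keep it unchanged, so they are skipped.
    """
    size = backbone_features
    for name, spec in layers.items():
        if spec["layer_type"] != "Linear":
            continue
        if spec["in_features"] != size:
            raise ValueError(f"{name}: expected in_features={size}, got {spec['in_features']}")
        size = spec["out_features"]
    if size != num_classes:
        raise ValueError(f"final size {size} does not match {num_classes} classes")


# 512 is the input size of ResNet-18's final fully connected layer.
check_linear_chain(LAYERS, backbone_features=512, num_classes=2)
print("layer sizes are consistent")
```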

Train the model

dffml train \
  -model resnet18 \
  -model-add_layers \
  -model-layers @layers.yaml \
  -model-clstype str \
  -model-classifications ants bees \
  -model-location resnet18_model \
  -model-imageSize 224 \
  -model-epochs 5 \
  -model-batch_size 32 \
  -model-enableGPU \
  -model-features image:int:$((500*500)) \
  -model-predict label:str:1 \
  -sources f=dir \
    -source-foldername hymenoptera_data/train \
    -source-feature image \
    -source-labels ants bees \
  -log critical

Assess accuracy

dffml accuracy \
  -model resnet18 \
  -model-add_layers \
  -model-layers @layers.yaml \
  -model-clstype str \
  -model-classifications ants bees \
  -model-location resnet18_model \
  -model-imageSize 224 \
  -model-batch_size 32 \
  -model-enableGPU \
  -model-features image:int:$((500*500)) \
  -model-predict label:str:1 \
  -features label:str:1 \
  -sources f=dir \
    -source-foldername hymenoptera_data/val \
    -source-feature image \
    -source-labels ants bees \
  -scorer pytorchscore \
  -log critical

Output:

0.9215686274509803

Create a CSV file listing the images to classify as ants or bees.

cat > unknown_images.csv << EOF
key,image
ants1,hymenoptera_data/val/ants/Ant-1818.jpg
bee1,hymenoptera_data/val/bees/10870992_eebeeb3a12.jpg
bee2,hymenoptera_data/val/bees/abeja.jpg
ants2,hymenoptera_data/val/ants/desert_ant.jpg
EOF

Make the predictions

dffml predict all \
  -model resnet18 \
  -model-add_layers \
  -model-layers @layers.yaml \
  -model-clstype str \
  -model-classifications ants bees \
  -model-location resnet18_model \
  -model-imageSize 224 \
  -model-enableGPU \
  -model-features image:int:$((500*500)) \
  -model-predict label:str:1 \
  -sources f=csv \
    -source-filename unknown_images.csv \
    -source-loadfiles image \
  -log critical \
  -pretty

Output:


	Key:	ants1
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                    59, 66, 83, 60, 70, 87, 57, 72, 88, 53, 74, 89 ... (length:263250)                    |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|            Value:  ants           |                                     Confidence:   0.9920881390571594                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	bee1
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                    63, 114, 146, 63, 114, 146, 63, 114, 146, 63,  ... (length:696000)                    |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|            Value:  bees           |                                     Confidence:   0.6108130216598511                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	bee2
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                    103, 253, 254, 98, 254, 254, 91, 255, 254, 89, ... (length:359100)                    |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|            Value:  bees           |                                     Confidence:   0.9162276387214661                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	ants2
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                   69, 121, 162, 44, 96, 137, 41, 90, 130, 68, 11 ... (length:1563912)                    |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|            Value:  ants           |                                     Confidence:   0.9368477463722229                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
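Because the replacement head ends in LogSoftmax, the network outputs log-probabilities, and exponentiating them yields class probabilities that sum to 1. The sketch below shows that conversion in plain Python on made-up scores. Treating the largest probability as the reported confidence is an assumption for illustration; this page does not say how dffml-model-pytorch computes its Confidence value.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax: x_i - log(sum_j exp(x_j))."""
    top = max(scores)
    log_sum = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_sum for s in scores]


def predict_label(scores: Sequence[float], classes: Sequence[str]) -> Tuple[str, float]:
    """Return the most likely class and its probability (assumed 'confidence')."""
    probs = [math.exp(lp) for lp in log_softmax(scores)]
    best = max(range(len(classes)), key=probs.__getitem__)
    return classes[best], probs[best]


# Made-up raw scores for one image; not taken from the run above.
label, confidence = predict_label([2.0, -1.0], ["ants", "bees"])
print(label, round(confidence, 4))  # ants 0.9526
```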

pytorchscore

Official

No description

dffml_model_tensorflow_hub

pip install dffml-model-tensorflow-hub

textclf

Official

No description

dffml_model_spacy

pip install dffml-model-spacy

sner

Official

No description