Scorers¶

Accuracy scorers are implementations of dffml.accuracy.AccuracyScorer; they abstract the usage of scoring methods.

If you want to get started creating your own accuracy scorers, check out the Scorers.
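To make the abstraction concrete, here is a minimal sketch of the idea in plain Python. The names SketchScorer, MeanSquaredErrorSketch, and evaluate are hypothetical and are not DFFML's AccuracyScorer interface; the sketch only illustrates how calling code can stay independent of the metric being computed.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class SketchScorer(ABC):
    """Hypothetical scorer interface (an illustration, not DFFML's class)."""

    @abstractmethod
    def score(self, actual: Sequence[float], predicted: Sequence[float]) -> float:
        """Return one number summarizing how well predictions match."""


class MeanSquaredErrorSketch(SketchScorer):
    """Average of the squared differences between actual and predicted."""

    def score(self, actual: Sequence[float], predicted: Sequence[float]) -> float:
        if len(actual) != len(predicted) or len(actual) == 0:
            raise ValueError("inputs must be non-empty and of equal length")
        squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
        return sum(squared) / len(squared)


def evaluate(scorer: SketchScorer, actual, predicted) -> float:
    # The caller depends only on the shared interface, so swapping the
    # metric means passing a different scorer object.
    return scorer.score(actual, predicted)
```

With this shape, evaluate(MeanSquaredErrorSketch(), actual, predicted) and a call with any other SketchScorer subclass look identical to the caller.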

dffml¶

pip install dffml

clf¶

Official

No description

mse¶

Official

No description

dffml_model_scratch¶

pip install dffml-model-scratch

anomalyscore¶

Official

No description

dffml_model_scikit¶

pip install dffml-model-scikit

Machine Learning models implemented with scikit-learn. Models are saved under the model location directory, in subdirectories named after the hash of their feature names.
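The text does not say which hash function is used or how the feature names are combined, so the following is only a sketch of the general idea. It assumes SHA-384 over the sorted, concatenated names; the function name model_subdirectory is hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Sequence


def model_subdirectory(location: str, feature_names: Sequence[str]) -> Path:
    """Derive a per-feature-set subdirectory (illustrative assumption only)."""
    # Assumption: sorting makes the result independent of argument order.
    joined = "".join(sorted(feature_names))
    # Assumption: SHA-384 is used; the source only says "the hash".
    digest = hashlib.sha384(joined.encode("utf-8")).hexdigest()
    return Path(location) / digest
```

The practical consequence is that two models trained with different feature sets in the same location do not overwrite each other, because each feature set maps to its own subdirectory.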

General Usage:

Training:

$ dffml train \
    -model SCIKIT_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_DIRECTORY \
    -model-SCIKIT_PARAMETER_NAME SCIKIT_PARAMETER_VALUE \
    -sources f=TRAINING_DATA_SOURCE_TYPE \
    -source-filename TRAINING_DATA_FILE_NAME \
    -log debug

Testing and Accuracy:

$ dffml accuracy \
    -model SCIKIT_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_DIRECTORY \
    -features TO_PREDICT \
    -sources f=TESTING_DATA_SOURCE_TYPE \
    -source-filename TESTING_DATA_FILE_NAME \
    -scorer ACCURACY_SCORER \
    -log debug

Predicting with trained model:

$ dffml predict all \
    -model SCIKIT_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_DIRECTORY \
    -sources f=PREDICT_DATA_SOURCE_TYPE \
    -source-filename PREDICT_DATA_FILE_NAME \
    -log debug

Models Available:

Type	Model	Entrypoint	Parameters	Multi-Output
Regression	LinearRegression	scikitlr	scikitlr	Yes
	ElasticNet	scikiteln	scikiteln	Yes
	RandomForestRegressor	scikitrfr	scikitrfr	Yes
	BayesianRidge	scikitbyr	scikitbyr	Yes
	Lasso	scikitlas	scikitlas	Yes
	ARDRegression	scikitard	scikitard	Yes
	RANSACRegressor	scikitrsc	scikitrsc	Yes
	DecisionTreeRegressor	scikitdtr	scikitdtr	Yes
	GaussianProcessRegressor	scikitgpr	scikitgpr	Yes
	OrthogonalMatchingPursuit	scikitomp	scikitomp	Yes
	Lars	scikitlars	scikitlars	Yes
	Ridge	scikitridge	scikitridge	Yes
Classification	KNeighborsClassifier	scikitknn	scikitknn	Yes
	AdaBoostClassifier	scikitadaboost	scikitadaboost	Yes
	GaussianProcessClassifier	scikitgpc	scikitgpc	Yes
	DecisionTreeClassifier	scikitdtc	scikitdtc	Yes
	RandomForestClassifier	scikitrfc	scikitrfc	Yes
	QuadraticDiscriminantAnalysis	scikitqda	scikitqda	Yes
	MLPClassifier	scikitmlp	scikitmlp	Yes
	GaussianNB	scikitgnb	scikitgnb	Yes
	SVC	scikitsvc	scikitsvc	Yes
	LogisticRegression	scikitlor	scikitlor	Yes
	GradientBoostingClassifier	scikitgbc	scikitgbc	Yes
	BernoulliNB	scikitbnb	scikitbnb	Yes
	ExtraTreesClassifier	scikitetc	scikitetc	Yes
	BaggingClassifier	scikitbgc	scikitbgc	Yes
	LinearDiscriminantAnalysis	scikitlda	scikitlda	Yes
	MultinomialNB	scikitmnb	scikitmnb	Yes
Clustering	KMeans	scikitkmeans	scikitkmeans	No
	Birch	scikitbirch	scikitbirch	No
	MiniBatchKMeans	scikitmbkmeans	scikitmbkmeans	No
	AffinityPropagation	scikitap	scikitap	No
	MeanShift	scikitms	scikitms	No
	SpectralClustering	scikitsc	scikitsc	No
	AgglomerativeClustering	scikitac	scikitac	No
	OPTICS	scikitoptics	scikitoptics	No

Scorers Available:

Type	Scorer	Entrypoint	Parameters	Multi-Output
Regression	Explained Variance Score	exvscore	exvscore	Yes
	Max Error	maxerr	maxerr	No
	Mean Absolute Error	meanabserr	meanabserr	Yes
	Mean Squared Error	meansqrerr	meansqrerr	Yes
	Mean Squared Log Error	meansqrlogerr	meansqrlogerr	Yes
	Median Absolute Error	medabserr	medabserr	Yes
	R2 Score	r2score	r2score	Yes
	Mean Poisson Deviance	meanpoidev	meanpoidev	No
	Mean Gamma Deviance	meangammadev	meangammadev	No
	Mean Absolute Percentage Error	meanabspererr	meanabspererr	Yes
Classification	Accuracy Score	acscore	acscore	Yes
	Balanced Accuracy Score	bacscore	bacscore	Yes
	Top K Accuracy Score	topkscore	topkscore	Yes
	Average Precision Score	avgprescore	avgprescore	Yes
	Brier Score Loss	brierscore	brierscore	Yes
	F1 Score	f1score	f1score	Yes
	Log Loss	logloss	logloss	Yes
	Precision Score	prescore	prescore	Yes
	Recall Score	recallscore	recallscore	Yes
	Jaccard Score	jacscore	jacscore	Yes
	Roc Auc Score	rocaucscore	rocaucscore	Yes
Clustering	Adjusted Mutual Info Score	adjmutinfoscore	adjmutinfoscore	No
	Adjusted Rand Score	adjrandscore	adjrandscore	No
	Completeness Score	complscore	complscore	No
	Fowlkes Mallows Score	fowlmalscore	fowlmalscore	No
	Homogeneity Score	homoscore	homoscore	No
	Mutual Info Score	mutinfoscore	mutinfoscore	No
	Normalized Mutual Info Score	normmutinfoscore	normmutinfoscore	No
	Rand Score	randscore	randscore	No
	V Measure Score	vmscore	vmscore	No
Supervised	Model’s Default Score	skmodelscore	skmodelscore	Yes
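The Multi-Output column indicates whether a scorer accepts more than one predicted value per record. One common convention, and the default for scikit-learn's regression metrics, is to compute the metric per output and then take the uniform average. Whether every scorer marked Yes above aggregates this way is an assumption; the helper names below are hypothetical.

```python
from typing import List, Sequence


def mean_squared_error_1d(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error for a single output column."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def multi_output_mse(
    actual_rows: List[Sequence[float]], predicted_rows: List[Sequence[float]]
) -> float:
    """Score each output column separately, then average uniformly."""
    # zip(*rows) transposes a list of records into one tuple per output.
    actual_cols = list(zip(*actual_rows))
    predicted_cols = list(zip(*predicted_rows))
    per_output = [
        mean_squared_error_1d(a, p) for a, p in zip(actual_cols, predicted_cols)
    ]
    return sum(per_output) / len(per_output)
```

A metric such as Max Error has no equally obvious way to combine outputs, which is consistent with it being marked No in the table.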

Usage Example:

The example below uses the LinearRegression model from the command line.

Let us take a simple example:

Years of Experience	Expertise	Trust Factor	Salary
0	1	0.1	10
1	3	0.2	20
2	5	0.3	30
3	7	0.4	40
4	9	0.5	50
5	11	0.6	60

First we create the files

cat > train.csv << EOF
Years,Expertise,Trust,Salary
0,1,0.1,10
1,3,0.2,20
2,5,0.3,30
3,7,0.4,40
EOF

cat > test.csv << EOF
Years,Expertise,Trust,Salary
4,9,0.5,50
5,11,0.6,60
EOF

Train the model

dffml train \
  -model scikitlr \
  -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
  -model-predict Salary:float:1 \
  -model-location tempdir \
  -sources f=csv \
  -source-filename train.csv
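The feature arguments above follow a NAME:TYPE:LENGTH pattern, for example Years:int:1, which corresponds to Feature("Years", int, 1) in the Python API example further down. As a rough illustration of how such a string breaks apart, here is a small parser. FeatureSpec and parse_feature are hypothetical names, and the set of supported type names is an assumption.

```python
from typing import NamedTuple


class FeatureSpec(NamedTuple):
    """Parsed form of a NAME:TYPE:LENGTH string (illustrative only)."""

    name: str
    dtype: type
    length: int


# Assumption: only these primitive type names are handled in this sketch.
_DTYPES = {"int": int, "float": float, "str": str, "bool": bool}


def parse_feature(definition: str) -> FeatureSpec:
    parts = definition.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected NAME:TYPE:LENGTH, got {definition!r}")
    name, dtype_name, length = parts
    if dtype_name not in _DTYPES:
        raise ValueError(f"unknown type name {dtype_name!r}")
    return FeatureSpec(name, _DTYPES[dtype_name], int(length))
```

For instance, parse_feature("Trust:float:1") yields a spec with name "Trust", dtype float, and length 1.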

Assess accuracy

dffml accuracy \
  -model scikitlr \
  -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
  -model-predict Salary:float:1 \
  -model-location tempdir \
  -features Salary:float:1 \
  -scorer mse \
  -sources f=csv \
  -source-filename test.csv

Output:

1.0

Make a prediction

echo -e 'Years,Expertise,Trust\n6,13,0.7\n' | \
dffml predict all \
  -model scikitlr \
  -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
  -model-predict Salary:float:1 \
  -model-location tempdir \
  -sources f=csv \
  -source-filename /dev/stdin

Output:

[
    {
        "extra": {},
        "features": {
            "Expertise": 13,
            "Trust": 0.7,
            "Years": 6
        },
        "key": "0",
        "last_updated": "2020-03-01T22:26:46Z",
        "prediction": {
            "Salary": {
                "confidence": 1.0,
                "value": 70.0
            }
        }
    }
]
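The predicted value of 70.0 can be checked by hand. In train.csv, Expertise and Trust are themselves linear functions of Years, so a line through Years alone already describes every row. The sketch below fits that line with ordinary least squares in plain Python. It is a sanity check under that simplification, not a description of what the scikit-learn model does internally with all three features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Years and Salary columns from train.csv above.
years = [0, 1, 2, 3]
salary = [10, 20, 30, 40]

slope, intercept = fit_line(years, salary)

# Prediction for the record with Years = 6 used in the command above.
predicted = slope * 6 + intercept

# Mean squared error on the two rows of test.csv (Years 4 and 5).
test_rows = [(4, 50), (5, 60)]
test_mse = sum((slope * x + intercept - y) ** 2 for x, y in test_rows) / len(test_rows)
```

The fitted line is Salary = 10 * Years + 10, which gives 70 for six years of experience and reproduces both test rows.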

Example usage of the LinearRegression model using the Python API:

from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_scikit import LinearRegressionModel
from dffml.accuracy import MeanSquaredErrorAccuracy

model = LinearRegressionModel(
    features=Features(
        Feature("Years", int, 1),
        Feature("Expertise", int, 1),
        Feature("Trust", float, 1),
    ),
    predict=Feature("Salary", int, 1),
    location="tempdir",
)

# Train the model
train(model, "train.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = MeanSquaredErrorAccuracy()
print(
    "Accuracy:",
    score(
        model,
        scorer,
        Feature("Salary", int, 1),
        CSVSource(filename="test.csv"),
    ),
)

# Make prediction
for i, features, prediction in predict(
    model,
    {"Years": 6, "Expertise": 13, "Trust": 0.7},
    {"Years": 7, "Expertise": 15, "Trust": 0.8},
):
    features["Salary"] = prediction["Salary"]["value"]
    print(features)

The example below uses the KMeans clustering model on a small randomly generated dataset.

 $ cat > train.csv << EOF
Col1,          Col2,        Col3,         Col4
5.05776417,   8.55128116,   6.15193196,  -8.67349666
3.48864265,  -7.25952218,  -4.89216256,   4.69308946
-8.16207603,  5.16792984,  -2.66971993,   0.2401882
6.09809669,   8.36434181,   6.70940915,  -7.91491768
-9.39122566,  5.39133807,  -2.29760281,  -1.69672981
0.48311336,   8.19998973,   7.78641979,   7.8843821
2.22409135,  -7.73598586,  -4.02660224,   2.82101794
2.8137247 ,   8.36064298,   7.66196849,   3.12704676
EOF
 $ cat > test.csv << EOF
Col1,             Col2,          Col3,         Col4,    cluster
-10.16770144,   2.73057215,  -1.49351481,   2.43005691,    6
3.59705381,  -4.76520663,  -3.34916068,   5.72391486,     1
4.01612313,  -4.641852  ,  -4.77333308,   5.87551683,     0
EOF
 $ dffml train \
     -model scikitkmeans \
     -model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
     -model-location tempdir \
     -sources f=csv \
     -source-filename train.csv \
     -source-readonly \
     -log debug
 $ dffml accuracy \
     -model scikitkmeans \
     -model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
     -model-predict cluster:int:1 \
     -model-location tempdir \
     -features cluster:int:1 \
     -sources f=csv \
     -source-filename test.csv \
     -source-readonly \
     -scorer skmodelscore \
     -log debug
 0.6365141682948129
 $ echo -e 'Col1,Col2,Col3,Col4\n6.09809669,8.36434181,6.70940915,-7.91491768\n' | \
   dffml predict all \
     -model scikitkmeans \
     -model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
     -model-location tempdir \
     -sources f=csv \
     -source-filename /dev/stdin \
     -source-readonly \
     -log debug
 [
     {
         "extra": {},
         "features": {
             "Col1": 6.09809669,
             "Col2": 8.36434181,
             "Col3": 6.70940915,
             "Col4": -7.91491768
         },
         "last_updated": "2020-01-12T22:51:15Z",
         "prediction": {
             "confidence": 0.6365141682948129,
             "value": 2
         },
         "key": "0"
     }
 ]
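Because predict all prints JSON, its output can be consumed by other tools. The sketch below parses the sample output shown above with the standard json module and pulls out the key, predicted cluster, and confidence. The record layout is copied from that sample and may differ between models or versions; summarize is a hypothetical helper name.

```python
import json

# Sample output copied from the KMeans prediction above.
cli_output = """
[
    {
        "extra": {},
        "features": {
            "Col1": 6.09809669,
            "Col2": 8.36434181,
            "Col3": 6.70940915,
            "Col4": -7.91491768
        },
        "last_updated": "2020-01-12T22:51:15Z",
        "prediction": {
            "confidence": 0.6365141682948129,
            "value": 2
        },
        "key": "0"
    }
]
"""


def summarize(records):
    """Return (key, predicted value, confidence) for each record."""
    return [
        (r["key"], r["prediction"]["value"], r["prediction"]["confidence"])
        for r in records
    ]


records = json.loads(cli_output)
summary = summarize(records)
```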

Example usage of the KMeans clustering model using the Python API:

from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_scikit import KMeansModel
from dffml_model_scikit import MutualInfoScoreScorer

model = KMeansModel(
    features=Features(
        Feature("Col1", float, 1),
        Feature("Col2", float, 1),
        Feature("Col3", float, 1),
        Feature("Col4", float, 1),
    ),
    predict=Feature("cluster", int, 1),
    location="tempdir",
)

# Train the model
train(model, "train.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = MutualInfoScoreScorer()
print("Accuracy:", score(model, scorer, Feature("cluster", int, 1), CSVSource(filename="test.csv")))

# Make prediction
for i, features, prediction in predict(
    model,
    {"Col1": 6.09809669, "Col2": 8.36434181, "Col3": 6.70940915, "Col4": -7.91491768},
):
    features["cluster"] = prediction["cluster"]["value"]
    print(features)

NOTE: Transductive clusterers (scikitsc, scikitac, scikitoptics) cannot handle unseen data. Ensure that predict and accuracy for these algorithms use the training data.
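The difference behind this note can be sketched without any library. An inductive clusterer such as KMeans keeps centroids, so any new point can be assigned to the nearest one. A transductive clusterer only produces labels for the points it was fitted on. The centroids, labels, and function names below are made up for illustration.

```python
import math

# Hypothetical centroids, as an inductive clusterer might keep after fitting.
centroids = [(0.0, 0.0), (10.0, 10.0)]

# Hypothetical per-sample labels, all a transductive clusterer provides.
training_labels = {(0.5, -0.5): 0, (9.0, 11.0): 1}


def nearest_centroid(point, centroids):
    """Inductive assignment: index of the closest stored centroid."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def transductive_label(point, labels):
    """Transductive lookup: only points seen during fitting have a label."""
    return labels[tuple(point)]  # raises KeyError for unseen points
```

Here nearest_centroid((8.0, 9.0), centroids) works even though that point was never seen, while transductive_label((8.0, 9.0), training_labels) raises KeyError.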

Args

predict: Feature
- Label or the value to be predicted
- Only used by classification and regression models
features: List of features
- Features to train on
location: Path
- Location where state should be saved

dffml_model_pytorch¶

pip install dffml-model-pytorch

Machine Learning models implemented with PyTorch. Models are saved as model.pt under the model location directory.

General Usage:

Training:

$ dffml train \
    -model PYTORCH_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_LOCATION \
    -model-CONFIGS CONFIG_VALUES \
    -sources f=TRAINING_DATA_SOURCE_TYPE \
    -source-CONFIGS TRAINING_DATA \
    -log debug

Testing and Accuracy:

$ dffml accuracy \
    -model PYTORCH_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_LOCATION \
    -model-CONFIGS CONFIG_VALUES \
    -features TO_PREDICT \
    -sources f=TESTING_DATA_SOURCE_TYPE \
    -source-CONFIGS TESTING_DATA \
    -log debug

Predicting with trained model:

$ dffml predict all \
    -model PYTORCH_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_LOCATION \
    -model-CONFIGS CONFIG_VALUES \
    -sources f=PREDICT_DATA_SOURCE_TYPE \
    -source-CONFIGS PREDICTION_DATA \
    -log debug

Pre-Trained Models Available:

Type	Model	Entrypoint	Architecture
Classification	AlexNet	alexnet	AlexNet architecture
	DenseNet-121	densenet121	DenseNet architecture
	DenseNet-161	densenet161
	DenseNet-169	densenet169
	DenseNet-201	densenet201
	MnasNet 0.5	mnasnet0_5	MnasNet architecture
	MnasNet 1.0	mnasnet1_0
	MobileNet V2	mobilenet_v2	MobileNet V2 architecture
	VGG-11	vgg11	VGG-11 architecture Configuration “A”
	VGG-11 with batch normalization	vgg11_bn
	VGG-13	vgg13	VGG-13 architecture Configuration “B”
	VGG-13 with batch normalization	vgg13_bn
	VGG-16	vgg16	VGG-16 architecture Configuration “D”
	VGG-16 with batch normalization	vgg16_bn
	VGG-19	vgg19	VGG-19 architecture Configuration “E”
	VGG-19 with batch normalization	vgg19_bn
	GoogLeNet	googlenet	GoogLeNet architecture
	Inception V3	inception_v3	Inception V3 architecture
	ResNet-18	resnet18	ResNet architecture
	ResNet-34	resnet34
	ResNet-50	resnet50
	ResNet-101	resnet101
	ResNet-152	resnet152
	Wide ResNet-101-2	wide_resnet101_2	Wide ResNet architecture
	Wide ResNet-50-2	wide_resnet50_2
	ShuffleNet V2 0.5	shufflenet_v2_x0_5	ShuffleNet V2 architecture
	ShuffleNet V2 1.0	shufflenet_v2_x1_0
	ResNeXt-101-32x8D	resnext101_32x8d	ResNeXt architecture
	ResNeXt-50-32x4D	resnext50_32x4d

Usage Example:

The example below uses the ResNet-18 model from the command line.

Let us take a simple example: classifying images of ants and bees.

First, we download the dataset and verify it with sha384sum

curl -LO https://download.pytorch.org/tutorial/hymenoptera_data.zip
sha384sum -c - << EOF
491db45cfcab02d99843fbdcf0574ecf99aa4f056d52c660a39248b5524f9e6e8f896d9faabd27ffcfc2eaca0cec6f39  hymenoptera_data.zip
EOF
hymenoptera_data.zip: OK
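On systems without sha384sum, the same check can be done with Python's hashlib. This is a small sketch; sha384_of_file and matches are hypothetical helper names, and the expected digest is the one given in the command above.

```python
import hashlib

# Digest copied from the sha384sum command above.
EXPECTED_SHA384 = (
    "491db45cfcab02d99843fbdcf0574ecf99aa4f056d52c660"
    "a39248b5524f9e6e8f896d9faabd27ffcfc2eaca0cec6f39"
)


def sha384_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha384()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True when the file's SHA-384 digest equals the expected hex string."""
    return sha384_of_file(path) == expected_hex.lower()
```

After the download, matches("hymenoptera_data.zip", EXPECTED_SHA384) should return True if the file is intact.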

Unzip the file

unzip hymenoptera_data.zip

We first create a YAML file to define the last layer(s) to replace in the network architecture

layers.yaml

linear1:
  layer_type: Linear
  in_features: 512
  out_features: 256
relu:
  layer_type: ReLU
dropout:
  layer_type: Dropout
  p: 0.2
linear2:
  layer_type: Linear
  in_features: 256
  out_features: 2
logsoftmax:
  layer_type: LogSoftmax
  dim: 1
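When writing such a file, the Linear layers have to chain: each in_features must equal the width produced by the previous Linear layer, the first must match the width feeding into the replaced layer (512 for ResNet-18's final fully connected layer), and the last out_features must equal the number of classes. The check below is a sketch over a plain dictionary mirroring layers.yaml; check_linear_chain is a hypothetical helper, not part of DFFML or PyTorch.

```python
# Same structure as layers.yaml above, written as a Python dictionary.
layers = {
    "linear1": {"layer_type": "Linear", "in_features": 512, "out_features": 256},
    "relu": {"layer_type": "ReLU"},
    "dropout": {"layer_type": "Dropout", "p": 0.2},
    "linear2": {"layer_type": "Linear", "in_features": 256, "out_features": 2},
    "logsoftmax": {"layer_type": "LogSoftmax", "dim": 1},
}


def check_linear_chain(layers, input_width, num_classes):
    """Raise ValueError if Linear layer widths do not line up."""
    width = input_width
    for name, spec in layers.items():
        if spec["layer_type"] != "Linear":
            continue  # ReLU, Dropout, and LogSoftmax keep the width unchanged
        if spec["in_features"] != width:
            raise ValueError(
                f"{name}: in_features={spec['in_features']} but previous width is {width}"
            )
        width = spec["out_features"]
    if width != num_classes:
        raise ValueError(f"final width {width} does not match {num_classes} classes")
    return width
```

For the configuration above, check_linear_chain(layers, 512, 2) passes, since 512 feeds 256 and 256 feeds the 2 outputs for ants and bees.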

Train the model

dffml train \
  -model resnet18 \
  -model-add_layers \
  -model-layers @layers.yaml \
  -model-clstype str \
  -model-classifications ants bees \
  -model-location resnet18_model \
  -model-imageSize 224 \
  -model-epochs 5 \
  -model-batch_size 32 \
  -model-enableGPU \
  -model-features image:int:$((500*500)) \
  -model-predict label:str:1 \
  -sources f=dir \
    -source-foldername hymenoptera_data/train \
    -source-feature image \
    -source-labels ants bees \
  -log critical

Assess accuracy

dffml accuracy \
  -model resnet18 \
  -model-add_layers \
  -model-layers @layers.yaml \
  -model-clstype str \
  -model-classifications ants bees \
  -model-location resnet18_model \
  -model-imageSize 224 \
  -model-batch_size 32 \
  -model-enableGPU \
  -model-features image:int:$((500*500)) \
  -model-predict label:str:1 \
  -features label:str:1 \
  -sources f=dir \
    -source-foldername hymenoptera_data/val \
    -source-feature image \
    -source-labels ants bees \
  -scorer pytorchscore \
  -log critical

Output:

0.9215686274509803

Create a CSV file listing the images to classify as ants or bees.

cat > unknown_images.csv << EOF
key,image
ants1,hymenoptera_data/val/ants/Ant-1818.jpg
bee1,hymenoptera_data/val/bees/10870992_eebeeb3a12.jpg
bee2,hymenoptera_data/val/bees/abeja.jpg
ants2,hymenoptera_data/val/ants/desert_ant.jpg
EOF

Make the predictions

dffml predict all \
  -model resnet18 \
  -model-add_layers \
  -model-layers @layers.yaml \
  -model-clstype str \
  -model-classifications ants bees \
  -model-location resnet18_model \
  -model-imageSize 224 \
  -model-enableGPU \
  -model-features image:int:$((500*500)) \
  -model-predict label:str:1 \
  -sources f=csv \
    -source-filename unknown_images.csv \
    -source-loadfiles image \
  -log critical \
  -pretty

Output:

	Key:	ants1
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                    59, 66, 83, 60, 70, 87, 57, 72, 88, 53, 74, 89 ... (length:263250)                    |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|            Value:  ants           |                                     Confidence:   0.9920881390571594                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	bee1
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                    63, 114, 146, 63, 114, 146, 63, 114, 146, 63,  ... (length:696000)                    |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|            Value:  bees           |                                     Confidence:   0.6108130216598511                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	bee2
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                    103, 253, 254, 98, 254, 254, 91, 255, 254, 89, ... (length:359100)                    |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|            Value:  bees           |                                     Confidence:   0.9162276387214661                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	ants2
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                   69, 121, 162, 44, 96, 137, 41, 90, 130, 68, 11 ... (length:1563912)                    |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|            Value:  ants           |                                     Confidence:   0.9368477463722229                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

pytorchscore¶

Official

No description

dffml_model_tensorflow_hub¶

pip install dffml-model-tensorflow-hub

textclf¶

Official

No description

dffml_model_spacy¶

pip install dffml-model-spacy

sner¶

Official

No description