Scorers¶
Accuracy Scorers are implementations of dffml.accuracy.AccuracyScorer; they abstract the usage of scoring methods.
If you want to get started creating your own accuracy scorers, check out the Scorers tutorial.
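For example, with the Python API a scorer instance is passed to score() along with the trained model, the feature being predicted, and a data source. The sketch below reuses the LinearRegression salary example that appears in full under dffml_model_scikit later on this page (it assumes the train.csv and test.csv files created there); any other AccuracyScorer could be swapped in for MeanSquaredErrorAccuracy.
from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score
from dffml.accuracy import MeanSquaredErrorAccuracy
from dffml_model_scikit import LinearRegressionModel

model = LinearRegressionModel(
    features=Features(
        Feature("Years", int, 1),
        Feature("Expertise", int, 1),
        Feature("Trust", float, 1),
    ),
    predict=Feature("Salary", int, 1),
    location="tempdir",
)

# Train the model on the training data
train(model, "train.csv")

# Any AccuracyScorer implementation can be swapped in here
scorer = MeanSquaredErrorAccuracy()
print(
    "Accuracy:",
    score(model, scorer, Feature("Salary", int, 1), CSVSource(filename="test.csv")),
)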
dffml¶
pip install dffml
clf¶
Official
No description
mse¶
Official
No description
dffml_model_scratch¶
pip install dffml-model-scratch
anomalyscore¶
Official
No description
dffml_model_scikit¶
pip install dffml-model-scikit
Machine Learning models implemented with scikit-learn. Models are saved under the model location directory, in subdirectories named after the hash of their feature names.
General Usage:
Training:
$ dffml train \
-model SCIKIT_MODEL_ENTRYPOINT \
-model-features FEATURE_DEFINITION \
-model-predict TO_PREDICT \
-model-location MODEL_DIRECTORY \
-model-SCIKIT_PARAMETER_NAME SCIKIT_PARAMETER_VALUE \
-sources f=TRAINING_DATA_SOURCE_TYPE \
-source-filename TRAINING_DATA_FILE_NAME \
-log debug
Testing and Accuracy:
$ dffml accuracy \
-model SCIKIT_MODEL_ENTRYPOINT \
-model-features FEATURE_DEFINITION \
-model-predict TO_PREDICT \
-model-location MODEL_DIRECTORY \
-features TO_PREDICT \
-sources f=TESTING_DATA_SOURCE_TYPE \
-source-filename TESTING_DATA_FILE_NAME \
-scorer ACCURACY_SCORER \
-log debug
Predicting with trained model:
$ dffml predict all \
-model SCIKIT_MODEL_ENTRYPOINT \
-model-features FEATURE_DEFINITION \
-model-predict TO_PREDICT \
-model-location MODEL_DIRECTORY \
-sources f=PREDICT_DATA_SOURCE_TYPE \
-source-filename PREDICT_DATA_FILE_NAME \
-log debug
Models Available:
Type | Model | Entrypoint | Multi-Output
---|---|---|---
Regression | LinearRegression | scikitlr | Yes
 | ElasticNet | scikiteln | Yes
 | RandomForestRegressor | scikitrfr | Yes
 | BayesianRidge | scikitbyr | Yes
 | Lasso | scikitlas | Yes
 | ARDRegression | scikitard | Yes
 | RANSACRegressor | scikitrsc | Yes
 | DecisionTreeRegressor | scikitdtr | Yes
 | GaussianProcessRegressor | scikitgpr | Yes
 | OrthogonalMatchingPursuit | scikitomp | Yes
 | Lars | scikitlars | Yes
 | Ridge | scikitridge | Yes
Classification | KNeighborsClassifier | scikitknn | Yes
 | AdaBoostClassifier | scikitadaboost | Yes
 | GaussianProcessClassifier | scikitgpc | Yes
 | DecisionTreeClassifier | scikitdtc | Yes
 | RandomForestClassifier | scikitrfc | Yes
 | QuadraticDiscriminantAnalysis | scikitqda | Yes
 | MLPClassifier | scikitmlp | Yes
 | GaussianNB | scikitgnb | Yes
 | SVC | scikitsvc | Yes
 | LogisticRegression | scikitlor | Yes
 | GradientBoostingClassifier | scikitgbc | Yes
 | BernoulliNB | scikitbnb | Yes
 | ExtraTreesClassifier | scikitetc | Yes
 | BaggingClassifier | scikitbgc | Yes
 | LinearDiscriminantAnalysis | scikitlda | Yes
 | MultinomialNB | scikitmnb | Yes
Clustering | KMeans | scikitkmeans | No
 | Birch | scikitbirch | No
 | MiniBatchKMeans | scikitmbkmeans | No
 | AffinityPropagation | scikitap | No
 | MeanShift | scikitms | No
 | SpectralClustering | scikitsc | No
 | AgglomerativeClustering | scikitac | No
 | OPTICS | scikitoptics | No
Scorers Available:
Type | Scorer | Entrypoint | Multi-Output
---|---|---|---
Regression | Explained Variance Score | exvscore | Yes
 | Max Error | maxerr | No
 | Mean Absolute Error | meanabserr | Yes
 | Mean Squared Error | meansqrerr | Yes
 | Mean Squared Log Error | meansqrlogerr | Yes
 | Median Absolute Error | medabserr | Yes
 | R2 Score | r2score | Yes
 | Mean Poisson Deviance | meanpoidev | No
 | Mean Gamma Deviance | meangammadev | No
 | Mean Absolute Percentage Error | meanabspererr | Yes
Classification | Accuracy Score | acscore | Yes
 | Balanced Accuracy Score | bacscore | Yes
 | Top K Accuracy Score | topkscore | Yes
 | Average Precision Score | avgprescore | Yes
 | Brier Score Loss | brierscore | Yes
 | F1 Score | f1score | Yes
 | Log Loss | logloss | Yes
 | Precision Score | prescore | Yes
 | Recall Score | recallscore | Yes
 | Jaccard Score | jacscore | Yes
 | Roc Auc Score | rocaucscore | Yes
Clustering | Adjusted Mutual Info Score | adjmutinfoscore | No
 | Adjusted Rand Score | adjrandscore | No
 | Completeness Score | complscore | No
 | Fowlkes Mallows Score | fowlmalscore | No
 | Homogeneity Score | homoscore | No
 | Mutual Info Score | mutinfoscore | No
 | Normalized Mutual Info Score | normmutinfoscore | No
 | Rand Score | randscore | No
 | V Measure Score | vmscore | No
Supervised | Model’s Default Score | skmodelscore | Yes
Usage Example:
The example below uses the LinearRegression model from the command line.
Let us take a simple example:
Years of Experience | Expertise | Trust Factor | Salary
---|---|---|---
0 | 1 | 0.2 | 10
1 | 3 | 0.4 | 20
2 | 5 | 0.6 | 30
3 | 7 | 0.8 | 40
4 | 9 | 1.0 | 50
5 | 11 | 1.2 | 60
First, we create the training and testing data files:
cat > train.csv << EOF
Years,Expertise,Trust,Salary
0,1,0.1,10
1,3,0.2,20
2,5,0.3,30
3,7,0.4,40
EOF
cat > test.csv << EOF
Years,Expertise,Trust,Salary
4,9,0.5,50
5,11,0.6,60
EOF
Train the model
dffml train \
-model scikitlr \
-model-features Years:int:1 Expertise:int:1 Trust:float:1 \
-model-predict Salary:float:1 \
-model-location tempdir \
-sources f=csv \
-source-filename train.csv
Assess accuracy
dffml accuracy \
-model scikitlr \
-model-features Years:int:1 Expertise:int:1 Trust:float:1 \
-model-predict Salary:float:1 \
-model-location tempdir \
-features Salary:float:1 \
-scorer mse \
-sources f=csv \
-source-filename test.csv
Output:
1.0
Make a prediction
echo -e 'Years,Expertise,Trust\n6,13,0.7\n' | \
dffml predict all \
-model scikitlr \
-model-features Years:int:1 Expertise:int:1 Trust:float:1 \
-model-predict Salary:float:1 \
-model-location tempdir \
-sources f=csv \
-source-filename /dev/stdin
Output:
[
    {
        "extra": {},
        "features": {
            "Expertise": 13,
            "Trust": 0.7,
            "Years": 6
        },
        "key": "0",
        "last_updated": "2020-03-01T22:26:46Z",
        "prediction": {
            "Salary": {
                "confidence": 1.0,
                "value": 70.0
            }
        }
    }
]
Example usage of the LinearRegression model using the Python API:
from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_scikit import LinearRegressionModel
from dffml.accuracy import MeanSquaredErrorAccuracy
model = LinearRegressionModel(
    features=Features(
        Feature("Years", int, 1),
        Feature("Expertise", int, 1),
        Feature("Trust", float, 1),
    ),
    predict=Feature("Salary", int, 1),
    location="tempdir",
)

# Train the model
train(model, "train.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = MeanSquaredErrorAccuracy()
print(
    "Accuracy:",
    score(
        model,
        scorer,
        Feature("Salary", int, 1),
        CSVSource(filename="test.csv"),
    ),
)

# Make prediction
for i, features, prediction in predict(
    model,
    {"Years": 6, "Expertise": 13, "Trust": 0.7},
    {"Years": 7, "Expertise": 15, "Trust": 0.8},
):
    features["Salary"] = prediction["Salary"]["value"]
    print(features)
The example below uses the KMeans clustering model on a small randomly generated dataset.
$ cat > train.csv << EOF
Col1, Col2, Col3, Col4
5.05776417, 8.55128116, 6.15193196, -8.67349666
3.48864265, -7.25952218, -4.89216256, 4.69308946
-8.16207603, 5.16792984, -2.66971993, 0.2401882
6.09809669, 8.36434181, 6.70940915, -7.91491768
-9.39122566, 5.39133807, -2.29760281, -1.69672981
0.48311336, 8.19998973, 7.78641979, 7.8843821
2.22409135, -7.73598586, -4.02660224, 2.82101794
2.8137247 , 8.36064298, 7.66196849, 3.12704676
EOF
$ cat > test.csv << EOF
Col1, Col2, Col3, Col4, cluster
-10.16770144, 2.73057215, -1.49351481, 2.43005691, 6
3.59705381, -4.76520663, -3.34916068, 5.72391486, 1
4.01612313, -4.641852 , -4.77333308, 5.87551683, 0
EOF
$ dffml train \
-model scikitkmeans \
-model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
-model-location tempdir \
-sources f=csv \
-source-filename train.csv \
-source-readonly \
-log debug
$ dffml accuracy \
-model scikitkmeans \
-model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
-model-predict cluster:int:1 \
-model-location tempdir \
-features cluster:int:1 \
-sources f=csv \
-source-filename test.csv \
-source-readonly \
-scorer skmodelscore \
-log debug
0.6365141682948129
$ echo -e 'Col1,Col2,Col3,Col4\n6.09809669,8.36434181,6.70940915,-7.91491768\n' | \
dffml predict all \
-model scikitkmeans \
-model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
-model-location tempdir \
-sources f=csv \
-source-filename /dev/stdin \
-source-readonly \
-log debug
[
    {
        "extra": {},
        "features": {
            "Col1": 6.09809669,
            "Col2": 8.36434181,
            "Col3": 6.70940915,
            "Col4": -7.91491768
        },
        "last_updated": "2020-01-12T22:51:15Z",
        "prediction": {
            "confidence": 0.6365141682948129,
            "value": 2
        },
        "key": "0"
    }
]
Example usage of the KMeans clustering model using the Python API:
from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_scikit import KMeansModel
from dffml_model_scikit import MutualInfoScoreScorer
model = KMeansModel(
    features=Features(
        Feature("Col1", float, 1),
        Feature("Col2", float, 1),
        Feature("Col3", float, 1),
        Feature("Col4", float, 1),
    ),
    predict=Feature("cluster", int, 1),
    location="tempdir",
)

# Train the model
train(model, "train.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = MutualInfoScoreScorer()
print("Accuracy:", score(model, scorer, Feature("cluster", int, 1), CSVSource(filename="test.csv")))

# Make prediction
for i, features, prediction in predict(
    model,
    {"Col1": 6.09809669, "Col2": 8.36434181, "Col3": 6.70940915, "Col4": -7.91491768},
):
    features["cluster"] = prediction["cluster"]["value"]
    print(features)
NOTE: Transductive clusterers (scikitsc, scikitac, scikitoptics) cannot handle unseen data. Ensure that predict and accuracy for these algorithms use the training data, as sketched below.
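The sketch below illustrates that pattern with the Python API. It mirrors the KMeans example above, but assumes an AgglomerativeClusteringModel class for the scikitac entrypoint and a hypothetical train_clusters.csv containing the training features plus a ground-truth cluster column; the essential point is that the same training data is passed to train(), score(), and predict().
from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_scikit import AgglomerativeClusteringModel  # assumed class for scikitac
from dffml_model_scikit import MutualInfoScoreScorer

model = AgglomerativeClusteringModel(
    features=Features(
        Feature("Col1", float, 1),
        Feature("Col2", float, 1),
        Feature("Col3", float, 1),
        Feature("Col4", float, 1),
    ),
    predict=Feature("cluster", int, 1),
    location="tempdir",
)

# train_clusters.csv is hypothetical: the training features plus a
# ground-truth cluster column. The same file is reused for scoring and
# prediction because transductive clusterers cannot handle unseen data.
train(model, "train_clusters.csv")

scorer = MutualInfoScoreScorer()
print("Accuracy:", score(model, scorer, Feature("cluster", int, 1), CSVSource(filename="train_clusters.csv")))

for i, features, prediction in predict(model, CSVSource(filename="train_clusters.csv")):
    features["cluster"] = prediction["cluster"]["value"]
    print(features)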
Args
predict: Feature
    Label or the value to be predicted. Only used by classification and regression models.
features: List of features
    Features to train on.
location: Path
    Location where state should be saved.
dffml_model_pytorch¶
pip install dffml-model-pytorch
Machine Learning models implemented with PyTorch. Models are saved under the model location directory as model.pt.
General Usage:
Training:
$ dffml train \
-model PYTORCH_MODEL_ENTRYPOINT \
-model-features FEATURE_DEFINITION \
-model-predict TO_PREDICT \
-model-location MODEL_LOCATION \
-model-CONFIGS CONFIG_VALUES \
-sources f=TRAINING_DATA_SOURCE_TYPE \
-source-CONFIGS TRAINING_DATA \
-log debug
Testing and Accuracy:
$ dffml accuracy \
-model PYTORCH_MODEL_ENTRYPOINT \
-model-features FEATURE_DEFINITION \
-model-predict TO_PREDICT \
-model-location MODEL_LOCATION \
-model-CONFIGS CONFIG_VALUES \
-features TO_PREDICT \
-sources f=TESTING_DATA_SOURCE_TYPE \
-source-CONFIGS TESTING_DATA \
-log debug
Predicting with trained model:
$ dffml predict all \
-model PYTORCH_MODEL_ENTRYPOINT \
-model-features FEATURE_DEFINITION \
-model-predict TO_PREDICT \
-model-location MODEL_LOCATION \
-model-CONFIGS CONFIG_VALUES \
-sources f=PREDICT_DATA_SOURCE_TYPE \
-source-CONFIGS PREDICTION_DATA \
-log debug
Pre-Trained Models Available:
Type | Model | Entrypoint
---|---|---
Classification | AlexNet | alexnet
 | DenseNet-121 | densenet121
 | DenseNet-161 | densenet161
 | DenseNet-169 | densenet169
 | DenseNet-201 | densenet201
 | MnasNet 0.5 | mnasnet0_5
 | MnasNet 1.0 | mnasnet1_0
 | MobileNet V2 | mobilenet_v2
 | VGG-11 | vgg11
 | VGG-11 with batch normalization | vgg11_bn
 | VGG-13 | vgg13
 | VGG-13 with batch normalization | vgg13_bn
 | VGG-16 | vgg16
 | VGG-16 with batch normalization | vgg16_bn
 | VGG-19 | vgg19
 | VGG-19 with batch normalization | vgg19_bn
 | GoogleNet | googlenet
 | Inception V3 | inception_v3
 | ResNet-18 | resnet18
 | ResNet-34 | resnet34
 | ResNet-50 | resnet50
 | ResNet-101 | resnet101
 | ResNet-152 | resnet152
 | Wide ResNet-101-2 | wide_resnet101_2
 | Wide ResNet-50-2 | wide_resnet50_2
 | ShuffleNet V2 0.5 | shufflenet_v2_x0_5
 | ShuffleNet V2 1.0 | shufflenet_v2_x1_0
 | ResNext-101-32x8D | resnext101_32x8d
 | ResNext-50-32x4D | resnext50_32x4d
Usage Example:
The example below uses the ResNet-18 model from the command line.
Let us take a simple example: classifying images of ants and bees.
First, we download the dataset and verify it with sha384sum:
curl -LO https://download.pytorch.org/tutorial/hymenoptera_data.zip
sha384sum -c - << EOF
491db45cfcab02d99843fbdcf0574ecf99aa4f056d52c660a39248b5524f9e6e8f896d9faabd27ffcfc2eaca0cec6f39  hymenoptera_data.zip
EOF
hymenoptera_data.zip: OK
Unzip the file
unzip hymenoptera_data.zip
We first create a YAML file that defines the last layer(s) to replace in the network architecture:
layers.yaml
linear1:
  layer_type: Linear
  in_features: 512
  out_features: 256
relu:
  layer_type: ReLU
dropout:
  layer_type: Dropout
  p: 0.2
linear2:
  layer_type: Linear
  in_features: 256
  out_features: 2
logsoftmax:
  layer_type: LogSoftmax
  dim: 1
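As a rough illustration only (an assumption about how the layer_type entries map onto torch.nn modules, not how dffml consumes the file), the definitions above correspond to a classification head like the following, appended in place of ResNet-18's final fully connected layer:
import torch.nn as nn

# Plain PyTorch sketch of layers.yaml: a new classification head with
# two outputs (ants, bees) fed by ResNet-18's 512-dimensional features.
head = nn.Sequential(
    nn.Linear(in_features=512, out_features=256),  # linear1
    nn.ReLU(),                                     # relu
    nn.Dropout(p=0.2),                             # dropout
    nn.Linear(in_features=256, out_features=2),    # linear2
    nn.LogSoftmax(dim=1),                          # logsoftmax
)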
Train the model
dffml train \
-model resnet18 \
-model-add_layers \
-model-layers @layers.yaml \
-model-clstype str \
-model-classifications ants bees \
-model-location resnet18_model \
-model-imageSize 224 \
-model-epochs 5 \
-model-batch_size 32 \
-model-enableGPU \
-model-features image:int:$((500*500)) \
-model-predict label:str:1 \
-sources f=dir \
-source-foldername hymenoptera_data/train \
-source-feature image \
-source-labels ants bees \
-log critical
Assess accuracy
dffml accuracy \
-model resnet18 \
-model-add_layers \
-model-layers @layers.yaml \
-model-clstype str \
-model-classifications ants bees \
-model-location resnet18_model \
-model-imageSize 224 \
-model-batch_size 32 \
-model-enableGPU \
-model-features image:int:$((500*500)) \
-model-predict label:str:1 \
-features label:str:1 \
-sources f=dir \
-source-foldername hymenoptera_data/val \
-source-feature image \
-source-labels ants bees \
-scorer pytorchscore \
-log critical
Output:
0.9215686274509803
Create a CSV file listing the images for which we want to predict whether they contain ants or bees.
cat > unknown_images.csv << EOF
key,image
ants1,hymenoptera_data/val/ants/Ant-1818.jpg
bee1,hymenoptera_data/val/bees/10870992_eebeeb3a12.jpg
bee2,hymenoptera_data/val/bees/abeja.jpg
ants2,hymenoptera_data/val/ants/desert_ant.jpg
EOF
Make the predictions
dffml predict all \
-model resnet18 \
-model-add_layers \
-model-layers @layers.yaml \
-model-clstype str \
-model-classifications ants bees \
-model-location resnet18_model \
-model-imageSize 224 \
-model-enableGPU \
-model-features image:int:$((500*500)) \
-model-predict label:str:1 \
-sources f=csv \
-source-filename unknown_images.csv \
-source-loadfiles image \
-log critical \
-pretty
Output:
Key: ants1
Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
| image | 59, 66, 83, 60, 70, 87, 57, 72, 88, 53, 74, 89 ... (length:263250) |
+----------------------------------------------------------------------------------------------------------------------------------------------+
Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
| label |
+----------------------------------------------------------------------------------------------------------------------------------------------+
| Value: ants | Confidence: 0.9920881390571594 |
+----------------------------------------------------------------------------------------------------------------------------------------------+
Key: bee1
Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
| image | 63, 114, 146, 63, 114, 146, 63, 114, 146, 63, ... (length:696000) |
+----------------------------------------------------------------------------------------------------------------------------------------------+
Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
| label |
+----------------------------------------------------------------------------------------------------------------------------------------------+
| Value: bees | Confidence: 0.6108130216598511 |
+----------------------------------------------------------------------------------------------------------------------------------------------+
Key: bee2
Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
| image | 103, 253, 254, 98, 254, 254, 91, 255, 254, 89, ... (length:359100) |
+----------------------------------------------------------------------------------------------------------------------------------------------+
Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
| label |
+----------------------------------------------------------------------------------------------------------------------------------------------+
| Value: bees | Confidence: 0.9162276387214661 |
+----------------------------------------------------------------------------------------------------------------------------------------------+
Key: ants2
Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
| image | 69, 121, 162, 44, 96, 137, 41, 90, 130, 68, 11 ... (length:1563912) |
+----------------------------------------------------------------------------------------------------------------------------------------------+
Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
| label |
+----------------------------------------------------------------------------------------------------------------------------------------------+
| Value: ants | Confidence: 0.9368477463722229 |
+----------------------------------------------------------------------------------------------------------------------------------------------+
pytorchscore¶
Official
No description
dffml_model_tensorflow_hub¶
pip install dffml-model-tensorflow-hub
textclf¶
Official
No description
dffml_model_spacy¶
pip install dffml-model-spacy
sner¶
Official
No description