Models

Models are implementations of dffml.model.model.Model; they abstract the usage of machine learning models.

If you want to get started creating your own model, check out the Models tutorials.

You can load any of the models seen here using the Model.load (dffml.model.model.Model.load) function. See the Load Models Dynamically tutorial for more details.

dffml

pip install dffml

slr

Official

Logistic Regression training one variable to predict another.

The dataset used for training

dataset.csv

f1,ans
0.1,0
0.7,1
0.6,1
0.2,0
0.8,1

Train the model

$ dffml train \
    -model slr \
    -model-features f1:float:1 \
    -model-predict ans:int:1 \
    -model-location tempdir \
    -sources f=csv \
    -source-filename dataset.csv
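The values passed to -model-features and -model-predict follow a NAME:TYPE:LENGTH pattern, which lines up with the Feature("f1", float, 1) calls in the Python example further down. The sketch below shows one way such a string could be split. parse_feature_spec and the TYPES table are hypothetical names used only for illustration; they are not part of DFFML, and the sketch handles only the three-part form.

```python
# Hypothetical helper, not part of DFFML: split a CLI feature definition
# such as "f1:float:1" into its name, Python type, and length.
TYPES = {"int": int, "float": float, "str": str}


def parse_feature_spec(spec):
    # Expect exactly three colon-separated parts: NAME:TYPE:LENGTH
    name, dtype, length = spec.split(":")
    return name, TYPES[dtype], int(length)


print(parse_feature_spec("f1:float:1"))
print(parse_feature_spec("ans:int:1"))
```

Read this way, f1:float:1 describes a single floating point value named f1, and ans:int:1 a single integer named ans.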

Assess the accuracy

$ dffml accuracy \
    -model slr \
    -model-features f1:float:1 \
    -model-predict ans:int:1 \
    -model-location tempdir \
    -features ans:int:1 \
    -sources f=csv \
    -source-filename dataset.csv \
    -scorer mse
1.0

Make a prediction

predict.csv

f1
0.8

$ dffml predict all \
    -model slr \
    -model-features f1:float:1 \
    -model-predict ans:int:1 \
    -model-location tempdir \
    -sources f=csv \
    -source-filename predict.csv
[
    {
        "extra": {},
        "features": {
            "f1": 0.8
        },
        "key": "0",
        "last_updated": "2020-11-15T16:22:25Z",
        "prediction": {
            "ans": {
                "confidence": 0.9355670103092784,
                "value": 1
            }
        }
    }
]
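The predict command prints a JSON list with one record per input row. As a rough sketch of how that output could be consumed from a script, the snippet below parses the record shown above with the standard json module and pulls out the predicted value and its confidence. The variable names are illustrative only.

```python
import json

# The JSON printed by `dffml predict all` in the example above
output = """
[
    {
        "extra": {},
        "features": {"f1": 0.8},
        "key": "0",
        "last_updated": "2020-11-15T16:22:25Z",
        "prediction": {
            "ans": {"confidence": 0.9355670103092784, "value": 1}
        }
    }
]
"""

records = json.loads(output)
for record in records:
    # Predictions are keyed by the name of the predicted feature
    prediction = record["prediction"]["ans"]
    print(record["key"], prediction["value"], prediction["confidence"])
```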

Example usage of Logistic Regression using Python

slr.py

from dffml import Features, Feature, SLRModel
from dffml.noasync import train, score, predict
from dffml.accuracy import MeanSquaredErrorAccuracy

model = SLRModel(
    features=Features(Feature("f1", float, 1)),
    predict=Feature("ans", int, 1),
    location="tempdir",
)

# Train the model
train(model, "dataset.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = MeanSquaredErrorAccuracy()
print("Accuracy:", score(model, scorer, Feature("ans", int, 1), "dataset.csv"))

# Make prediction
for i, features, prediction in predict(model, {"f1": 0.8, "ans": 0}):
    features["ans"] = prediction["ans"]["value"]
    print(features)

$ python slr.py
Accuracy: 0.9355670103092784
{'f1': 0.8, 'ans': 1}

Args

  • predict: Feature

    • Label or the value to be predicted

  • features: List of features

    • Features to train on. For SLR, only one feature is allowed

  • location: Path

    • Location where state should be saved

dffml_model_scratch

pip install dffml-model-scratch

anomalydetection

Official

Model for Anomaly Detection using multivariate Gaussian distribution to predict probabilities of all records in the dataset and identify outliers. F1 score is used as the evaluation metric for this model. This model works well as it recognises dependencies across various features, and works particularly well if the features have a Gaussian Distribution.
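To make the idea concrete, here is a minimal single-feature sketch: fit a Gaussian to the data, flag records whose density falls below a threshold, and score the result with F1. It is not the plugin's implementation. The function names, the fixed threshold epsilon, and the use of one feature instead of a multivariate Gaussian are all assumptions made to keep the example short.

```python
import math


def fit_gaussian(values):
    # Maximum likelihood estimates of mean and variance
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def density(x, mean, var):
    # Probability density of a one-dimensional Gaussian
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# The first eight rows of trainex.csv from the example below
A = [0.65, 0.24, 0.93, 0.87, 0.23, 7, 0.86, 0.45]
Y = [0, 0, 0, 0, 0, 1, 0, 0]

mean, var = fit_gaussian(A)
epsilon = 0.05  # assumed threshold, chosen by hand for this sketch
pred = [1 if density(a, mean, var) < epsilon else 0 for a in A]
print("predicted outliers:", pred)
print("F1 score:", f1_score(Y, pred))
```

In practice the threshold would be tuned on held-out data instead of being picked by hand, which is presumably what the validation set size argument k is for.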

Examples

Command line usage

Create training and test datasets

trainex.csv

A,Y
0.65,0
0.24,0
0.93,0
0.87,0
0.23,0
7,1
0.86,0
0.45,0
0.55,0
0.29,0
5,1
0.51,0
0.88,0
0.24,0
0.51,0
0.17,0
9,1
0.37,0
0.23,0
0.44,0
0.62,0
3,1
0.87,0

testex.csv

A,Y
0.45,0
0.23,0
0.67,0
8,1
0.19,0
0.34,0
0.49,0
0.31,0
0.47,0
4,1

Train the model

$ dffml train \
    -sources f=csv \
    -source-filename trainex.csv \
    -model anomalydetection \
    -model-features A:float:2 \
    -model-predict Y:int:1  \
    -model-location tempdir

Assess the accuracy

$ dffml accuracy \
    -sources f=csv \
    -source-filename testex.csv \
    -model anomalydetection \
    -model-features A:float:2 \
    -model-predict Y:int:1 \
    -model-location tempdir \
    -features Y:int:1 \
    -scorer anomalyscore

Make predictions

$ dffml predict all \
    -sources f=csv \
    -source-filename testex.csv \
    -model anomalydetection \
    -model-features A:float:2 \
    -model-predict Y:int:1 \
    -model-location tempdir

Python usage

from dffml import Feature, Features
from dffml.noasync import score, train

from dffml_model_scratch.anomalydetection import AnomalyModel
from dffml_model_scratch.anomaly_detection_scorer import (
    AnomalyDetectionAccuracy,
)

# Configure the model

model = AnomalyModel(
    features=Features(Feature("A", float, 2),),
    predict=Feature("Y", int, 1),
    location="model",
)


# Train the model
train(model, "trainex.csv")

# Assess accuracy for test set
scorer = AnomalyDetectionAccuracy()
print(
    "Test set F1 score :",
    score(model, scorer, Feature("Y", int, 1), "testex.csv"),
)

# Assess accuracy for training set
print(
    "Training set F1 score :",
    score(model, scorer, Feature("Y", int, 1), "trainex.csv"),
)

Output

$ python detectoutliers.py
Test set F1 score : 0.8
Training set F1 score : 0.888888888888889

Args

  • features: List of features

    • Features to train on

  • predict: Feature

    • Label or the value to be predicted

  • location: Path

    • Location where state should be saved

  • k: float

    • default: 0.8

    • Validation set size

scratchlgrsag

Official

Logistic Regression using stochastic average gradient descent optimizer
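For readers unfamiliar with the optimizer, the following is a small pure-Python sketch of stochastic average gradient (SAG) applied to one-feature logistic regression. It is an illustration under stated assumptions, not the code used by scratchlgrsag: fit_sag, predict, the learning rate, and the epoch count are all hypothetical choices. The key idea is that the last gradient seen for each sample is remembered, and every step moves along the average of those remembered gradients.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_sag(xs, ys, lr=0.1, epochs=2000, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    w, b = 0.0, 0.0
    # Last gradient seen for each sample, plus running sums of them
    grad_w, grad_b = [0.0] * n, [0.0] * n
    sum_w, sum_b = 0.0, 0.0
    for _ in range(epochs * n):
        i = rng.randrange(n)
        err = sigmoid(w * xs[i] + b) - ys[i]
        new_w, new_b = err * xs[i], err
        # Swap the stored gradient of sample i for the fresh one
        sum_w += new_w - grad_w[i]
        sum_b += new_b - grad_b[i]
        grad_w[i], grad_b[i] = new_w, new_b
        # Step along the average of all stored gradients
        w -= lr * sum_w / n
        b -= lr * sum_b / n
    return w, b


def predict(x, w, b):
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Same values as dataset.csv in the example below
xs = [0.1, 0.7, 0.6, 0.2, 0.8]
ys = [0, 1, 1, 0, 1]
w, b = fit_sag(xs, ys)
print(predict(0.8, w, b), predict(0.1, w, b))
```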

The dataset used for training

cat > dataset.csv << EOF
f1,ans
0.1,0
0.7,1
0.6,1
0.2,0
0.8,1
EOF

Train the model

dffml train \
  -model scratchlgrsag \
  -model-features f1:float:1 \
  -model-predict ans:int:1 \
  -model-location tempdir \
  -sources f=csv \
  -source-filename dataset.csv \
  -log debug

Assess the accuracy

dffml accuracy \
  -model scratchlgrsag \
  -model-features f1:float:1 \
  -model-predict ans:int:1 \
  -model-location tempdir \
  -features ans:int:1 \
  -sources f=csv \
  -source-filename dataset.csv \
  -scorer mse \
  -log debug

Output

1.0

Make a prediction

echo -e 'f1,ans\n0.8,0\n' | \
  dffml predict all \
  -model scratchlgrsag \
  -model-features f1:float:1 \
  -model-predict ans:int:1 \
  -model-location tempdir \
  -sources f=csv \
  -source-filename /dev/stdin \
  -log debug

Output

[
    {
        "extra": {},
        "features": {
            "ans": 0,
            "f1": 0.8
        },
        "last_updated": "2020-03-19T13:41:08Z",
        "prediction": {
            "ans": {
                "confidence": 1.0,
                "value": 1
            }
        },
        "key": "0"
    }
]

Example usage of Logistic Regression using Python

from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml.accuracy import MeanSquaredErrorAccuracy
from dffml_model_scratch.logisticregression import LogisticRegression

model = LogisticRegression(
    features=Features(Feature("f1", float, 1)),
    predict=Feature("ans", int, 1),
    location="tempdir",
)

# Train the model
train(model, "dataset.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = MeanSquaredErrorAccuracy()
print(
    "Accuracy:",
    score(
        model,
        scorer,
        Feature("ans", int, 1),
        CSVSource(filename="dataset.csv"),
    ),
)

# Make prediction
for i, features, prediction in predict(model, {"f1": 0.8, "ans": 0}):
    features["ans"] = prediction["ans"]["value"]
    print(features)

Args

  • predict: Feature

    • Label or the value to be predicted

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

dffml_model_xgboost

pip install dffml-model-xgboost

OSX Installation

XGBoost on OSX requires libomp

$ brew install libomp

xgbclassifier

Official

Model using xgboost to perform classification prediction via gradient boosted trees. XGBoost is a leading software library for working with standard tabular data (the type of data you store in Pandas DataFrames, as opposed to more exotic types of data like images and videos). With careful parameter tuning, you can train highly accurate models.

Examples

Command line usage

First, download the training and test files and change their headers to the DFFML format. The first row of each file is an encoding of the classifications; we want CSV headers with the column names instead.

$ wget http://download.tensorflow.org/data/iris_training.csv
$ wget http://download.tensorflow.org/data/iris_test.csv
$ sed -i 's/.*setosa,versicolor,virginica/SepalLength,SepalWidth,PetalLength,PetalWidth,classification/g' iris_training.csv iris_test.csv
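On systems where sed -i is unavailable or behaves differently, the same header rewrite can be sketched in Python. fix_header is a hypothetical helper, and the sample first line is an assumption about what the downloaded files look like; the regular expression is the one from the sed command above.

```python
import re

HEADER = "SepalLength,SepalWidth,PetalLength,PetalWidth,classification"


def fix_header(text):
    # Same pattern as the sed expression. "." does not match newlines,
    # so only the line containing the class names is replaced.
    return re.sub(r".*setosa,versicolor,virginica", HEADER, text)


# Assumed shape of the original file: an encoding row, then data rows
sample = "120,4,setosa,versicolor,virginica\n6.4,2.8,5.6,2.2,2\n"
print(fix_header(sample))
```

To apply it to a real file, read the file contents, pass them through fix_header, and write the result back.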

Run the train command

$ dffml train \
    -sources train=csv \
    -source-filename iris_training.csv \
    -model xgbclassifier \
    -model-features \
      SepalLength:float:1 \
      SepalWidth:float:1 \
      PetalLength:float:1 \
      PetalWidth:float:1 \
    -model-predict classification \
    -model-location model \
    -model-max_depth 3 \
    -model-learning_rate 0.01 \
    -model-n_estimators 200 \
    -model-reg_lambda 1 \
    -model-reg_alpha 0 \
    -model-gamma 0 \
    -model-colsample_bytree 0 \
    -model-subsample 1

Assess the accuracy

$ dffml accuracy \
    -sources train=csv \
    -source-filename iris_test.csv \
    -model xgbclassifier \
    -model-features \
      SepalLength:float:1 \
      SepalWidth:float:1 \
      PetalLength:float:1 \
      PetalWidth:float:1 \
    -model-predict classification \
    -model-location model \
    -features classification \
    -scorer clf

Make predictions

$ dffml predict all \
    -sources train=csv \
    -source-filename iris_test.csv \
    -model xgbclassifier \
    -model-features \
      SepalLength:float:1 \
      SepalWidth:float:1 \
      PetalLength:float:1 \
      PetalWidth:float:1 \
    -model-predict classification \
    -model-location model

Python usage

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

from dffml import Feature, Features
from dffml.noasync import train, score
from dffml.accuracy import ClassificationAccuracy
from dffml_model_xgboost.xgbclassifier import (
    XGBClassifierModel,
    XGBClassifierModelConfig,
)

iris = load_iris()
y = iris["target"]
X = iris["data"]
trainX, testX, trainy, testy = train_test_split(
    X, y, test_size=0.1, random_state=123
)

# Configure the model
model = XGBClassifierModel(
    XGBClassifierModelConfig(
        features=Features(Feature("data", float, 4)),
        predict=Feature("target", float, 1),
        location="model",
        max_depth=3,
        learning_rate=0.01,
        n_estimators=200,
        reg_lambda=1,
        reg_alpha=0,
        gamma=0,
        colsample_bytree=0,
        subsample=1,
    )
)

# Train the model
train(model, *[{"data": x, "target": y} for x, y in zip(trainX, trainy)])

# Assess accuracy
scorer = ClassificationAccuracy()
print(
    "Test accuracy:",
    score(
        model,
        scorer,
        Feature("target", float, 1),
        *[{"data": x, "target": y} for x, y in zip(testX, testy)],
    ),
)
print(
    "Training accuracy:",
    score(
        model,
        scorer,
        Feature("target", float, 1),
        *[{"data": x, "target": y} for x, y in zip(trainX, trainy)],
    ),
)

Output

Test accuracy: 0.933333333333333
Training accuracy: 0.9703703703703703

Args

  • location: Path

    • Location where model should be saved

  • features: List of features

    • Features on which we train the model

  • predict: Feature

    • Value to be predicted

  • learning_rate: float

    • default: 0.3

    • Learning rate to train with

  • n_estimators: Integer

    • default: 100

    • Number of gradient boosted trees. Equivalent to the number of boosting rounds

  • max_depth: Integer

    • default: 6

    • Maximum tree depth for base learners

  • objective: String

    • default: multi:softmax

    • Objective in training

  • subsample: float

    • default: 1

    • Subsample ratio of the training instance

  • gamma: float

    • default: 0

    • Minimum loss reduction required to make a further partition on a leaf node

  • n_jobs: Integer

    • default: -1

    • Number of parallel threads used to run xgboost

  • colsample_bytree: float

    • default: 1

    • Subsample ratio of columns when constructing each tree

  • booster: String

    • default: gbtree

    • Specify which booster to use: gbtree, gblinear or dart

  • min_child_weight: float

    • default: 1

    • Minimum sum of instance weight (hessian) needed in a child

  • reg_lambda: float

    • default: 1

    • L2 regularization term on weights. Increasing this value will make model more conservative

  • reg_alpha: float

    • default: 0

    • L1 regularization term on weights. Increasing this value will make model more conservative

xgbregressor

Official

Model using xgboost to perform regression prediction via gradient boosted trees. XGBoost is a leading software library for working with standard tabular data (the type of data you store in Pandas DataFrames, as opposed to more exotic types of data like images and videos). With careful parameter tuning, you can train highly accurate models.
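As a rough illustration of what "gradient boosted trees" means, the sketch below boosts depth-one regression trees (stumps) on a single feature with squared loss. It is plain gradient boosting, not XGBoost's regularised objective, and fit_stump, fit_boosted, and mse are hypothetical names. The n_estimators and learning_rate parameters play the same role as the arguments of the same name listed under Args.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: one split on a single feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left)
        sse += sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean


def fit_boosted(xs, ys, n_estimators=100, learning_rate=0.1):
    base = sum(ys) / len(ys)  # start from the mean of the targets
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_estimators):
        # For squared loss the negative gradient is the residual
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
weak = fit_boosted(xs, ys, n_estimators=1)
strong = fit_boosted(xs, ys, n_estimators=50)
print(mse(weak, xs, ys), mse(strong, xs, ys))
```

Each round fits a small tree to what the current ensemble still gets wrong, so more rounds lower the training error, while a smaller learning rate makes each round contribute less.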

Examples

Command line usage

First, download the training and test files and change their headers to the DFFML format.

$ wget http://download.tensorflow.org/data/iris_training.csv
$ wget http://download.tensorflow.org/data/iris_test.csv
$ sed -i 's/.*setosa,versicolor,virginica/SepalLength,SepalWidth,PetalLength,PetalWidth,classification/g' iris_training.csv iris_test.csv

Run the train command

$ dffml train \
    -sources train=csv \
    -source-filename iris_training.csv \
    -model xgbregressor \
    -model-features \
      SepalLength:float:1 \
      SepalWidth:float:1 \
      PetalLength:float:1 \
      PetalWidth:float:1 \
    -model-predict classification \
    -model-location model \
    -model-max_depth 3 \
    -model-learning_rate 0.01 \
    -model-n_estimators 200 \
    -model-reg_lambda 1 \
    -model-reg_alpha 0 \
    -model-gamma 0 \
    -model-colsample_bytree 0 \
    -model-subsample 1

Assess the accuracy

$ dffml accuracy \
    -sources train=csv \
    -source-filename iris_test.csv \
    -model xgbregressor \
    -model-features \
      SepalLength:float:1 \
      SepalWidth:float:1 \
      PetalLength:float:1 \
      PetalWidth:float:1 \
    -model-predict classification \
    -model-location model \
    -features classification \
    -scorer mse

Output

accuracy: 0.8841466984766406

Make predictions

$ dffml predict all \
    -sources train=csv \
    -source-filename iris_test.csv \
    -model xgbregressor \
    -model-features \
      SepalLength:float:1 \
      SepalWidth:float:1 \
      PetalLength:float:1 \
      PetalWidth:float:1 \
    -model-predict classification \
    -model-location model

Python usage

run.py

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

from dffml import Feature, Features
from dffml.noasync import train, score
from dffml_model_xgboost.xgbregressor import (
    XGBRegressorModel,
    XGBRegressorModelConfig,
)
from dffml.accuracy import MeanSquaredErrorAccuracy


diabetes = load_diabetes()
y = diabetes["target"]
X = diabetes["data"]
trainX, testX, trainy, testy = train_test_split(
    X, y, test_size=0.1, random_state=123
)

# Configure the model
model = XGBRegressorModel(
    XGBRegressorModelConfig(
        features=Features(Feature("data", float, 10)),
        predict=Feature("target", float, 1),
        location="model",
        max_depth=3,
        learning_rate=0.05,
        n_estimators=400,
        reg_lambda=10,
        reg_alpha=0,
        gamma=10,
        colsample_bytree=0.3,
        subsample=0.8,
    )
)

# Train the model
train(model, *[{"data": x, "target": y} for x, y in zip(trainX, trainy)])

# Assess accuracy
scorer = MeanSquaredErrorAccuracy()
print(
    "Test accuracy:",
    score(
        model,
        scorer,
        Feature("target", float, 1),
        *[{"data": x, "target": y} for x, y in zip(testX, testy)],
    ),
)

print(
    "Training accuracy:",
    score(
        model,
        scorer,
        Feature("target", float, 1),
        *[{"data": x, "target": y} for x, y in zip(trainX, trainy)],
    ),
)

Output

$ python run.py
Test accuracy: 0.6669655406927468
Training accuracy: 0.819782501866115

Args

  • location: Path

    • Location where model should be saved

  • features: List of features

    • Features on which we train the model

  • predict: Feature

    • Value to be predicted

  • learning_rate: float

    • default: 0.05

    • Learning rate to train with

  • n_estimators: Integer

    • default: 1000

    • Number of gradient boosted trees. Equivalent to the number of boosting rounds

  • max_depth: Integer

    • default: 6

    • Maximum tree depth for base learners

  • subsample: float

    • default: 1

    • Subsample ratio of the training instance

  • gamma: float

    • default: 0

    • Minimum loss reduction required to make a further partition on a leaf node

  • n_jobs: Integer

    • default: -1

    • Number of parallel threads used to run xgboost

  • colsample_bytree: float

    • default: 1

    • Subsample ratio of columns when constructing each tree

  • booster: String

    • default: gbtree

    • Specify which booster to use: gbtree, gblinear or dart

  • min_child_weight: float

    • default: 0

    • Minimum sum of instance weight (hessian) needed in a child

  • reg_lambda: float

    • default: 1

    • L2 regularization term on weights. Increasing this value will make model more conservative

  • reg_alpha: float

    • default: 0

    • L1 regularization term on weights. Increasing this value will make model more conservative

dffml_model_vowpalWabbit

pip install dffml-model-vowpalWabbit

vwmodel

Official

Implemented using Vowpal Wabbit.

First we create the training and testing datasets

cat > train.csv << EOF
A,B
| price:.23 sqft:.25 age:.05 2006,-1
| price:.18 sqft:.15 age:.35 1976,1
| price:.53 sqft:.32 age:.87 1924,-1
EOF
cat > test.csv << EOF
A
| price:.46 sqft:.4 age:.10 1924
EOF
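The A column above is already written in Vowpal Wabbit's text input format, which is why the commands below pass -model-noconvert. For readers who have not seen that format, the sketch below builds such a line from a Python dict. to_vw_line is a hypothetical helper and not the conversion code used by the plugin; it covers only an optional label and an unnamed namespace.

```python
def to_vw_line(features, label=None):
    # Hypothetical helper: "name:value" for valued features, bare "name"
    # for features given without an explicit value.
    parts = []
    for name, value in features.items():
        parts.append(name if value is None else f"{name}:{value}")
    line = "| " + " ".join(parts)
    # In Vowpal Wabbit's format the label, if any, precedes the bar
    return line if label is None else f"{label} {line}"


print(to_vw_line({"price": 0.46, "sqft": 0.4, "age": 0.1, "1924": None}))
print(to_vw_line({"price": 0.23, "sqft": 0.25, "age": 0.05, "2006": None}, label=-1))
```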

Train the model

dffml train \
  -model vwmodel \
  -model-features \
    A:str:1 \
  -model-predict \
    B:int:1 \
  -model-noconvert \
  -sources f=csv \
  -source-filename train.csv \
  -model-location tempdir

Assess the accuracy

dffml accuracy  \
  -model vwmodel \
  -model-features \
    A:str:1 \
  -model-predict \
    B:int:1 \
  -model-noconvert \
  -features B:int:1 \
  -scorer mse \
  -sources f=csv \
  -source-filename train.csv \
  -model-location tempdir

Output

0.38683876649129145

Make a prediction

dffml predict all \
  -model vwmodel \
  -model-features \
    A:str:1 \
  -model-predict \
    B:int:1 \
  -model-noconvert \
  -sources f=csv \
  -source-filename test.csv \
  -model-location tempdir

Output

[
    {
        "extra": {},
        "features": {
            "A": "| price:.46 sqft:.4 age:.10 1924"
        },
        "key": "0",
        "last_updated": "2020-05-29T16:36:57Z",
        "prediction": {
            "B": {
                "confidence": 0.38683876649129145,
                "value": 0.0
            }
        }
    }
]

Args

  • features: List of features

  • predict: Feature

    • Feature to predict

  • location: Path

    • Location where state should be saved

  • class_cost: List of features

    • default: None

    • Features with name Cost_{class} containing cost of class for each input example, used when csoaa is used

  • task: String

    • default: regression

    • Task to perform, possible values are classification, regression

  • use_binary_label: String

    • default: False

    • Convert target labels to -1 and 1 for binary classification

  • vwcmd: List of strings

    • default: []

    • Command Line Arguments as per vowpal wabbit convention

  • namespace: List of strings

    • default: []

    • Namespace for input features. Should be in format {namespace}_{feature name}

  • importance: Feature

    • default: None

    • Feature containing importance of each example, used in conversion of input data to vowpal wabbit input format

  • base: Feature

    • default: None

    • Feature containing base for each example, used for residual regression

  • tag: Feature

    • default: None

    • Feature to be used as tag in conversion of data to vowpal wabbit input format

  • noconvert: String

    • default: False

    • Do not convert record features to vowpal wabbit input format

dffml_model_scikit

pip install dffml-model-scikit

Machine Learning models implemented with scikit-learn. Models are saved under the configured model directory, in subdirectories named after the hash of their feature names.

General Usage:

Training:

$ dffml train \
    -model SCIKIT_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_DIRECTORY \
    -model-SCIKIT_PARAMETER_NAME SCIKIT_PARAMETER_VALUE \
    -sources f=TRAINING_DATA_SOURCE_TYPE \
    -source-filename TRAINING_DATA_FILE_NAME \
    -log debug

Testing and Accuracy:

$ dffml accuracy \
    -model SCIKIT_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_DIRECTORY \
    -features TO_PREDICT \
    -sources f=TESTING_DATA_SOURCE_TYPE \
    -source-filename TESTING_DATA_FILE_NAME \
    -scorer ACCURACY_SCORER \
    -log debug

Predicting with trained model:

$ dffml predict all \
    -model SCIKIT_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_DIRECTORY \
    -sources f=PREDICT_DATA_SOURCE_TYPE \
    -source-filename PREDICT_DATA_FILE_NAME \
    -log debug

Models Available:

==============  =============================  ==============  ==============  ============
Type            Model                          Entrypoint      Parameters      Multi-Output
==============  =============================  ==============  ==============  ============
Regression      LinearRegression               scikitlr        scikitlr        Yes
Regression      ElasticNet                     scikiteln       scikiteln       Yes
Regression      RandomForestRegressor          scikitrfr       scikitrfr       Yes
Regression      BayesianRidge                  scikitbyr       scikitbyr       Yes
Regression      Lasso                          scikitlas       scikitlas       Yes
Regression      ARDRegression                  scikitard       scikitard       Yes
Regression      RANSACRegressor                scikitrsc       scikitrsc       Yes
Regression      DecisionTreeRegressor          scikitdtr       scikitdtr       Yes
Regression      GaussianProcessRegressor       scikitgpr       scikitgpr       Yes
Regression      OrthogonalMatchingPursuit      scikitomp       scikitomp       Yes
Regression      Lars                           scikitlars      scikitlars      Yes
Regression      Ridge                          scikitridge     scikitridge     Yes
Classification  KNeighborsClassifier           scikitknn       scikitknn       Yes
Classification  AdaBoostClassifier             scikitadaboost  scikitadaboost  Yes
Classification  GaussianProcessClassifier      scikitgpc       scikitgpc       Yes
Classification  DecisionTreeClassifier         scikitdtc       scikitdtc       Yes
Classification  RandomForestClassifier         scikitrfc       scikitrfc       Yes
Classification  QuadraticDiscriminantAnalysis  scikitqda       scikitqda       Yes
Classification  MLPClassifier                  scikitmlp       scikitmlp       Yes
Classification  GaussianNB                     scikitgnb       scikitgnb       Yes
Classification  SVC                            scikitsvc       scikitsvc       Yes
Classification  LogisticRegression             scikitlor       scikitlor       Yes
Classification  GradientBoostingClassifier     scikitgbc       scikitgbc       Yes
Classification  BernoulliNB                    scikitbnb       scikitbnb       Yes
Classification  ExtraTreesClassifier           scikitetc       scikitetc       Yes
Classification  BaggingClassifier              scikitbgc       scikitbgc       Yes
Classification  LinearDiscriminantAnalysis     scikitlda       scikitlda       Yes
Classification  MultinomialNB                  scikitmnb       scikitmnb       Yes
Clustering      KMeans                         scikitkmeans    scikitkmeans    No
Clustering      Birch                          scikitbirch     scikitbirch     No
Clustering      MiniBatchKMeans                scikitmbkmeans  scikitmbkmeans  No
Clustering      AffinityPropagation            scikitap        scikitap        No
Clustering      MeanShift                      scikitms        scikitms        No
Clustering      SpectralClustering             scikitsc        scikitsc        No
Clustering      AgglomerativeClustering        scikitac        scikitac        No
Clustering      OPTICS                         scikitoptics    scikitoptics    No
==============  =============================  ==============  ==============  ============

Scorers Available:

==============  ==============================  ================  ================  ============
Type            Scorer                          Entrypoint        Parameters        Multi-Output
==============  ==============================  ================  ================  ============
Regression      Explained Variance Score        exvscore          exvscore          Yes
Regression      Max Error                       maxerr            maxerr            No
Regression      Mean Absolute Error             meanabserr        meanabserr        Yes
Regression      Mean Squared Error              meansqrerr        meansqrerr        Yes
Regression      Mean Squared Log Error          meansqrlogerr     meansqrlogerr     Yes
Regression      Median Absolute Error           medabserr         medabserr         Yes
Regression      R2 Score                        r2score           r2score           Yes
Regression      Mean Poisson Deviance           meanpoidev        meanpoidev        No
Regression      Mean Gamma Deviance             meangammadev      meangammadev      No
Regression      Mean Absolute Percentage Error  meanabspererr     meanabspererr     Yes
Classification  Accuracy Score                  acscore           acscore           Yes
Classification  Balanced Accuracy Score         bacscore          bacscore          Yes
Classification  Top K Accuracy Score            topkscore         topkscore         Yes
Classification  Average Precision Score         avgprescore       avgprescore       Yes
Classification  Brier Score Loss                brierscore        brierscore        Yes
Classification  F1 Score                        f1score           f1score           Yes
Classification  Log Loss                        logloss           logloss           Yes
Classification  Precision Score                 prescore          prescore          Yes
Classification  Recall Score                    recallscore       recallscore       Yes
Classification  Jaccard Score                   jacscore          jacscore          Yes
Classification  Roc Auc Score                   rocaucscore       rocaucscore       Yes
Clustering      Adjusted Mutual Info Score      adjmutinfoscore   adjmutinfoscore   No
Clustering      Adjusted Rand Score             adjrandscore      adjrandscore      No
Clustering      Completeness Score              complscore        complscore        No
Clustering      Fowlkes Mallows Score           fowlmalscore      fowlmalscore      No
Clustering      Homogeneity Score               homoscore         homoscore         No
Clustering      Mutual Info Score               mutinfoscore      mutinfoscore      No
Clustering      Normalized Mutual Info Score    normmutinfoscore  normmutinfoscore  No
Clustering      Rand Score                      randscore         randscore         No
Clustering      V Measure Score                 vmscore           vmscore           No
Supervised      Model’s Default Score           skmodelscore      skmodelscore      Yes
==============  ==============================  ================  ================  ============

Usage Example:

The example below uses the LinearRegression model from the command line.

Let us take a simple example:

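===================  =========  ============  ======
Years of Experience  Expertise  Trust Factor  Salary
===================  =========  ============  ======
0                    01         0.2           10
1                    03         0.4           20
2                    05         0.6           30
3                    07         0.8           40
4                    09         1.0           50
5                    11         1.2           60
===================  =========  ============  ======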

First we create the files

cat > train.csv << EOF
Years,Expertise,Trust,Salary
0,1,0.1,10
1,3,0.2,20
2,5,0.3,30
3,7,0.4,40
EOF
cat > test.csv << EOF
Years,Expertise,Trust,Salary
4,9,0.5,50
5,11,0.6,60
EOF

Train the model

dffml train \
  -model scikitlr \
  -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
  -model-predict Salary:float:1 \
  -model-location tempdir \
  -sources f=csv \
  -source-filename train.csv

Assess accuracy

dffml accuracy \
  -model scikitlr \
  -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
  -model-predict Salary:float:1 \
  -model-location tempdir \
  -features Salary:float:1 \
  -scorer mse \
  -sources f=csv \
  -source-filename test.csv

Output:

1.0

Make a prediction

echo -e 'Years,Expertise,Trust\n6,13,0.7\n' | \
dffml predict all \
  -model scikitlr \
  -model-features Years:int:1 Expertise:int:1 Trust:float:1 \
  -model-predict Salary:float:1 \
  -model-location tempdir \
  -sources f=csv \
  -source-filename /dev/stdin

Output:

[
    {
        "extra": {},
        "features": {
            "Expertise": 13,
            "Trust": 0.7,
            "Years": 6
        },
        "key": "0",
        "last_updated": "2020-03-01T22:26:46Z",
        "prediction": {
            "Salary": {
                "confidence": 1.0,
                "value": 70.0
            }
        }
    }
]
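The predicted salary of 70.0 can be checked by hand. In train.csv, Expertise and Trust are both linear functions of Years, so an ordinary least-squares line fitted on Years alone is enough to reproduce the result. The sketch below does that in plain Python; fit_line is a hypothetical helper, not the scikit-learn code the plugin wraps.

```python
def fit_line(xs, ys):
    # Closed-form ordinary least squares for one feature
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope /= sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Years and Salary columns from train.csv
years = [0, 1, 2, 3]
salary = [10, 20, 30, 40]

slope, intercept = fit_line(years, salary)
print("Salary for 6 years:", slope * 6 + intercept)
```

The fitted line is Salary = 10 * Years + 10, which gives 70 for six years of experience.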

Example usage of the Linear Regression model using the Python API:

from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_scikit import LinearRegressionModel
from dffml.accuracy import MeanSquaredErrorAccuracy

model = LinearRegressionModel(
    features=Features(
        Feature("Years", int, 1),
        Feature("Expertise", int, 1),
        Feature("Trust", float, 1),
    ),
    predict=Feature("Salary", int, 1),
    location="tempdir",
)

# Train the model
train(model, "train.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = MeanSquaredErrorAccuracy()
print(
    "Accuracy:",
    score(
        model,
        scorer,
        Feature("Salary", int, 1),
        CSVSource(filename="test.csv"),
    ),
)

# Make prediction
for i, features, prediction in predict(
    model,
    {"Years": 6, "Expertise": 13, "Trust": 0.7},
    {"Years": 7, "Expertise": 15, "Trust": 0.8},
):
    features["Salary"] = prediction["Salary"]["value"]
    print(features)

The example below uses the KMeans clustering model on a small randomly generated dataset.

 $ cat > train.csv << EOF
Col1,          Col2,        Col3,         Col4
5.05776417,   8.55128116,   6.15193196,  -8.67349666
3.48864265,  -7.25952218,  -4.89216256,   4.69308946
-8.16207603,  5.16792984,  -2.66971993,   0.2401882
6.09809669,   8.36434181,   6.70940915,  -7.91491768
-9.39122566,  5.39133807,  -2.29760281,  -1.69672981
0.48311336,   8.19998973,   7.78641979,   7.8843821
2.22409135,  -7.73598586,  -4.02660224,   2.82101794
2.8137247 ,   8.36064298,   7.66196849,   3.12704676
EOF
 $ cat > test.csv << EOF
Col1,             Col2,          Col3,         Col4,    cluster
-10.16770144,   2.73057215,  -1.49351481,   2.43005691,    6
3.59705381,  -4.76520663,  -3.34916068,   5.72391486,     1
4.01612313,  -4.641852  ,  -4.77333308,   5.87551683,     0
EOF
 $ dffml train \
     -model scikitkmeans \
     -model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
     -model-location tempdir \
     -sources f=csv \
     -source-filename train.csv \
     -source-readonly \
     -log debug
 $ dffml accuracy \
     -model scikitkmeans \
     -model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
     -model-predict cluster:int:1 \
     -model-location tempdir \
     -features cluster:int:1 \
     -sources f=csv \
     -source-filename test.csv \
     -source-readonly \
     -scorer skmodelscore \
     -log debug
 0.6365141682948129
 $ echo -e 'Col1,Col2,Col3,Col4\n6.09809669,8.36434181,6.70940915,-7.91491768\n' | \
   dffml predict all \
     -model scikitkmeans \
     -model-features Col1:float:1 Col2:float:1 Col3:float:1 Col4:float:1 \
     -model-location tempdir \
     -sources f=csv \
     -source-filename /dev/stdin \
     -source-readonly \
     -log debug
 [
     {
         "extra": {},
         "features": {
             "Col1": 6.09809669,
             "Col2": 8.36434181,
             "Col3": 6.70940915,
             "Col4": -7.91491768
         },
         "last_updated": "2020-01-12T22:51:15Z",
         "prediction": {
             "confidence": 0.6365141682948129,
             "value": 2
         },
         "key": "0"
     }
 ]

Example usage of the KMeans clustering model using the Python API:

from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_scikit import KMeansModel
from dffml_model_scikit import MutualInfoScoreScorer

model = KMeansModel(
    features=Features(
        Feature("Col1", float, 1),
        Feature("Col2", float, 1),
        Feature("Col3", float, 1),
        Feature("Col4", float, 1),
    ),
    predict=Feature("cluster", int, 1),
    location="tempdir",
)

# Train the model
train(model, "train.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = MutualInfoScoreScorer()
print("Accuracy:", score(model, scorer, Feature("cluster", int, 1), CSVSource(filename="test.csv")))

# Make prediction
for i, features, prediction in predict(
    model,
    {"Col1": 6.09809669, "Col2": 8.36434181, "Col3": 6.70940915, "Col4": -7.91491768},
):
    features["cluster"] = prediction["cluster"]["value"]
    print(features)

NOTE: Transductive clusterers (scikitsc, scikitac, scikitoptics) cannot handle unseen data. Ensure that predict and accuracy for these algorithms use the training data.
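The distinction in the note can be seen in a small sketch. A centroid-based method such as KMeans ends up with centroids, so any new point can be labelled by finding the nearest one. A transductive method only produces labels for the points it was fitted on. The code below is a minimal k-means written for illustration with a naive initialisation; kmeans and nearest are hypothetical names, not the scikit-learn implementation.

```python
def nearest(point, centroids):
    # Index of the centroid with the smallest squared distance
    dists = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    return dists.index(min(dists))


def kmeans(points, k, iterations=10):
    # Naive initialisation: use the first k points as centroids
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                # Move the centroid to the mean of its members
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Made-up points forming two obvious groups
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids = kmeans(points, 2)
labels = [nearest(p, centroids) for p in points]
print(labels)

# An unseen point is labelled by its nearest centroid
print(nearest((9, 9), centroids))
```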

Args

  • predict: Feature

    • Label or the value to be predicted

    • Only used by classification and regression models

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

dffml_model_daal4py

pip install dffml-model-daal4py

daal4pylr

Official

Implemented using daal4py.

First we create the training and testing datasets

train.csv

f1,ans
12.4,11.2
14.3,12.5
14.5,12.7
14.9,13.1
16.1,14.1
16.9,14.8
16.5,14.4
15.4,13.4
17.0,14.9
17.9,15.6
18.8,16.4
20.3,17.7
22.4,19.6
19.4,16.9
15.5,14.0
16.7,14.6

test.csv

f1,ans
18.8,16.4
20.3,17.7
22.4,19.6
19.4,16.9
15.5,14.0
16.7,14.6

Train the model

$ dffml train \
    -model daal4pylr \
    -model-features f1:float:1 \
    -model-predict ans:int:1 \
    -model-location tempdir \
    -sources f=csv \
    -source-filename train.csv

Assess the accuracy

$ dffml accuracy \
    -model daal4pylr \
    -model-features f1:float:1 \
    -model-predict ans:int:1 \
    -model-location tempdir \
    -features ans:int:1 \
    -sources f=csv \
    -source-filename test.csv \
    -scorer mse
0.6666666666666666
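
The mse scorer reports mean squared error, so lower values indicate a closer fit. As a minimal sketch of that metric in plain Python (the function name and the sample numbers are invented for illustration and are not taken from DFFML):

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between targets and predictions."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical targets and predictions, chosen only to show the arithmetic:
# the squared errors are 1, 0 and 4, so the mean is 5 / 3
actual = [16.0, 18.0, 20.0]
predicted = [15.0, 18.0, 22.0]
print(mean_squared_error(actual, predicted))
```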

Make a prediction

$ echo -e 'f1,ans\n0.8,1\n' | \
  dffml predict all \
    -model daal4pylr \
    -model-features f1:float:1 \
    -model-predict ans:int:1 \
    -model-location tempdir \
    -sources f=csv \
    -source-filename /dev/stdin
[
    {
        "extra": {},
        "features": {
            "ans": 1,
            "f1": 0.8
        },
        "key": "0",
        "last_updated": "2020-07-22T02:53:11Z",
        "prediction": {
            "ans": {
                "confidence": null,
                "value": 1.1907472649730522
            }
        }
    }
]
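
To see where a prediction like this comes from, the sketch below fits a straight line to the rows of train.csv with closed-form ordinary least squares and evaluates it at f1 = 0.8. It uses only the standard library and is not how daal4py computes the fit internally; fit_line is a hypothetical helper. On this data the result should land close to the value reported above.

```python
import csv
import io

# The rows of train.csv from above, inlined so the sketch is self-contained
TRAIN_CSV = """f1,ans
12.4,11.2
14.3,12.5
14.5,12.7
14.9,13.1
16.1,14.1
16.9,14.8
16.5,14.4
15.4,13.4
17.0,14.9
17.9,15.6
18.8,16.4
20.3,17.7
22.4,19.6
19.4,16.9
15.5,14.0
16.7,14.6
"""


def fit_line(xs, ys):
    """Closed-form ordinary least squares for a single feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


rows = list(csv.DictReader(io.StringIO(TRAIN_CSV)))
xs = [float(row["f1"]) for row in rows]
ys = [float(row["ans"]) for row in rows]

slope, intercept = fit_line(xs, ys)
prediction = intercept + slope * 0.8
print(f"slope={slope:.4f} intercept={intercept:.4f} prediction={prediction:.4f}")
```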

Example usage of the daal4py Linear Regression model using the Python API

run.py

from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_daal4py.daal4pylr import DAAL4PyLRModel
from dffml.accuracy import MeanSquaredErrorAccuracy

model = DAAL4PyLRModel(
    features=Features(Feature("f1", float, 1)),
    predict=Feature("ans", int, 1),
    location="tempdir",
)

# Train the model
train(model, "train.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = MeanSquaredErrorAccuracy()
print(
    "Accuracy:",
    score(
        model, scorer, Feature("ans", int, 1), CSVSource(filename="test.csv")
    ),
)

# Make prediction
for i, features, prediction in predict(model, {"f1": 0.8, "ans": 0}):
    features["ans"] = prediction["ans"]["value"]
    print(features)

Run the file

$ python run.py

Args

  • predict: Feature

    • Label or the value to be predicted

  • features: List of features

    • Features to train on. For SLR, only one feature is allowed

  • location: Path

    • Location where state should be saved

dffml_model_pytorch

pip install dffml-model-pytorch

Machine Learning models implemented with PyTorch. Trained models are saved to model.pt under the model directory.

General Usage:

Training:

$ dffml train \
    -model PYTORCH_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_LOCATION \
    -model-CONFIGS CONFIG_VALUES \
    -sources f=TRAINING_DATA_SOURCE_TYPE \
    -source-CONFIGS TRAINING_DATA \
    -log debug

Testing and Accuracy:

$ dffml accuracy \
    -model PYTORCH_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_LOCATION \
    -model-CONFIGS CONFIG_VALUES \
    -features TO_PREDICT \
    -sources f=TESTING_DATA_SOURCE_TYPE \
    -source-CONFIGS TESTING_DATA \
    -log debug

Predicting with trained model:

$ dffml predict all \
    -model PYTORCH_MODEL_ENTRYPOINT \
    -model-features FEATURE_DEFINITION \
    -model-predict TO_PREDICT \
    -model-location MODEL_LOCATION \
    -model-CONFIGS CONFIG_VALUES \
    -sources f=PREDICT_DATA_SOURCE_TYPE \
    -source-CONFIGS PREDICTION_DATA \
    -log debug

Pre-Trained Models Available:

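==============  ===============================  ==================  =====================================
Type            Model                            Entrypoint          Architecture
==============  ===============================  ==================  =====================================
Classification  AlexNet                          alexnet             AlexNet architecture
Classification  DenseNet-121                     densenet121         DenseNet architecture
Classification  DenseNet-161                     densenet161         DenseNet architecture
Classification  DenseNet-169                     densenet169         DenseNet architecture
Classification  DenseNet-201                     densenet201         DenseNet architecture
Classification  MnasNet 0.5                      mnasnet0_5          MnasNet architecture
Classification  MnasNet 1.0                      mnasnet1_0          MnasNet architecture
Classification  MobileNet V2                     mobilenet_v2        MobileNet V2 architecture
Classification  VGG-11                           vgg11               VGG-11 architecture Configuration “A”
Classification  VGG-11 with batch normalization  vgg11_bn            VGG-11 architecture Configuration “A”
Classification  VGG-13                           vgg13               VGG-13 architecture Configuration “B”
Classification  VGG-13 with batch normalization  vgg13_bn            VGG-13 architecture Configuration “B”
Classification  VGG-16                           vgg16               VGG-16 architecture Configuration “D”
Classification  VGG-16 with batch normalization  vgg16_bn            VGG-16 architecture Configuration “D”
Classification  VGG-19                           vgg19               VGG-19 architecture Configuration “E”
Classification  VGG-19 with batch normalization  vgg19_bn            VGG-19 architecture Configuration “E”
Classification  GoogleNet                        googlenet           GoogleNet architecture
Classification  Inception V3                     inception_v3        Inception V3 architecture
Classification  ResNet-18                        resnet18            ResNet architecture
Classification  ResNet-34                        resnet34            ResNet architecture
Classification  ResNet-50                        resnet50            ResNet architecture
Classification  ResNet-101                       resnet101           ResNet architecture
Classification  ResNet-152                       resnet152           ResNet architecture
Classification  Wide ResNet-101-2                wide_resnet101_2    Wide Resnet architecture
Classification  Wide ResNet-50-2                 wide_resnet50_2     Wide Resnet architecture
Classification  ShuffleNet V2 0.5                shufflenet_v2_x0_5  Shuffle Net V2 architecture
Classification  ShuffleNet V2 1.0                shufflenet_v2_x1_0  Shuffle Net V2 architecture
Classification  ResNext-101-32x8D                resnext101_32x8d    ResNext architecture
Classification  ResNext-50-32x4D                 resnext50_32x4d     ResNext architecture
==============  ===============================  ==================  =====================================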

Usage Example:

The example below uses the ResNet-18 model from the command line.

Let us take a simple example: classifying images of ants and bees.

First, we download the dataset and verify with sha384sum

curl -LO https://download.pytorch.org/tutorial/hymenoptera_data.zip
sha384sum -c - << EOF
491db45cfcab02d99843fbdcf0574ecf99aa4f056d52c660a39248b5524f9e6e8f896d9faabd27ffcfc2eaca0cec6f39  hymenoptera_data.zip
EOF
hymenoptera_data.zip: OK

Unzip the file

unzip hymenoptera_data.zip

We first create a YAML file that defines the layer(s) that will replace the last layer of the network architecture

layers.yaml

linear1:
  layer_type: Linear
  in_features: 512
  out_features: 256
relu:
  layer_type: ReLU
dropout:
  layer_type: Dropout
  p: 0.2
linear2:
  layer_type: Linear
  in_features: 256
  out_features: 2
logsoftmax:
  layer_type: LogSoftmax
  dim: 1
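
The first replacement layer has to accept the number of features that fed the layer being replaced (512 for the final fully connected layer of torchvision's ResNet-18), and each later Linear layer has to accept what the previous one emits. The sketch below checks that chain in plain Python, with layers.yaml transcribed by hand as a dict. check_linear_chain is a hypothetical helper for illustration, not part of DFFML.

```python
# layers.yaml from above, transcribed by hand (dict insertion order is the layer order)
layers = {
    "linear1": {"layer_type": "Linear", "in_features": 512, "out_features": 256},
    "relu": {"layer_type": "ReLU"},
    "dropout": {"layer_type": "Dropout", "p": 0.2},
    "linear2": {"layer_type": "Linear", "in_features": 256, "out_features": 2},
    "logsoftmax": {"layer_type": "LogSoftmax", "dim": 1},
}


def check_linear_chain(layers, input_width):
    """Return the final output width, raising if two Linear layers do not line up."""
    width = input_width
    for name, spec in layers.items():
        if spec["layer_type"] != "Linear":
            continue  # ReLU, Dropout and LogSoftmax leave the width unchanged
        if spec["in_features"] != width:
            raise ValueError(
                f"{name} expects {spec['in_features']} inputs, got {width}"
            )
        width = spec["out_features"]
    return width


# 512 is the input width of ResNet-18's final fully connected layer;
# the result should equal the number of classes (ants, bees)
print(check_linear_chain(layers, 512))
```

If a different entrypoint is used, the starting width passed to the check would need to match that architecture's final layer instead.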

Train the model

dffml train \
  -model resnet18 \
  -model-add_layers \
  -model-layers @layers.yaml \
  -model-clstype str \
  -model-classifications ants bees \
  -model-location resnet18_model \
  -model-imageSize 224 \
  -model-epochs 5 \
  -model-batch_size 32 \
  -model-enableGPU \
  -model-features image:int:$((500*500)) \
  -model-predict label:str:1 \
  -sources f=dir \
    -source-foldername hymenoptera_data/train \
    -source-feature image \
    -source-labels ants bees \
  -log critical

Assess accuracy

dffml accuracy \
  -model resnet18 \
  -model-add_layers \
  -model-layers @layers.yaml \
  -model-clstype str \
  -model-classifications ants bees \
  -model-location resnet18_model \
  -model-imageSize 224 \
  -model-batch_size 32 \
  -model-enableGPU \
  -model-features image:int:$((500*500)) \
  -model-predict label:str:1 \
  -features label:str:1 \
  -sources f=dir \
    -source-foldername hymenoptera_data/val \
    -source-feature image \
    -source-labels ants bees \
  -scorer pytorchscore \
  -log critical

Output:

0.9215686274509803

Create a CSV file listing the images to classify as ants or bees.

cat > unknown_images.csv << EOF
key,image
ants1,hymenoptera_data/val/ants/Ant-1818.jpg
bee1,hymenoptera_data/val/bees/10870992_eebeeb3a12.jpg
bee2,hymenoptera_data/val/bees/abeja.jpg
ants2,hymenoptera_data/val/ants/desert_ant.jpg
EOF
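
The same file can be produced from Python, which may be more convenient when the list of images is generated by a script. This is a small sketch using only the standard library; write_image_list is a hypothetical helper name.

```python
import csv

# Same keys and image paths as the heredoc above
images = [
    ("ants1", "hymenoptera_data/val/ants/Ant-1818.jpg"),
    ("bee1", "hymenoptera_data/val/bees/10870992_eebeeb3a12.jpg"),
    ("bee2", "hymenoptera_data/val/bees/abeja.jpg"),
    ("ants2", "hymenoptera_data/val/ants/desert_ant.jpg"),
]


def write_image_list(path, rows):
    """Write a CSV with a key,image header followed by one row per image."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["key", "image"])
        writer.writerows(rows)


write_image_list("unknown_images.csv", images)
```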

Make the predictions

dffml predict all \
  -model resnet18 \
  -model-add_layers \
  -model-layers @layers.yaml \
  -model-clstype str \
  -model-classifications ants bees \
  -model-location resnet18_model \
  -model-imageSize 224 \
  -model-enableGPU \
  -model-features image:int:$((500*500)) \
  -model-predict label:str:1 \
  -sources f=csv \
    -source-filename unknown_images.csv \
    -source-loadfiles image \
  -log critical \
  -pretty

Output:


	Key:	ants1
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                    59, 66, 83, 60, 70, 87, 57, 72, 88, 53, 74, 89 ... (length:263250)                    |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|            Value:  ants           |                                     Confidence:   0.9920881390571594                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	bee1
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                    63, 114, 146, 63, 114, 146, 63, 114, 146, 63,  ... (length:696000)                    |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|            Value:  bees           |                                     Confidence:   0.6108130216598511                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	bee2
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                    103, 253, 254, 98, 254, 254, 91, 255, 254, 89, ... (length:359100)                    |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|            Value:  bees           |                                     Confidence:   0.9162276387214661                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	ants2
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                   69, 121, 162, 44, 96, 137, 41, 90, 130, 68, 11 ... (length:1563912)                    |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|            Value:  ants           |                                     Confidence:   0.9368477463722229                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

alexnet

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Early stops the training if validation loss doesn’t improve after a given patience

  • loss: PyTorchLoss

    • default: <class ‘dffml.base.CrossEntropyLossFunction’>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model
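
The patience option describes a common early-stopping rule. The sketch below shows the general technique in plain Python: stop once the validation loss has gone a given number of epochs without improving. It is an assumption about the behaviour, not DFFML's actual training loop, and the loss values are invented.

```python
def epochs_until_stop(validation_losses, patience):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs, or when the losses run out.
    """
    best = float("inf")
    stale = 0  # epochs since the last improvement
    for epoch, loss in enumerate(validation_losses, start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(validation_losses)


# Invented losses: the best value is reached at epoch 3, so with the
# default patience of 5 training stops five epochs later
losses = [0.9, 0.7, 0.6, 0.61, 0.65, 0.62, 0.64, 0.63, 0.66, 0.7]
print(epochs_until_stop(losses, patience=5))  # prints 8
```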

densenet121

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Early stops the training if validation loss doesn’t improve after a given patience

  • loss: PyTorchLoss

    • default: <class ‘dffml.base.CrossEntropyLossFunction’>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model
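
normalize_mean and normalize_std take one value per image channel. Assuming they follow the convention of torchvision's Normalize transform, each channel value is shifted by its mean and divided by its standard deviation. A plain-Python sketch of that arithmetic follows; the statistics used are the widely quoted ImageNet values, shown as an example rather than as DFFML defaults.

```python
def normalize_pixel(pixel, mean, std):
    """Apply (value - mean) / std to each channel of one RGB pixel."""
    return [(value - m) / s for value, m, s in zip(pixel, mean, std)]


# Commonly used ImageNet channel statistics (an example, not a DFFML default)
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

# A pixel already scaled to the 0..1 range; the three channels come out
# at roughly 0, +1 and -1 standard deviations from the channel means
print(normalize_pixel([0.485, 0.680, 0.181], mean, std))
```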

densenet161

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Early stops the training if validation loss doesn’t improve after a given patience

  • loss: PyTorchLoss

    • default: <class ‘dffml.base.CrossEntropyLossFunction’>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model
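
validation_split holds back a fraction of the training records for validation. How DFFML selects those records is not stated here; the sketch below assumes a simple shuffled split and uses a hypothetical split_records helper to show what a value such as 0.2 means in terms of record counts.

```python
import random


def split_records(records, validation_split, seed=0):
    """Shuffle records and hold back the given fraction for validation."""
    if not 0.0 <= validation_split < 1.0:
        raise ValueError("validation_split must be in [0.0, 1.0)")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_split)
    # The first n_validation shuffled records become the validation set
    return shuffled[n_validation:], shuffled[:n_validation]


train_records, validation_records = split_records(range(100), validation_split=0.2)
print(len(train_records), len(validation_records))  # prints 80 20
```

With the default of 0.0, every record stays in the training set and no validation set is produced.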

densenet169

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Early stops the training if validation loss doesn’t improve after a given patience

  • loss: PyTorchLoss

    • default: <class ‘dffml.base.CrossEntropyLossFunction’>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

densenet201

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Early stops the training if validation loss doesn’t improve after a given patience

  • loss: PyTorchLoss

    • default: <class ‘dffml.base.CrossEntropyLossFunction’>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

googlenet

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Early stops the training if validation loss doesn’t improve after a given patience

  • loss: PyTorchLoss

    • default: <class ‘dffml.base.CrossEntropyLossFunction’>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

inception_v3

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Early stops the training if validation loss doesn’t improve after a given patience

  • loss: PyTorchLoss

    • default: <class ‘dffml.base.CrossEntropyLossFunction’>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

mnasnet0_5

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Early stops the training if validation loss doesn’t improve after a given patience

  • loss: PyTorchLoss

    • default: <class ‘dffml.base.CrossEntropyLossFunction’>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

mnasnet1_0

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Early stops the training if validation loss doesn’t improve after a given patience

  • loss: PyTorchLoss

    • default: <class ‘dffml.base.CrossEntropyLossFunction’>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

mobilenet_v2

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Early stops the training if validation loss doesn’t improve after a given patience

  • loss: PyTorchLoss

    • default: <class ‘dffml.base.CrossEntropyLossFunction’>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model
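
The patience argument above describes early stopping. The following is a minimal sketch of that idea in plain Python, not DFFML's implementation. It assumes patience counts consecutive epochs whose validation loss is not strictly lower than the best seen so far; `epochs_run` is a hypothetical helper introduced only for illustration.

```python
def epochs_run(val_losses, patience=5):
    """Return the number of epochs executed before early stopping triggers.

    Assumption: an epoch counts as an improvement only when its validation
    loss is strictly lower than the best loss seen so far.
    """
    best = float("inf")
    stale = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch  # stop here, later epochs are skipped
    return len(val_losses)


losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.60, 0.63, 0.59]
print(epochs_run(losses, patience=3))  # prints 6
```

With patience=3 the run stops after epoch 6, so the better loss at epoch 8 is never reached. A larger patience trades training time for a chance to escape such plateaus.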

pytorchnet

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • network: typing.Union[dffml_model_pytorch.pytorch_net.Network, torch.nn.modules.module.Module]

    • default: None

    • Model
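
The validation_split argument above takes a fraction of the training data. As a rough sketch of what such a split could look like, assuming the validation records are taken from the end of the data without shuffling (the model itself may split differently); `split_records` is a hypothetical helper, not a DFFML function.

```python
def split_records(records, validation_split=0.0):
    """Split records into (training, validation) lists by fraction.

    Assumption: validation records come from the tail, no shuffling.
    """
    if not 0.0 <= validation_split < 1.0:
        raise ValueError("validation_split must be in [0.0, 1.0)")
    n_val = int(len(records) * validation_split)
    n_train = len(records) - n_val
    return records[:n_train], records[n_train:]


train_part, val_part = split_records(list(range(10)), validation_split=0.2)
print(len(train_part), len(val_part))  # prints 8 2
```

With the default of 0.0 the validation list is empty, in which case there is no validation loss for patience to act on.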

resnet101

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model
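
The normalize_mean and normalize_std arguments above take one value per image channel. The sketch below shows the usual per-channel formula, (value - mean) / std, which is what torchvision.transforms.Normalize computes. Whether this model applies exactly that transform is an assumption, and `normalize_pixel` and the example values are made up for illustration.

```python
def normalize_pixel(pixel, mean, std):
    """Normalize one pixel, channel by channel: (value - mean) / std."""
    if not (len(pixel) == len(mean) == len(std)):
        raise ValueError("pixel, mean and std need one entry per channel")
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


# Example values only: map channel values from [0, 1] to [-1, 1]
print(normalize_pixel([1.0, 0.5, 0.0], mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]))
# prints [1.0, 0.0, -1.0]
```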

resnet152

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model
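
The epochs and batch_size arguments above together determine how many batches are processed during training. A small sketch of that arithmetic in plain Python, assuming the last partial batch is kept rather than dropped; `batches` and `total_steps` are hypothetical helpers, not part of DFFML.

```python
import math


def batches(records, batch_size=32):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def total_steps(n_records, epochs=20, batch_size=32):
    """Batches processed in total: every epoch passes over all records."""
    return epochs * math.ceil(n_records / batch_size)


print([len(b) for b in batches(list(range(100)))])  # prints [32, 32, 32, 4]
print(total_steps(100))  # prints 80
```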

resnet18

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model
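
The classifications and clstype arguments above list the possible label values and their data type. A classifier ultimately works with integer class indices, so some mapping between labels and indices is needed. The sketch below shows one way to build it, assuming labels are converted with clstype and then sorted; the actual ordering used by the model is not stated in this reference, and `make_class_ids` is a hypothetical helper.

```python
def make_class_ids(classifications, clstype=str):
    """Build index -> label and label -> index lookups.

    Assumption: labels are cast with clstype and ordered by sorting.
    """
    values = sorted(clstype(c) for c in classifications)
    index_to_label = dict(enumerate(values))
    label_to_index = {label: idx for idx, label in index_to_label.items()}
    return index_to_label, label_to_index


index_to_label, label_to_index = make_class_ids(["dog", "cat"])
print(index_to_label)  # prints {0: 'cat', 1: 'dog'}
print(label_to_index)  # prints {'cat': 0, 'dog': 1}
```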

resnet34

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model
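
The imageSize argument above gives a common size that all images are resized and cropped to. One common way to do this is to scale the shorter side to the target size and then take a centered square crop. The sketch below works through that arithmetic; it assumes this resize-then-center-crop scheme, which the reference does not spell out, and `resize_and_crop_box` is a hypothetical helper.

```python
def resize_and_crop_box(width, height, image_size):
    """Return the resized (width, height) and the centered crop box.

    The crop box is (left, top, right, bottom) in resized coordinates.
    Assumption: shorter side is scaled to image_size, then center cropped.
    """
    scale = image_size / min(width, height)
    new_w = max(image_size, round(width * scale))
    new_h = max(image_size, round(height * scale))
    left = (new_w - image_size) // 2
    top = (new_h - image_size) // 2
    return (new_w, new_h), (left, top, left + image_size, top + image_size)


print(resize_and_crop_box(640, 480, 224))
# prints ((299, 224), (37, 0, 261, 224))
```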

resnet50

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model
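
The pretrained, trainable, add_layers and layers arguments above control how a pretrained network is adapted. The sketch below models one plausible reading in plain Python: existing layers are updated only when trainable is set, and with add_layers the last layer is swapped for the entries in layers, which are always trained. This interpretation is an assumption, and `plan_layers` and the layer names are invented for illustration.

```python
def plan_layers(base_layers, trainable=False, add_layers=False, layers=None):
    """Return (layer_name, is_trained) pairs for the adapted network.

    Assumption: replacement layers are always trained, base layers only
    when trainable=True.
    """
    plan = [(name, trainable) for name in base_layers]
    if add_layers:
        if not layers:
            raise ValueError("add_layers=True needs at least one entry in layers")
        # Drop the original last layer and append the replacements
        plan = plan[:-1] + [(name, True) for name in layers]
    return plan


print(plan_layers(["conv1", "layer1", "fc"], add_layers=True, layers=["linear1", "linear2"]))
# prints [('conv1', False), ('layer1', False), ('linear1', True), ('linear2', True)]
```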

resnext101_32x8d

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

resnext50_32x4d

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

shufflenet_v2_x0_5

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

shufflenet_v2_x1_0

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

vgg11

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

vgg11_bn

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

vgg13

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

vgg13_bn

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

vgg16

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

vgg16_bn

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Stops training early if the validation loss does not improve within the given patience

  • loss: PyTorchLoss

    • default: <class 'dffml.base.CrossEntropyLossFunction'>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

vgg19

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class 'str'>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Early stops the training if validation loss doesn’t improve after a given patience

  • loss: PyTorchLoss

    • default: <class ‘dffml.base.CrossEntropyLossFunction’>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

vgg19_bn

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Early stops the training if validation loss doesn’t improve after a given patience

  • loss: PyTorchLoss

    • default: <class ‘dffml.base.CrossEntropyLossFunction’>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model

wide_resnet101_2

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Early stops the training if validation loss doesn’t improve after a given patience

  • loss: PyTorchLoss

    • default: <class ‘dffml.base.CrossEntropyLossFunction’>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model
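The normalize_mean and normalize_std arguments can be pictured with a small sketch. It assumes the usual per-channel formula, (value - mean) / std, which is what torchvision's Normalize transform computes; whether this model applies it in exactly this way is an assumption, and the statistics below are hypothetical.

```python
def normalize_pixel(pixel, mean, std):
    """Normalize one pixel channel-wise: (value - mean) / std."""
    if not (len(pixel) == len(mean) == len(std)):
        raise ValueError("pixel, mean and std need one entry per channel")
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


# Hypothetical per-channel statistics for an RGB image scaled to [0, 1]
normalize_mean = [0.5, 0.5, 0.5]
normalize_std = [0.25, 0.25, 0.25]

normalized = normalize_pixel([1.0, 0.0, 0.75], normalize_mean, normalize_std)
print(normalized)
```

Each list needs one entry per image channel, which is why both arguments are lists of floats.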

wide_resnet50_2

Official

No description

Args

  • predict: Feature

    • Feature name holding classification value

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • imageSize: Integer

    • default: None

    • Common size for all images to resize and crop to

  • enableGPU: String

    • default: False

    • Utilize GPUs for processing

  • epochs: Integer

    • default: 20

    • Number of iterations to pass over all records in a source

  • batch_size: Integer

    • default: 32

    • Batch size

  • validation_split: float

    • default: 0.0

    • Split training data for Validation

  • patience: Integer

    • default: 5

    • Early stops the training if validation loss doesn’t improve after a given patience

  • loss: PyTorchLoss

    • default: <class ‘dffml.base.CrossEntropyLossFunction’>

    • Loss Functions available in PyTorch

  • optimizer: String

    • default: SGD

    • Optimizer Algorithms available in PyTorch

  • normalize_mean: List of floats

    • default: None

    • Mean values for normalizing Tensor image

  • normalize_std: List of floats

    • default: None

    • Standard Deviation values for normalizing Tensor image

  • pretrained: String

    • default: True

    • Load Pre-trained model weights

  • trainable: String

    • default: False

    • Tweak pretrained model by training again

  • add_layers: String

    • default: False

    • Replace the last layer of the pretrained model

  • layers: typing.Union[dict, torch.nn.modules.container.ModuleDict, torch.nn.modules.container.Sequential, torch.nn.modules.container.ModuleList, torch.nn.modules.module.Module]

    • default: None

    • Extra layers to replace the last layer of the pretrained model
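The patience argument describes early stopping on the validation loss. The sketch below shows one common way to implement that rule; it is an illustration under stated assumptions (a strictly lower loss counts as an improvement, and the loss values are made up), not the code this plugin runs.

```python
def early_stop_epoch(val_losses, patience):
    """Return the epoch index at which training would stop, or None."""
    best = float("inf")
    waited = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Validation loss improved: remember it and reset the counter
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None


# Hypothetical validation losses, one per epoch
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.60, 0.65, 0.63]
stop_default = early_stop_epoch(losses, patience=5)
stop_short = early_stop_epoch(losses, patience=2)
print(stop_default, stop_short)
```

Note that validation_split defaults to 0.0, so a validation loss is presumably only available when that argument is set to a non-zero value.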

dffml_model_tensorflow

pip install dffml-model-tensorflow

Note

It’s important to keep the hidden layer config and feature config the same across invocations of train, predict, and accuracy methods.

Models are saved under the model's location directory, in subdirectories named after the hash of their feature names and hidden layer config. If any of those parameters change between invocations, the model is told to look for a different saved model.
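To see why a changed configuration points at a different saved model, here is a minimal sketch of hash-based directory naming. The function name, the choice of SHA-384, and the way the key string is built are all assumptions made for illustration; the plugin's actual scheme may differ.

```python
import hashlib


def model_subdirectory(feature_names, hidden):
    """Return a hypothetical subdirectory name for one model configuration."""
    # Sorting the names so that ordering does not matter is an assumption
    key = ",".join(sorted(feature_names)) + "|" + ",".join(map(str, hidden))
    return hashlib.sha384(key.encode()).hexdigest()


features = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
same_a = model_subdirectory(features, [12, 40, 15])
same_b = model_subdirectory(features, [12, 40, 15])
changed = model_subdirectory(features, [8, 16, 8])

print(same_a == same_b)   # identical config maps to the same directory
print(same_a == changed)  # a different hidden layer config maps elsewhere
```

Under this kind of scheme, running predict with a hidden layer config that differs from the one used for train would look in a directory that holds no trained state.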

tfdnnc

Official

Implemented using Tensorflow’s DNNClassifier.

First we create the training and testing datasets

wget http://download.tensorflow.org/data/iris_training.csv
echo '376c8ea3b7f85caff195b4abe62f34e8f4e7aece8bd087bbd746518a9d1fd60ae3b4274479f88ab0aa5c839460d535ef iris_training.csv' | sha384sum -c -
sed -i 's/.*setosa,versicolor,virginica/SepalLength,SepalWidth,PetalLength,PetalWidth,classification/g' *.csv
wget http://download.tensorflow.org/data/iris_test.csv
echo '8c2cda42ce5ce6f977d17d668b1c98a45bfe320175f33e97293c62ab543b3439eab934d8e11b1208de1e4a9eb1957714 iris_test.csv' | sha384sum -c -
sed -i 's/.*setosa,versicolor,virginica/SepalLength,SepalWidth,PetalLength,PetalWidth,classification/g' *.csv

Train the model

dffml train \
  -model tfdnnc \
  -model-epochs 3000 \
  -model-steps 20000 \
  -model-predict classification:int:1 \
  -model-location tempdir \
  -model-classifications 0 1 2 \
  -model-clstype int \
  -sources iris=csv \
  -source-filename iris_training.csv \
  -model-features \
    SepalLength:float:1 \
    SepalWidth:float:1 \
    PetalLength:float:1 \
    PetalWidth:float:1 \
  -log debug
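Each feature on the command line is written as name:type:length, for example SepalLength:float:1. The sketch below splits such a string into its three parts. It is a hypothetical helper for illustration only and does not reflect how the dffml command line parses these values internally.

```python
from typing import NamedTuple


class FeatureSpec(NamedTuple):
    name: str
    dtype: type
    length: int


# Only the type names that appear in the commands on this page
_DTYPES = {"int": int, "float": float, "str": str}


def parse_feature(spec: str) -> FeatureSpec:
    """Split a 'name:type:length' string into its components."""
    name, dtype, length = spec.split(":")
    return FeatureSpec(name, _DTYPES[dtype], int(length))


specs = [
    parse_feature(s)
    for s in ("SepalLength:float:1", "classification:int:1")
]
for spec in specs:
    print(spec.name, spec.dtype.__name__, spec.length)
```

The three fields line up with the arguments of Feature("SepalLength", float, 1) in the Python examples.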

Assess the accuracy

dffml accuracy \
  -model tfdnnc \
  -model-predict classification:int:1 \
  -model-location tempdir \
  -model-classifications 0 1 2 \
  -model-clstype int \
  -features classification:int:1 \
  -scorer clf \
  -sources iris=csv \
  -source-filename iris_test.csv \
  -model-features \
    SepalLength:float:1 \
    SepalWidth:float:1 \
    PetalLength:float:1 \
    PetalWidth:float:1 \
  -log critical

Output

0.99996233782

Make a prediction

echo -e 'SepalLength,SepalWidth,PetalLength,PetalWidth\n5.9,3.0,4.2,1.5\n' | \
dffml predict all \
  -model tfdnnc \
  -model-predict classification:int:1 \
  -model-location tempdir \
  -model-classifications 0 1 2 \
  -model-clstype int \
  -sources iris=csv \
  -model-features \
    SepalLength:float:1 \
    SepalWidth:float:1 \
    PetalLength:float:1 \
    PetalWidth:float:1 \
  -source-filename /dev/stdin

Output

[
    {
        "extra": {},
        "features": {
            "PetalLength": 4.2,
            "PetalWidth": 1.5,
            "SepalLength": 5.9,
            "SepalWidth": 3.0,
            "classification": 1
        },
        "last_updated": "2019-07-31T02:00:12Z",
        "prediction": {
            "classification":
                {
                    "confidence": 0.9999997615814209,
                    "value": 1
                }
        },
        "key": "0"
    }
]
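The predict command prints a JSON list of records. A minimal sketch of pulling the predicted value and confidence out of such output follows; the embedded sample is abridged from the output above, and the summarize helper is hypothetical.

```python
import json

# Abridged copy of one record from the command output
output = """
[
    {
        "extra": {},
        "features": {"PetalLength": 4.2, "PetalWidth": 1.5,
                     "SepalLength": 5.9, "SepalWidth": 3.0},
        "key": "0",
        "prediction": {
            "classification": {"confidence": 0.9999997615814209, "value": 1}
        }
    }
]
"""


def summarize(records, target):
    """Return (key, value, confidence) for each record's prediction of target."""
    rows = []
    for record in records:
        predicted = record["prediction"][target]
        rows.append((record["key"], predicted["value"], predicted["confidence"]))
    return rows


rows = summarize(json.loads(output), "classification")
print(rows)
```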

Example usage of the Tensorflow DNNClassifier model using the Python API

from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_tensorflow.dnnc import DNNClassifierModel
from dffml.accuracy import ClassificationAccuracy

model = DNNClassifierModel(
    features=Features(
        Feature("SepalLength", float, 1),
        Feature("SepalWidth", float, 1),
        Feature("PetalLength", float, 1),
        Feature("PetalWidth", float, 1),
    ),
    predict=Feature("classification", int, 1),
    epochs=3000,
    steps=20000,
    classifications=[0, 1, 2],
    clstype=int,
    location="tempdir",
)

# Train the model
train(model, "iris_training.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = ClassificationAccuracy()
print(
    "Accuracy:",
    score(
        model,
        scorer,
        Feature("classification", int, 1),
        CSVSource(filename="iris_test.csv"),
    ),
)

# Make prediction
for i, features, prediction in predict(
    model,
    {
        "PetalLength": 4.2,
        "PetalWidth": 1.5,
        "SepalLength": 5.9,
        "SepalWidth": 3.0,
    },
    {
        "PetalLength": 5.4,
        "PetalWidth": 2.1,
        "SepalLength": 6.9,
        "SepalWidth": 3.1,
    },
):
    features["classification"] = prediction["classification"]["value"]
    print(features)

Args

  • predict: Feature

    • Feature name holding target values

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • steps: Integer

    • default: 3000

    • Number of steps to train the model

  • epochs: Integer

    • default: 30

    • Number of iterations to pass over all records in a source

  • hidden: List of integers

    • default: [12, 40, 15]

    • List length is the number of hidden layers in the network. Each entry in the list is the number of nodes in that hidden layer

  • classifications: List of strings

    • default: None

    • Options for value of classification

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • batchsize: Integer

    • default: 20

    • Number of records to pass through in an epoch

  • shuffle: String

    • default: True

    • Randomise order of records in a batch
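The hidden argument above fixes the shape of the network. As a rough sketch, assuming plain fully connected layers with one bias per node, the layer sizes and parameter count for the iris example (4 input features, 3 classifications) with the default hidden value can be worked out like this:

```python
def layer_shapes(n_inputs, hidden, n_outputs):
    """Return (inputs, outputs) pairs for each fully connected layer."""
    sizes = [n_inputs, *hidden, n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))


# 4 input features, default hidden layers, 3 classifications
shapes = layer_shapes(4, [12, 40, 15], 3)

# Weights plus one bias per output node in every layer
n_params = sum(n_in * n_out + n_out for n_in, n_out in shapes)

print(shapes)
print(n_params)
```

Changing the list length adds or removes layers, while changing an entry widens or narrows a single layer.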

tfdnnr

Official

Implemented using Tensorflow’s DNNEstimator.

Usage:

  • predict: Name of the feature we are trying to predict or using for training.

Generating train and test data

  • This creates the files train.csv and test.csv. Back up any files with the same names in the directory where you run this command, because it overwrites existing files.

cat > train.csv << EOF
Feature1,Feature2,TARGET
0.93,0.68,3.89
0.24,0.42,1.75
0.36,0.68,2.75
0.53,0.31,2.00
0.29,0.25,1.32
0.29,0.52,2.14
EOF
cat > test.csv << EOF
Feature1,Feature2,TARGET
0.57,0.84,3.65
0.95,0.19,2.46
0.23,0.15,0.93
EOF

Train the model

dffml train \
  -model tfdnnr \
  -model-epochs 300 \
  -model-steps 2000 \
  -model-predict TARGET:float:1 \
  -model-location tempdir \
  -model-hidden 8 16 8 \
  -sources s=csv \
  -source-filename train.csv \
  -model-features \
    Feature1:float:1 \
    Feature2:float:1 \
  -log debug

Assess the accuracy

dffml accuracy \
  -model tfdnnr \
  -model-predict TARGET:float:1 \
  -model-location tempdir \
  -model-hidden 8 16 8 \
  -features TARGET:float:1 \
  -sources s=csv \
  -source-filename test.csv \
  -model-features \
    Feature1:float:1 \
    Feature2:float:1 \
  -scorer mse \
  -log critical

Output

0.9468210011
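The accuracy command above uses the mse scorer. Assuming that scorer reports the mean squared error between actual and predicted TARGET values, where lower is better, the calculation can be sketched as follows. The predictions are made-up numbers, not output from the trained model.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


targets = [3.65, 2.46, 0.93]   # TARGET column of test.csv
predictions = [3.5, 2.5, 1.0]  # hypothetical model output

mse = mean_squared_error(targets, predictions)
print(round(mse, 6))
```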

Make a prediction

echo -e 'Feature1,Feature2,TARGET\n0.21,0.18,0.84\n' | \
  dffml predict all \
  -model tfdnnr \
  -model-predict TARGET:float:1 \
  -model-location tempdir \
  -model-hidden 8 16 8 \
  -sources s=csv \
  -source-filename /dev/stdin \
  -model-features \
    Feature1:float:1 \
    Feature2:float:1 \
  -log critical

Output

[
    {
        "extra": {},
        "features": {
            "Feature1": 0.21,
            "Feature2": 0.18,
            "TARGET": 0.84
        },
        "last_updated": "2019-10-24T15:26:41Z",
        "prediction": {
            "TARGET" : {
                "confidence": null,
                "value": 1.1983429193496704
            }
        },
        "key": 0
    }
]

Example usage of the Tensorflow DNNEstimator model using the Python API

from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_tensorflow.dnnr import DNNRegressionModel
from dffml.accuracy import MeanSquaredErrorAccuracy

model = DNNRegressionModel(
    features=Features(
        Feature("Feature1", float, 1), Feature("Feature2", float, 1)
    ),
    predict=Feature("TARGET", float, 1),
    epochs=300,
    steps=2000,
    hidden=[8, 16, 8],
    location="tempdir",
)

# Train the model
train(model, "train.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = MeanSquaredErrorAccuracy()
print(
    "Accuracy:",
    score(
        model,
        scorer,
        Feature("TARGET", float, 1),
        CSVSource(filename="test.csv"),
    ),
)

# Make prediction
for i, features, prediction in predict(
    model, {"Feature1": 0.21, "Feature2": 0.18, "TARGET": 0.84}
):
    features["TARGET"] = prediction["TARGET"]["value"]
    print(features)

The null in confidence is the expected behavior. (See TODO in predict).
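Because confidence can be null (None in Python) for this model, code that consumes predictions should not assume it is a number. A minimal sketch with a hypothetical describe helper:

```python
from typing import Optional


def describe(value: float, confidence: Optional[float]) -> str:
    """Format a prediction, tolerating a missing confidence."""
    if confidence is None:
        return f"{value:.4f} (no confidence reported)"
    return f"{value:.4f} (confidence {confidence:.2%})"


# Value taken from the tfdnnr output above, where confidence is null
regression = describe(1.1983429193496704, None)
# A made-up prediction that does carry a confidence
with_confidence = describe(1.0, 0.5)

print(regression)
print(with_confidence)
```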

Args

  • predict: Feature

    • Feature name holding target values

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • steps: Integer

    • default: 3000

    • Number of steps to train the model

  • epochs: Integer

    • default: 30

    • Number of iterations to pass over all records in a source

  • hidden: List of integers

    • default: [12, 40, 15]

    • List length is the number of hidden layers in the network. Each entry in the list is the number of nodes in that hidden layer

dffml_model_tensorflow_hub

pip install dffml-model-tensorflow-hub

text_classifier

Official

Implemented using Tensorflow hub pretrained models.

cat > train.csv << EOF
sentence,sentiment
Life is good,1
This book is amazing,1
It's a terrible movie,2
Global warming is bad,0
I hate you!!,2
This movie is horrible,2
EOF
cat > test.csv << EOF
sentence,sentiment
I am not feeling good,0
Our trip was full of adventures,1
EOF

Train the model

dffml train \
  -model text_classifier \
  -model-epochs 30 \
  -model-predict sentiment:int:1 \
  -model-location tempdir \
  -model-classifications 0 1 \
  -model-clstype int \
  -sources f=csv \
  -source-filename train.csv \
  -model-features \
    sentence:str:1 \
  -model-model_path "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim-with-oov/1" \
  -model-add_layers \
  -model-layers "Dense(units=512, activation='relu')" "Dense(units=2, activation='softmax')" \
  -log debug

Assess the accuracy

dffml accuracy \
  -model text_classifier \
  -model-predict sentiment:int:1 \
  -model-location tempdir \
  -model-classifications 0 1 \
  -model-model_path "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim-with-oov/1" \
  -model-clstype int \
  -features sentiment:int:1 \
  -sources f=csv \
  -source-filename test.csv \
  -model-features \
    sentence:str:1 \
  -scorer textclf \
  -log critical

Output

0.5

Make a prediction

dffml predict all \
  -model text_classifier \
  -model-predict sentiment:int:1 \
  -model-location tempdir \
  -model-classifications 0 1 \
  -model-model_path "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim-with-oov/1" \
  -model-clstype int \
  -sources f=csv \
  -source-filename test.csv \
  -model-features \
    sentence:str:1 \
  -log debug

Output

[
    {
        "extra": {},
        "features": {
            "sentence": "I am not feeling good",
            "sentiment": 0
        },
        "key": "0",
        "last_updated": "2020-05-14T20:14:30Z",
        "prediction": {
            "sentiment": {
                "confidence": 0.9999992847442627,
                "value": 1
            }
        }
    },
    {
        "extra": {},
        "features": {
            "sentence": "Our trip was full of adventures",
            "sentiment": 1
        },
        "key": "1",
        "last_updated": "2020-05-14T20:14:30Z",
        "prediction": {
            "sentiment": {
                "confidence": 0.9999088048934937,
                "value": 1
            }
        }
    }
]

Example usage of the Tensorflow_hub Text Classifier model using the Python API

from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_tensorflow_hub.text_classifier import TextClassificationModel
from dffml_model_tensorflow_hub.text_classifier_accuracy import (
    TextClassifierAccuracy,
)

model = TextClassificationModel(
    features=Features(Feature("sentence", str, 1)),
    predict=Feature("sentiment", int, 1),
    classifications=[0, 1, 2],
    clstype=int,
    location="tempdir",
)

# Train the model
train(model, "train.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = TextClassifierAccuracy()
print(
    "Accuracy:",
    score(
        model,
        scorer,
        Feature("sentiment", int, 1),
        CSVSource(filename="test.csv"),
    ),
)

# Make prediction
for i, features, prediction in predict(
    model, {"sentence": "This track is horrible"},
):
    features["sentiment"] = prediction["sentiment"]["value"]
    print(features)

Args

  • predict: Feature

    • Feature name holding classification value

  • classifications: List of strings

    • Options for value of classification

  • features: List of features

    • Features to train on

  • location: Path

    • Location where state should be saved

  • trainable: String

    • default: True

    • Tweak pretrained model by training again

  • batch_size: Integer

    • default: 120

    • Batch size

  • max_seq_length: Integer

    • default: 256

    • Length of sentence, used in preprocessing of input for bert embedding

  • add_layers: String

    • default: False

    • Add layers on top of the pretrained model/layer

  • embedType: String

    • default: None

    • Type of pretrained embedding model, required to be set to bert to use bert pretrained embedding

  • layers: List of strings

    • default: None

    • Extra layers to be added on top of pretrained model

  • model_path: String

  • optimizer: String

    • default: adam

    • Optimizer used by model

  • metrics: String

    • default: accuracy

    • Metric used to evaluate model

  • clstype: Type

    • default: <class ‘str’>

    • Data type of classifications values

  • epochs: Integer

    • default: 10

    • Number of iterations to pass over all records in a source

dffml_model_spacy

pip install dffml-model-spacy

spacyner

Official

Implemented using spaCy statistical models.

Note

You must download en_core_web_sm before using this model

$ python -m spacy download en_core_web_sm

First we create the training and testing datasets.

Training data:

train.json

{
    "data": [
        {
            "sentence": "I went to London and Berlin.",
            "entities": [
                {
                    "start":10,
                    "end": 16,
                    "tag": "LOC"
                },
                {
                    "start":21,
                    "end": 27,
                    "tag": "LOC"
                }
            ]
        },
        {
            "sentence": "Who is Alex?",
            "entities": [
                {
                    "start":7,
                    "end": 11,
                    "tag": "PERSON"
                }
            ]
        }
    ]
}
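The start and end values in train.json match character offsets into sentence, with end pointing one past the last character of the entity. The hypothetical helper below computes such entries by searching for the entity text, which can help avoid off-by-one mistakes when writing training data by hand.

```python
def entity(sentence, text, tag):
    """Build an entity entry with character offsets for the first match of text."""
    start = sentence.index(text)  # raises ValueError if text is absent
    return {"start": start, "end": start + len(text), "tag": tag}


sentence = "I went to London and Berlin."
london = entity(sentence, "London", "LOC")
berlin = entity(sentence, "Berlin", "LOC")
alex = entity("Who is Alex?", "Alex", "PERSON")

print(london, berlin, alex)
```

Slicing the sentence with these offsets, for example sentence[10:16], returns the entity text.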

Testing data:

test.json

{
    "data": [
        {
            "sentence": "Alex went to London?"
        }
    ]
}

Train the model

$ dffml train \
    -model spacyner \
    -sources s=op \
    -source-opimp dffml_model_spacy.ner.utils:parser \
    -source-args train.json False \
    -model-model_name en_core_web_sm \
    -model-location temp \
    -model-n_iter 5 \
    -log debug

Assess the accuracy

$ dffml accuracy \
    -model spacyner \
    -sources s=op \
    -source-opimp dffml_model_spacy.ner.utils:parser \
    -source-args train.json False \
    -model-model_name en_core_web_sm \
    -model-location temp \
    -model-n_iter 5 \
    -features tag:str:1 \
    -scorer sner \
    -log debug
0.0

Make a prediction

$ dffml predict all \
    -model spacyner \
    -sources s=op \
    -source-opimp dffml_model_spacy.ner.utils:parser \
    -source-args test.json True \
    -model-model_name en_core_web_sm \
    -model-location temp \
    -model-n_iter 5 \
    -log debug
[
    {
        "extra": {},
        "features": {
            "entities": [],
            "sentence": "Alex went to London?"
        },
        "key": 0,
        "last_updated": "2020-07-27T16:26:18Z",
        "prediction": {
            "Answer": {
                "confidence": null,
                "value": [
                    [
                        "Alex",
                        "PERSON"
                    ],
                    [
                        "London",
                        "GPE"
                    ]
                ]
            }
        }
    }
]

The model can be trained on larger datasets to get the expected output. The example above only demonstrates the command-line usage of the model.

In the train, accuracy, and predict commands above, the op source reads and parses data from a JSON file before feeding it to the model. The function the op source uses to parse the JSON data is:

import ast
import json


def parser(json_file: str, is_predicting: bool) -> dict:
    with open(json_file) as f:
        parsed_data = {}
        data = json.load(f)["data"]
        for id, entry in enumerate(data):
            entities = []
            sentence = entry["sentence"]
            if not ast.literal_eval(is_predicting):
                for entity in entry["entities"]:
                    start = entity["start"]
                    end = entity["end"]
                    tag = entity["tag"]
                    entities.append((start, end, tag))
            parsed_data[id] = {
                "features": {"sentence": sentence, "entities": entities,}
            }
        return parsed_data

The location of the function is passed using:

-source-opimp dffml_model_spacy.ner.utils:parser

And the arguments to parser are passed by:

-source-args train.json False

where train.json is the name of the file containing the training data and False is the value of the is_predicting flag.
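Values given on the command line reach the function as strings, which is why the parser shown earlier passes is_predicting through ast.literal_eval instead of testing it directly. A short sketch of the difference, using a hypothetical parse_flag wrapper:

```python
import ast


def parse_flag(text: str) -> bool:
    """Convert the strings 'True' or 'False' into real booleans."""
    value = ast.literal_eval(text)
    if not isinstance(value, bool):
        raise ValueError(f"expected True or False, got {text!r}")
    return value


# Any non-empty string is truthy, so bool() cannot be used here
naive = bool("False")
parsed = parse_flag("False")

print(naive)   # True
print(parsed)  # False
```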

Args

  • location: String

    • Output location.

  • model_name: String

    • default: None

    • Name of one of the trained pipelines provided by spaCy. The complete list is at https://spacy.io/models. Defaults to a blank ‘en’ model.

  • n_iter: Integer

    • default: 10

    • Number of training iterations

  • dropout: float

    • default: 0.5

    • Dropout rate to be used during training

dffml_model_autosklearn

pip install dffml-model-autosklearn

Follow these instructions before running the above install command to ensure that auto-sklearn installs correctly

Ubuntu Installation

To provide a C++11 build environment and the latest SWIG version on Ubuntu, run:

$ sudo apt-get install build-essential swig

Install the other PyPI dependencies with:

$ python3 -m pip install cython liac-arff psutil
$ curl https://raw.githubusercontent.com/automl/auto-sklearn/master/requirements.txt | xargs -n 1 -L 1 python3 -m pip install

For more information about installation visit https://automl.github.io/auto-sklearn/master/installation.html#installation

autoclassifier

Official

No description

Args

  • features: List of features

    • Features to train on

  • predict: Feature

    • Label or the value to be predicted

  • location: Path

    • Location where state should be saved

  • time_left_for_this_task: Integer

    • default: 3600

    • Time limit in seconds for the search of appropriate models. By increasing this value, auto-sklearn has a higher chance of finding better models.

  • per_run_time_limit: Integer

    • default: None

    • Time limit for a single call to the machine learning model. Model fitting will be terminated if the machine learning algorithm runs over the time limit. Set this value high enough so that typical machine learning algorithms can be fit on the training data.

  • initial_configurations_via_metalearning: Integer

    • default: 25

    • Initialize the hyperparameter optimization algorithm with this many configurations which worked well on previously seen datasets. Disable if the hyperparameter optimization algorithm should start from scratch.

  • ensemble_size: Integer

    • default: 50

    • Number of models added to the ensemble built by Ensemble selection from libraries of models. Models are drawn with replacement.

  • ensemble_nbest: Integer

    • default: 50

    • Only consider the ensemble_nbest models when building an ensemble.

  • max_models_on_disc: Integer

    • default: 50

    • Maximum number of models kept on disk. Any additional models are permanently deleted. This also sets the upper limit on how many models can be used for an ensemble. It must be an integer greater than or equal to 1. If set to None, all models are kept on disk.

  • seed: Integer

    • default: 1

    • Used to seed SMAC. Will determine the output file names.

  • memory_limit: Integer

    • default: 3072

    • Memory limit in MB for the machine learning algorithm. auto-sklearn will stop fitting the machine learning algorithm if it tries to allocate more than memory_limit MB. If None is provided, no memory limit is set. In case of multi-processing, memory_limit will be per job. This memory limit also applies to the ensemble creation process.

  • include_estimators: typing.Any

    • default: None

    • If None, all possible estimators are used. Otherwise specifies set of estimators to use.

  • exclude_estimators: typing.Any

    • default: None

    • If None, all possible estimators are used. Otherwise specifies set of estimators not to use. Incompatible with include_estimators.

  • include_preprocessors: typing.Any

    • default: None

    • If None all possible preprocessors are used. Otherwise specifies set of preprocessors to use.

  • exclude_preprocessors: typing.Any

    • default: None

    • If None all possible preprocessors are used. Otherwise specifies set of preprocessors not to use. Incompatible with include_preprocessors.

  • resampling_strategy: String

    • default: holdout

    • How to handle overfitting; may need ‘resampling_strategy_arguments’. Some strategies call iterative fit where possible, cross-validation strategies require ‘folds’, and splitter objects from the scikit-learn model_selection module are also accepted.

  • resampling_strategy_arguments: dict

    • default: None

    • Additional arguments for resampling_strategy:

      • train_size should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split.

      • shuffle determines whether the data is shuffled prior to splitting it into train and validation sets.

      • For a scikit-learn splitter class, pass all arguments required by the chosen class as specified in the scikit-learn documentation. If arguments are not provided, scikit-learn defaults are used. If no defaults are available, an exception is raised. Refer to the ‘n_splits’ argument as ‘folds’.

  • tmp_folder: String

    • default: None

    • folder to store configuration output and log files, if None automatically use /tmp/autosklearn_tmp_$pid_$random_number

  • output_folder: String

    • default: None

    • folder to store predictions for optional test set, if None no output will be generated

  • delete_tmp_folder_after_terminate: String

    • default: True

    • remove tmp_folder, when finished. If tmp_folder is None tmp_dir will always be deleted

  • delete_output_folder_after_terminate: String

    • default: True

    • remove output_folder, when finished. If output_folder is None output_dir will always be deleted

  • n_jobs: Integer

    • default: None

    • The number of jobs to run in parallel for fit(). -1 means using all processors. By default, Auto-sklearn uses a single core for fitting the machine learning model and a single core for fitting an ensemble. Ensemble building is not affected by n_jobs but can be controlled by the number of models in the ensemble. In contrast to most scikit-learn models, n_jobs given in the constructor is not applied to the predict() method. If dask_client is None, a new dask client is created.

  • dask_client: typing.Any

    • default: None

    • User-created dask client, can be used to start a dask cluster and then attach auto-sklearn to it.

  • disable_evaluator_output: String

    • default: False

    • If True, disable model and prediction output. Cannot be used together with ensemble building, and predict() cannot be used when this is set to True. Can also be given as a list to pass more fine-grained information on what to save, such as skipping the predictions for the optimization/validation set, which would later be used to build an ensemble.

  • smac_scenario_args: dict

    • default: None

    • Additional arguments inserted into the scenario of SMAC. See the SMAC documentation for a list of available arguments.

  • get_smac_object_callback: typing.Any

    • default: None

    • Callback function to create the SMAC optimizer object. The function must accept the arguments scenario_dict, instances, num_params, runhistory, seed and ta. This is an advanced feature. Use only if you are familiar with SMAC.

  • logging_config: dict

    • default: None

    • dictionary object specifying the logger configuration. If None, the default logging.yaml file is used, which can be found in the directory util/logging.yaml relative to the installation.

  • metadata_directory: String

    • default: None

    • path to the metadata directory. If None, the default directory (autosklearn.metalearning.files) is used.

  • metric: typing.Any

    • default: None

    • The metric to optimize, such as one of the auto-sklearn built-in metrics. If None is provided, a default metric is selected depending on the task.

  • scoring_functions: typing.Any

    • default: None

    • List of scorers which will be calculated for each pipeline and results will be available via cv_results

  • load_models: String

    • default: True

    • Whether to load the models after fitting Auto-sklearn.
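As a rough illustration of the logging_config argument described above, the sketch below builds a logger configuration dictionary and applies it with the standard library. It assumes the dictionary follows Python's logging.config.dictConfig schema, as the default logging.yaml file does; the formatter and handler names are hypothetical and chosen only for this example.

```python
import logging
import logging.config

# Hypothetical logger configuration in the standard dictConfig schema.
# The names "simple" and "console" are illustrative assumptions.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "[%(levelname)s] %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "simple",
        },
    },
    "root": {"level": "INFO", "handlers": ["console"]},
}

# Applying the dictionary with the standard library is a cheap way to
# check that it is well formed before handing it to a model.
logging.config.dictConfig(LOGGING_CONFIG)
logging.getLogger("example").info("logging configured")
```

A dictionary like this could then be passed as the logging_config argument instead of relying on the default file.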

autoregressor

Official

autoregressor / AutoSklearnRegressorModel will use auto-sklearn to train a scikit-learn model for you.

This is AutoML, it will tune hyperparameters for you.

Implemented using AutoSklearnRegressor.

First we create the training and testing datasets

train.csv

Feature1,Feature2,TARGET
0.93,0.68,3.89
0.24,0.42,1.75
0.36,0.68,2.75
0.53,0.31,2.00
0.29,0.25,1.32
0.29,0.52,2.14

test.csv

Feature1,Feature2,TARGET
0.57,0.84,3.65
0.95,0.19,2.46
0.23,0.15,0.93
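If you would rather generate the two files from a script than create them by hand, the sketch below writes the same rows with Python's csv module. The write_csv helper is hypothetical and not part of DFFML; values are kept as strings so they are written exactly as shown above.

```python
import csv
from pathlib import Path

# Rows copied from the train.csv and test.csv listings above.
TRAIN_ROWS = [
    ("Feature1", "Feature2", "TARGET"),
    ("0.93", "0.68", "3.89"),
    ("0.24", "0.42", "1.75"),
    ("0.36", "0.68", "2.75"),
    ("0.53", "0.31", "2.00"),
    ("0.29", "0.25", "1.32"),
    ("0.29", "0.52", "2.14"),
]
TEST_ROWS = [
    ("Feature1", "Feature2", "TARGET"),
    ("0.57", "0.84", "3.65"),
    ("0.95", "0.19", "2.46"),
    ("0.23", "0.15", "0.93"),
]


def write_csv(path, rows):
    """Write rows (header first) to path and return the path (hypothetical helper)."""
    path = Path(path)
    with path.open("w", newline="") as handle:
        csv.writer(handle, lineterminator="\n").writerows(rows)
    return path
```

Calling write_csv("train.csv", TRAIN_ROWS) and write_csv("test.csv", TEST_ROWS) from the working directory produces the files used by the commands that follow.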

Train the model

$ dffml train \
    -model autoregressor \
    -model-predict TARGET:float:1 \
    -model-clstype int \
    -sources f=csv \
    -source-filename train.csv \
    -model-features \
      Feature1:float:1 \
      Feature2:float:1 \
    -model-time_left_for_this_task 120 \
    -model-per_run_time_limit 30 \
    -model-ensemble_size 50 \
    -model-delete_tmp_folder_after_terminate False \
    -model-location tempdir \
    -log debug

Assess the accuracy

$ dffml accuracy \
    -model autoregressor \
    -model-predict TARGET:float:1 \
    -model-location tempdir \
    -features TARGET:float:1 \
    -sources f=csv \
    -source-filename test.csv \
    -model-features \
      Feature1:float:1 \
      Feature2:float:1 \
    -scorer mse \
    -log critical
0.9961211434899032
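The number printed above comes from the mse scorer. As a reminder of what mean squared error measures, the sketch below computes it from its textbook definition. This is not DFFML's implementation, and the mean_squared_error helper is a hypothetical name used only here. For mean squared error, lower values indicate a better fit.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one value is required")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Two records: the first is predicted exactly, the second is off by 2.
print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # prints 2.0
```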

Make a file containing the data to predict on

predict.csv

Feature1,Feature2
0.57,0.84

Make a prediction

$ dffml predict all \
    -model autoregressor \
    -model-location tempdir \
    -model-predict TARGET:float:1 \
    -sources iris=csv \
    -model-features \
      Feature1:float:1 \
      Feature2:float:1 \
    -source-filename predict.csv
[
    {
        "extra": {},
        "features": {
            "Feature1": 0.57,
            "Feature2": 0.84
        },
        "key": "0",
        "last_updated": "2020-11-23T05:52:13Z",
        "prediction": {
            "TARGET": {
                "confidence": NaN,
                "value": 3.566799074411392
            }
        }
    }
]

Training on a larger dataset can be expected to give better accuracy. The example above uses a tiny dataset only to demonstrate command line usage of the model.
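The prediction output shown above contains NaN for the confidence, which strict JSON does not allow. Python's json module accepts it by default, so the output can still be post-processed. The sketch below parses a copy of that output and maps a NaN confidence to None; the extract_predictions helper is hypothetical and not part of DFFML.

```python
import json
import math

# Copy of the prediction output shown above.
OUTPUT = """
[
    {
        "extra": {},
        "features": {"Feature1": 0.57, "Feature2": 0.84},
        "key": "0",
        "last_updated": "2020-11-23T05:52:13Z",
        "prediction": {
            "TARGET": {"confidence": NaN, "value": 3.566799074411392}
        }
    }
]
"""


def extract_predictions(text, target="TARGET"):
    """Return (key, value, confidence) tuples from predict output."""
    results = []
    for record in json.loads(text):
        prediction = record["prediction"][target]
        confidence = prediction["confidence"]
        # Map NaN to None so callers can test for a missing confidence.
        if isinstance(confidence, float) and math.isnan(confidence):
            confidence = None
        results.append((record["key"], prediction["value"], confidence))
    return results


print(extract_predictions(OUTPUT))  # prints [('0', 3.566799074411392, None)]
```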

Example usage of the model from Python

run.py

from dffml import Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_autosklearn import AutoSklearnRegressorModel
from dffml.accuracy import MeanSquaredErrorAccuracy

model = AutoSklearnRegressorModel(
    features=Features(
        Feature("Feature1", float, 1), Feature("Feature2", float, 1),
    ),
    predict=Feature("TARGET", float, 1),
    location="tempdir-python",
    time_left_for_this_task=120,
)


def main():
    # Train the model
    train(model, "train.csv")

    # Assess accuracy
    scorer = MeanSquaredErrorAccuracy()
    print(
        "Accuracy:",
        score(model, scorer, Feature("TARGET", float, 1), "test.csv"),
    )

    # Make prediction
    for i, features, prediction in predict(model, "predict.csv"):
        features["TARGET"] = prediction["TARGET"]["value"]
        print(features)


if __name__ == "__main__":
    main()

Run the file

$ python run.py
Accuracy: 0.9961211434899032
{'Feature1': 0.57, 'Feature2': 0.84, 'TARGET': 3.6180416345596313}

Args

  • features: List of features

    • Features to train on

  • predict: Feature

    • Label or the value to be predicted

  • location: Path

    • Location where state should be saved

  • time_left_for_this_task: Integer

    • default: 3600

    • Time limit in seconds for the search of appropriate models. By increasing this value, auto-sklearn has a higher chance of finding better models.

  • per_run_time_limit: Integer

    • default: None

    • Time limit for a single call to the machine learning model. Model fitting will be terminated if the machine learning algorithm runs over the time limit. Set this value high enough so that typical machine learning algorithms can be fit on the training data.

  • initial_configurations_via_metalearning: Integer

    • default: 25

    • Initialize the hyperparameter optimization algorithm with this many configurations which worked well on previously seen datasets. Disable if the hyperparameter optimization algorithm should start from scratch.

  • ensemble_size: Integer

    • default: 50

    • Number of models added to the ensemble built by Ensemble selection from libraries of models. Models are drawn with replacement.

  • ensemble_nbest: Integer

    • default: 50

    • Only consider the ensemble_nbest models when building an ensemble.

  • max_models_on_disc: Integer

    • default: 50

    • Defines the maximum number of models that are kept on disc. Any additional models are permanently deleted. Because of this, it also sets the upper limit on how many models can be used for an ensemble. It must be an integer greater than or equal to 1. If set to None, all models are kept on disc.

  • seed: Integer

    • default: 1

    • Used to seed SMAC. Will determine the output file names.

  • memory_limit: Integer

    • default: 3072

    • Memory limit in MB for the machine learning algorithm. auto-sklearn will stop fitting the machine learning algorithm if it tries to allocate more than memory_limit MB. If None is provided, no memory limit is set. In case of multi-processing, memory_limit will be per job. This memory limit also applies to the ensemble creation process.

  • include_estimators: typing.Any

    • default: None

    • If None, all possible estimators are used. Otherwise specifies set of estimators to use.

  • exclude_estimators: typing.Any

    • default: None

    • If None, all possible estimators are used. Otherwise specifies set of estimators not to use. Incompatible with include_estimators.

  • include_preprocessors: typing.Any

    • default: None

    • If None all possible preprocessors are used. Otherwise specifies set of preprocessors to use.

  • exclude_preprocessors: typing.Any

    • default: None

    • If None all possible preprocessors are used. Otherwise specifies set of preprocessors not to use. Incompatible with include_preprocessors.

  • resampling_strategy: String

    • default: holdout

    • How to handle overfitting; might need ‘resampling_strategy_arguments’. Some strategies call iterative fit where possible, cross-validation strategies require ‘folds’, and splitter classes found in the scikit-learn model_selection module can also be used.

  • resampling_strategy_arguments: dict

    • default: None

    • Additional arguments for resampling_strategy.

      • train_size should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the train split.

      • shuffle determines whether the data is shuffled prior to splitting it into train and validation.

      • When a scikit-learn splitter class is used, pass all arguments required by the chosen class as specified in the scikit-learn documentation. If arguments are not provided, scikit-learn defaults are used. If no defaults are available, an exception is raised. Refer to the ‘n_splits’ argument as ‘folds’.

  • tmp_folder: String

    • default: None

    • folder to store configuration output and log files, if None automatically use /tmp/autosklearn_tmp_$pid_$random_number

  • output_folder: String

    • default: None

    • folder to store predictions for optional test set, if None no output will be generated

  • delete_tmp_folder_after_terminate: String

    • default: True

    • remove tmp_folder, when finished. If tmp_folder is None tmp_dir will always be deleted

  • delete_output_folder_after_terminate: String

    • default: True

    • remove output_folder, when finished. If output_folder is None output_dir will always be deleted

  • n_jobs: Integer

    • default: None

    • The number of jobs to run in parallel for fit(). -1 means using all processors. By default, Auto-sklearn uses a single core for fitting the machine learning model and a single core for fitting an ensemble. Ensemble building is not affected by n_jobs but can be controlled by the number of models in the ensemble. In contrast to most scikit-learn models, n_jobs given in the constructor is not applied to the predict() method. If dask_client is None, a new dask client is created.

  • dask_client: typing.Any

    • default: None

    • User-created dask client, can be used to start a dask cluster and then attach auto-sklearn to it.

  • disable_evaluator_output: String

    • default: False

    • If True, disable model and prediction output. Cannot be used together with ensemble building. predict() cannot be used when this is set to True. Can also be given as a list for finer-grained control over what is saved, for example to skip saving the predictions for the optimization/validation set, which would later be used to build an ensemble.

  • smac_scenario_args: dict

    • default: None

    • Additional arguments inserted into the scenario of SMAC. See the SMAC documentation for a list of available arguments.

  • get_smac_object_callback: typing.Any

    • default: None

    • Callback function to create the SMAC object. The function must accept the arguments scenario_dict, instances, num_params, runhistory, seed and ta. This is an advanced feature. Use only if you are familiar with SMAC.

  • logging_config: dict

    • default: None

    • dictionary object specifying the logger configuration. If None, the default logging.yaml file is used, which can be found in the directory util/logging.yaml relative to the installation.

  • metadata_directory: String

    • default: None

    • path to the metadata directory. If None, the default directory (autosklearn.metalearning.files) is used.

  • metric: typing.Any

    • default: None

    • The metric to optimize, such as one of the auto-sklearn built-in metrics. If None is provided, a default metric is selected depending on the task.

  • scoring_functions: typing.Any

    • default: None

    • List of scorers which will be calculated for each pipeline and results will be available via cv_results

  • load_models: String

    • default: True

    • Whether to load the models after fitting Auto-sklearn.
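Several of the arguments above come with constraints stated in their descriptions. The sketch below collects three of them in a small pre-flight check that can run before a model is constructed. The check_autosklearn_args helper and the estimator names in the usage lines are hypothetical; neither DFFML nor auto-sklearn provides this function, and the real libraries may validate more or differently.

```python
from typing import Any, Dict, List


def check_autosklearn_args(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a keyword-argument dict."""
    problems = []

    # include_* and exclude_* are documented as incompatible with each other.
    for kind in ("estimators", "preprocessors"):
        include = args.get(f"include_{kind}")
        exclude = args.get(f"exclude_{kind}")
        if include is not None and exclude is not None:
            problems.append(f"include_{kind} and exclude_{kind} are incompatible")

    # max_models_on_disc must be an integer >= 1, or None to keep all models.
    max_models = args.get("max_models_on_disc", 50)
    if max_models is not None and (not isinstance(max_models, int) or max_models < 1):
        problems.append("max_models_on_disc must be an integer >= 1 or None")

    # train_size, if given, should lie between 0.0 and 1.0.
    strategy_args = args.get("resampling_strategy_arguments") or {}
    train_size = strategy_args.get("train_size")
    if train_size is not None and not 0.0 <= train_size <= 1.0:
        problems.append("train_size should be between 0.0 and 1.0")

    return problems


# Placeholder estimator names, used only to trigger the first check.
print(check_autosklearn_args({
    "include_estimators": ["estimator_a"],
    "exclude_estimators": ["estimator_b"],
}))
```

An empty list means none of the three checks failed, so the same dictionary could then be unpacked into the model constructor alongside features, predict and location.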