Use a Model

For this tutorial we’ll be using Models that exist within DFFML.

We’re going to create a TensorFlow classification model, so we’ll need to install the dffml-model-tensorflow plugin.

$ python -m pip install dffml-model-tensorflow

Iris Dataset

We’re going to train the model on the iris dataset. Let’s download the training and test files now.

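The sha384sum commands ensure that we downloaded the correct data. The final sed command replaces the header line of each file with column names that match the features used in the Python code.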

$ wget http://download.tensorflow.org/data/iris_training.csv
--2020-10-16 15:19:54--  http://download.tensorflow.org/data/iris_training.csv
200 OK
Length: 2194 (2.1K) [text/csv]
Saving to: ‘iris_training.csv’

iris_training.csv                                 100%[==========================================================================================================>]   2.14K  --.-KB/s    in 0s

2020-10-16 15:19:54 (111 MB/s) - ‘iris_training.csv’ saved [2194/2194]
$ wget http://download.tensorflow.org/data/iris_test.csv
--2020-10-16 15:19:54--  http://download.tensorflow.org/data/iris_test.csv
200 OK
Length: 573 [text/csv]
Saving to: ‘iris_test.csv’

iris_test.csv                                     100%[==========================================================================================================>]    573   --.-KB/s    in 0s

2020-10-16 15:19:54 (71.4 MB/s) - ‘iris_test.csv’ saved [573/573]
$ echo '376c8ea3b7f85caff195b4abe62f34e8f4e7aece8bd087bbd746518a9d1fd60ae3b4274479f88ab0aa5c839460d535ef iris_training.csv' | sha384sum -c -
iris_training.csv: OK
$ echo '8c2cda42ce5ce6f977d17d668b1c98a45bfe320175f33e97293c62ab543b3439eab934d8e11b1208de1e4a9eb1957714 iris_test.csv' | sha384sum -c -
iris_test.csv: OK
$ sed -i 's/.*setosa,versicolor,virginica/SepalLength,SepalWidth,PetalLength,PetalWidth,classification/g' iris_training.csv iris_test.csv
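On systems without sha384sum or sed, roughly the same two steps can be done from Python. The following is a minimal sketch using only the standard library; the helper names sha384_matches and replace_header are hypothetical and not part of DFFML. Unlike the sed command, which is applied to every line, this sketch only inspects the first line of the file.

```python
import hashlib
from pathlib import Path

# Column names taken from the sed command above
NEW_HEADER = "SepalLength,SepalWidth,PetalLength,PetalWidth,classification"


def sha384_matches(path, expected_hex):
    """Return True if the SHA-384 digest of the file equals expected_hex."""
    digest = hashlib.sha384(Path(path).read_bytes()).hexdigest()
    return digest == expected_hex.lower()


def replace_header(path, new_header=NEW_HEADER):
    """Replace the first line if it ends with the original class-name header."""
    lines = Path(path).read_text().splitlines()
    if lines and lines[0].endswith("setosa,versicolor,virginica"):
        lines[0] = new_header
    Path(path).write_text("\n".join(lines) + "\n")
```

For example, sha384_matches("iris_test.csv", "8c2cda42...") would be called with the full digest from the console session above, followed by replace_header("iris_test.csv").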

Python Usage

We can create a TensorFlow classification model from Python code as follows.

This example uses dffml.noasync, which provides versions of train, score, and predict that can be called from outside an async function.
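To illustrate the general idea of a blocking wrapper around async functions, here is a minimal sketch using only asyncio. The functions train_async and predict_async are hypothetical stand-ins, and this is an assumption about the general pattern, not DFFML's actual implementation.

```python
import asyncio


async def train_async(records):
    """Hypothetical coroutine standing in for an async training call."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return len(records)


async def predict_async(records):
    """Hypothetical async generator standing in for an async predict call."""
    for index, record in enumerate(records):
        await asyncio.sleep(0)
        yield index, record


def train_sync(records):
    """Blocking wrapper: run the coroutine to completion on a new event loop."""
    return asyncio.run(train_async(records))


def predict_sync(records):
    """Blocking wrapper: collect everything the async generator yields."""

    async def collect():
        return [item async for item in predict_async(records)]

    return asyncio.run(collect())
```

Code that is not itself async can then call train_sync(...) and loop over predict_sync(...) directly, which is the convenience that the noasync variants offer for scripts such as run.py.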

We use iris_training.csv to train the model and iris_test.csv to assess its accuracy, then we ask the model for predictions on two new records that appear in neither dataset.

run.py

from dffml import CSVSource, Features, Feature
from dffml.noasync import train, score, predict
from dffml_model_tensorflow.dnnc import DNNClassifierModel
from dffml.accuracy import ClassificationAccuracy

model = DNNClassifierModel(
    features=Features(
        Feature("SepalLength", float, 1),
        Feature("SepalWidth", float, 1),
        Feature("PetalLength", float, 1),
        Feature("PetalWidth", float, 1),
    ),
    predict=Feature("classification", int, 1),
    epochs=3000,
    steps=20000,
    classifications=[0, 1, 2],
    clstype=int,
    location="tempdir",
)

# Train the model
train(model, "iris_training.csv")

# Assess accuracy (alternate way of specifying data source)
scorer = ClassificationAccuracy()
print(
    "Accuracy:",
    score(
        model,
        scorer,
        Feature("classification", int, 1),
        CSVSource(filename="iris_test.csv"),
    ),
)

# Make prediction
for i, features, prediction in predict(
    model,
    {
        "PetalLength": 4.2,
        "PetalWidth": 1.5,
        "SepalLength": 5.9,
        "SepalWidth": 3.0,
    },
    {
        "PetalLength": 5.4,
        "PetalWidth": 2.1,
        "SepalLength": 6.9,
        "SepalWidth": 3.1,
    },
):
    features["classification"] = prediction["classification"]["value"]
    print(features)

Run it to train the model, assess its accuracy, and print the predictions.

$ python3 run.py
Accuracy: 0.9666666388511658
{'PetalLength': 4.2, 'PetalWidth': 1.5, 'SepalLength': 5.9, 'SepalWidth': 3.0, 'classification': 1}
{'PetalLength': 5.4, 'PetalWidth': 2.1, 'SepalLength': 6.9, 'SepalWidth': 3.1, 'classification': 2}
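The predicted classification values are integer class indices. A minimal sketch for turning them into species names follows. It assumes the indices follow the order of the names in the original CSV header (setosa, versicolor, virginica); the with_species helper is hypothetical and not part of DFFML.

```python
# Assumption: class indices follow the order of names in the original CSV
# header, i.e. 0 = setosa, 1 = versicolor, 2 = virginica.
SPECIES = ("setosa", "versicolor", "virginica")


def with_species(record):
    """Return a copy of a prediction record with a readable species name."""
    labeled = dict(record)
    labeled["species"] = SPECIES[record["classification"]]
    return labeled


# The two records printed by run.py above
predictions = [
    {"PetalLength": 4.2, "PetalWidth": 1.5, "SepalLength": 5.9,
     "SepalWidth": 3.0, "classification": 1},
    {"PetalLength": 5.4, "PetalWidth": 2.1, "SepalLength": 6.9,
     "SepalWidth": 3.1, "classification": 2},
]

for record in predictions:
    print(with_species(record)["species"])
```

Under that assumption, the first record is labeled versicolor and the second virginica.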

Command Line Usage

See the TensorFlow Classifier entry on the Model plugins page for CLI usage examples.