MNIST Handwritten Digits

This example shows how to train a model on the MNIST dataset and use it for prediction via the DFFML CLI and HTTP API.

Download the files and verify them with sha384sum.

curl -sSLO "http://yann.lecun.com/exdb/mnist/{train-images-idx3,train-labels-idx1,t10k-images-idx3,t10k-labels-idx1}-ubyte.gz"
sha384sum -c - << EOF
1bf45877962fd391f7abb20534a30fd2203d0865309fec5f87d576dbdbefdcb16adb49220afc22a0f3478359d229449c  t10k-images-idx3-ubyte.gz
ccc1ee70f798a04e6bfeca56a4d0f0de8d8eeeca9f74641c1e1bfb00cf7cc4aa4d023f6ea1b40e79bb4707107845479d  t10k-labels-idx1-ubyte.gz
f40eb179f7c3d2637e789663bde56d444a23e4a0a14477a9e6ed88bc39c8ad6eaff68056c0cd9bb60daf0062b70dc8ee  train-images-idx3-ubyte.gz
ba9c11bf9a7f7c2c04127b8b3e568cf70dd3429d9029ca59b7650977a4ac32f8ff5041fe42bc872097487b06a6794e00  train-labels-idx1-ubyte.gz
EOF
t10k-images-idx3-ubyte.gz: OK
t10k-labels-idx1-ubyte.gz: OK
train-images-idx3-ubyte.gz: OK
train-labels-idx1-ubyte.gz: OK
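
As an optional sanity check beyond the checksums, the sketch below prints the first bytes of a downloaded file as hex. It assumes the standard IDX layout that MNIST is distributed in: a 4-byte magic number followed by one big-endian 32-bit size per dimension. The helper name `idx_header` is hypothetical and not part of DFFML.

```shell
# Hypothetical helper (not part of DFFML): print the first N bytes of a
# gzip-compressed IDX file as hex. Assumes gzip, head, and od are available.
idx_header() {
    # $1: path to the .gz file, $2: number of bytes to show (default 16)
    # The command substitution is unquoted on purpose: word splitting
    # collapses od's padding and line breaks into single spaces.
    echo $(gzip -dc "$1" | head -c "${2:-16}" | od -An -tx1)
}

# Only runs if the file from the download step is present.
if [ -f train-images-idx3-ubyte.gz ]; then
    idx_header train-images-idx3-ubyte.gz
fi
```

For the training images, the 16 bytes should decode to the magic number 0x00000803 followed by the three dimensions: 60000 images (0xea60), 28 rows, and 28 columns (0x1c each). That is where the 28 * 28 feature length used in the commands below comes from.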

The model we’ll be using is part of dffml-model-tensorflow, a DFFML plugin that lets you use TensorFlow via DFFML. Install it with pip.

pip install -U dffml-model-tensorflow

Train the model.

dffml train \
    -model tfdnnc \
    -model-batchsize 1000 \
    -model-hidden 30 50 25 \
    -model-clstype int \
    -model-predict label:int:1 \
    -model-classifications $(seq 0 9) \
    -model-features image:int:$((28 * 28)) \
    -sources images=idx3 label=idx1 \
    -source-images-filename train-images-idx3-ubyte.gz \
    -source-images-feature image \
    -source-label-filename train-labels-idx1-ubyte.gz \
    -source-label-feature label \
    -log debug
... log output ...
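
Two of the arguments above are expanded by the shell before dffml ever sees them. The short sketch below, assuming a POSIX-style shell such as bash with `seq` available, shows what they turn into. The variable names are only for illustration.

```shell
# Command substitution: seq prints 0 through 9, one per line. Left unquoted,
# the shell splits the result into ten separate arguments.
classifications=$(seq 0 9)
echo $classifications
# prints: 0 1 2 3 4 5 6 7 8 9

# Arithmetic expansion: each MNIST image is 28 x 28 pixels, flattened
# into a single feature of that many integers.
feature_length=$((28 * 28))
echo "image:int:${feature_length}"
# prints: image:int:784
```

So `-model-classifications $(seq 0 9)` passes the ten digit classes, and `-model-features image:int:$((28 * 28))` declares an integer feature named image of length 784.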

Assess the model’s accuracy.

dffml accuracy \
    -model tfdnnc \
    -model-batchsize 1000 \
    -model-hidden 30 50 25 \
    -model-clstype int \
    -model-predict label:int:1 \
    -model-classifications $(seq 0 9) \
    -model-features image:int:$((28 * 28)) \
    -sources images=idx3 label=idx1 \
    -source-images-filename t10k-images-idx3-ubyte.gz \
    -source-images-feature image \
    -source-label-filename t10k-labels-idx1-ubyte.gz \
    -source-label-feature label \
    -log debug
... log output followed by accuracy as float ...
0.8269000053405762

The accuracy likely won’t be very good at this point because the image data has not been normalized yet.
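
To make concrete what normalization means here, the sketch below rescales pixel values from the 0 to 255 range into 0 to 1. Dividing by 255 is an assumption (a common choice for 8-bit images), not necessarily the normalization DFFML applies, and `normalize_pixels` is a hypothetical helper that only illustrates the arithmetic.

```shell
# Hypothetical helper (not part of DFFML): read whitespace-separated pixel
# values (0-255) on stdin and print each divided by 255, four decimals.
normalize_pixels() {
    LC_ALL=C awk '{
        sep = ""
        for (i = 1; i <= NF; i++) {
            printf "%s%.4f", sep, $i / 255
            sep = " "
        }
        print ""
    }'
}

echo "0 128 255" | normalize_pixels
# prints: 0.0000 0.5020 1.0000
```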

Create an image.csv file listing the names of the images (including their .mnistpng extension) to predict on.

Note

Make sure to download each image and save it with the .mnistpng extension.

cat > image.csv << EOF
key,image
four,image1.mnistpng
five,image2.mnistpng
three,image3.mnistpng
two,image4.mnistpng
EOF

In this example, image.csv lists the following images:

../_images/image1.mnistpng ../_images/image2.mnistpng ../_images/image3.mnistpng ../_images/image4.mnistpng
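
Before predicting, it can help to confirm that every file named in image.csv is actually present in the current directory. The sketch below assumes the two-column layout shown above (key, then filename) with a header row. `check_images` is a hypothetical helper, not a DFFML command.

```shell
# Hypothetical helper (not part of DFFML): report files listed in the CSV
# that are missing from the current directory.
check_images() {
    # $1: CSV file with a header row; column 1 is the key, column 2 the filename
    tail -n +2 "$1" | while IFS=, read -r key image; do
        [ -f "$image" ] || echo "missing: $image (key: $key)"
    done
}

# Only runs if image.csv from the step above exists.
if [ -f image.csv ]; then
    check_images image.csv
fi
```

No output means every listed image was found.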

Predict with the trained model.

dffml predict all \
    -model tfdnnc \
    -model-batchsize 1000 \
    -model-hidden 30 50 25 \
    -model-clstype int \
    -model-predict label:int:1 \
    -model-classifications $(seq 0 9) \
    -model-features image:int:$((28 * 28)) \
    -sources images=csv \
    -source-filename image.csv \
    -source-loadfiles image \
    -log critical

Output

[
    {
        "extra": {},
        "features": {
            "image": [
                0,
                .
                .
                0
            ]
        },
        "key": "four",
        "last_updated": "2020-03-18T04:07:01Z",
        "prediction": {
            "label": {
                "confidence": 0.4963473677635193,
                "value": 4
            }
        }
    },
    {
        "extra": {},
        "features": {
            "image": [
                0,
                .
                .
                0            
            ]
        },
        "key": "five",
        "last_updated": "2020-03-18T04:07:01Z",
        "prediction": {
            "label": {
                "confidence": 0.9070320725440979,
                "value": 5
            }
        }
    },
    {
        "extra": {},
        "features": {
            "image": [
                0,
                .
                .
                0            
            ]
        },
        "key": "three",
        "last_updated": "2020-03-18T04:07:01Z",
        "prediction": {
            "label": {
                "confidence": 0.9998736381530762,
                "value": 3
            }
        }
    },
    {
        "extra": {},
        "features": {
            "image": [
                0,
                .
                .
                0            
            ]
        },
        "key": "two",
        "last_updated": "2020-03-18T04:07:01Z",
        "prediction": {
            "label": {
                "confidence": 1.0,
                "value": 2
            }
        }
    }
]
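
Because each record carries the full 784-value image array, the raw output is long. As a sketch, assuming `jq` is installed, the records can be reduced to one line per key. `summarize_predictions` is a hypothetical helper, and the sample input is trimmed from the output above.

```shell
# Hypothetical helper (not part of DFFML): read the JSON array printed by the
# predict command on stdin and print key, predicted digit, and confidence
# as tab-separated columns. Assumes jq is installed.
summarize_predictions() {
    jq -r '.[] | [.key, .prediction.label.value, .prediction.label.confidence] | @tsv'
}

# Demo on a single trimmed record; skipped if jq is not available.
if command -v jq >/dev/null 2>&1; then
    echo '[{"key": "four", "prediction": {"label": {"confidence": 0.4963473677635193, "value": 4}}}]' \
        | summarize_predictions
fi
```

To use it on real output, pipe the predict command above into `summarize_predictions`.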