Neural Networks

Rock Paper Scissors Hand Pose Classification

This tutorial shows how to train and test a custom PyTorch-based neural network model built with DFFML. The dataset is the rock-paper-scissors-dataset, which contains images of hands in rock, paper, and scissors poses. Each image is a 300x300 RGB image.

The model we’ll be using is PyTorchNeuralNetwork which is a part of dffml-model-pytorch, a DFFML plugin which allows you to use PyTorch via DFFML. We can install it with pip. We will also be using image loading from dffml-config-image and YAML file loading from dffml-config-yaml for creating our neural network.

$ pip install -U dffml-model-pytorch dffml-config-image dffml-config-yaml

On Windows, also pass pip the PyTorch wheel listing with -f:

(.venv) C:\Users\username> python -m pip install -U dffml-model-pytorch dffml-config-image dffml-config-yaml -f https://download.pytorch.org/whl/torch_stable.html

Download the dataset and verify it with sha384sum.

curl -LO https://storage.googleapis.com/laurencemoroney-blog.appspot.com/\{rps,rps-test-set,rps-validation\}.zip
sha384sum -c - << EOF
c6a9119b0c6a0907b782bd99e04ce09a0924c0895df6a26bc6fb06baca4526f55e51f7156cceb4791cc65632d66085e8  rps.zip
fc45a0ebe58b9aafc3cd5a60020fa042d3a19c26b0f820aee630b9602c8f53dd52fd40f35d44432dd031dea8f30a5f66  rps-test-set.zip
375457bb95771ffeace2beedab877292d232f31e76502618d25e0d92a3e029d386429f52c771b05ae1c7229d2f5ecc29  rps-validation.zip
EOF

rps.zip: OK
rps-test-set.zip: OK
rps-validation.zip: OK
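Where sha384sum is not available (for example on Windows), the same check can be sketched with Python's standard hashlib module. The helper names sha384_of and verify are made up for this illustration; the digests are copied from the sha384sum input above.

```python
import hashlib
from pathlib import Path

# Digests copied from the sha384sum input shown above.
EXPECTED = {
    "rps.zip": "c6a9119b0c6a0907b782bd99e04ce09a0924c0895df6a26bc6fb06baca4526f55e51f7156cceb4791cc65632d66085e8",
    "rps-test-set.zip": "fc45a0ebe58b9aafc3cd5a60020fa042d3a19c26b0f820aee630b9602c8f53dd52fd40f35d44432dd031dea8f30a5f66",
    "rps-validation.zip": "375457bb95771ffeace2beedab877292d232f31e76502618d25e0d92a3e029d386429f52c771b05ae1c7229d2f5ecc29",
}


def sha384_of(path, chunk_size=1 << 20):
    """Return the hex SHA-384 digest of a file, reading it in chunks."""
    digest = hashlib.sha384()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(expected, directory="."):
    """Map each filename to 'OK', 'FAILED' or 'MISSING'."""
    results = {}
    for name, digest in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "MISSING"
        elif sha384_of(path) == digest:
            results[name] = "OK"
        else:
            results[name] = "FAILED"
    return results
```

Calling verify(EXPECTED) in the download directory should report OK for all three archives if the downloads are intact.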

Extract the datasets.

$ unzip rps.zip
$ unzip rps-test-set.zip
$ unzip rps-validation.zip -d rps-predict

The dataset for training the model will be in the rps directory. The dataset for testing the model will be in the rps-test-set directory. The images we will be using for prediction on the neural network will be in the rps-predict directory.
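A quick sanity check after extraction is to count the images per class. The sketch below assumes that rps and rps-test-set each contain one subfolder per label (rock, paper, scissors) holding .png files; that layout is an assumption about the archives, and count_images is a hypothetical helper, not part of DFFML.

```python
from pathlib import Path


def count_images(root, labels=("rock", "paper", "scissors"), suffix=".png"):
    """Count files ending in `suffix` under root/<label>/ for each label."""
    root = Path(root)
    counts = {}
    for label in labels:
        folder = root / label
        if folder.is_dir():
            counts[label] = sum(
                1 for entry in folder.iterdir() if entry.suffix.lower() == suffix
            )
        else:
            # A missing class folder is reported as zero images.
            counts[label] = 0
    return counts
```

For example, count_images("rps") returns a dictionary with one count per label. A zero for any label suggests the archive was extracted somewhere else or has a different layout than assumed here.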

Now that we have our dataset ready, we can classify the hand poses to predict whether each one is rock, paper or scissors!

We first create the neural network.

The neural network can be created in two ways using DFFML:

By defining the layers as a dictionary in a YAML or JSON file and passing that file via the CLI (e.g. @model.yaml).
By using the torch module to create the network and passing an instance of it to the model config.

Command Line

We first create a YAML file that defines the neural network. It holds all the information about the layers, along with the forward method, which is passed as a list of layers under the model name key:

model.yaml

model:
  conv1:
    layer_type: Conv2d
    in_channels: 3
    out_channels: 32
    kernel_size: 5
    padding: 2
  conv2:
    layer_type: Conv2d
    in_channels: 32
    out_channels: 32
    kernel_size: 3
    padding: 1
  conv3:
    layer_type: Conv2d
    in_channels: 32
    out_channels: 16
    kernel_size: 3
    padding: 1
  relu:
    layer_type: ReLU
  pooling:
    layer_type: MaxPool2d
    kernel_size: 2
  linear:
    layer_type: Linear
    in_features: 1296
    out_features: 3
forward:
  model:
    # block 1
    # Image dimensions at the beginning: torch.Size([batch_size, 3, 150, 150])
    - conv1
    - relu
    - pooling
    # block 2
    # Image dimensions after block 1: torch.Size([batch_size, 32, 75, 75])
    - conv2
    - relu
    - pooling
    # block 3
    # Image dimensions after block 2: torch.Size([batch_size, 32, 37, 37])
    - conv2
    - relu
    - pooling
    # block 4
    # Image dimensions after block 3: torch.Size([batch_size, 32, 18, 18])
    - conv3
    - relu
    - pooling
    # fully connected layer
    # Image dimensions after block 4: torch.Size([batch_size, 16, 9, 9])
    # The `Linear` layer expects a 1D feature vector per sample (a 2D Tensor once the batch
    # dimension is counted), so the incoming Tensor has to be flattened. torch's `view` method
    # reshapes it to torch.Size([batch_size, 16*9*9]). Passing '-1' as the first dimension lets
    # PyTorch infer the batch_size from the remaining dimension, so the batch_size is not
    # hard coded in the network and can be changed later. 16*9*9 = 1296 is the value the
    # Linear layer is given as "in_features: 1296".
    - view:
      - -1
      - 1296
    - linear

To learn more about tensor views, see the Tensor Views page in the PyTorch documentation.
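The shape comments in model.yaml can be reproduced with the standard output-size formulas for Conv2d and MaxPool2d (stride 1 and dilation 1 for the convolutions, pooling stride equal to the kernel size). The sketch below is plain Python and does not touch DFFML or PyTorch; the function names and the BLOCKS table are made up for this illustration, with values taken from the YAML above.

```python
def conv2d_out(size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


def maxpool2d_out(size, kernel_size):
    """Spatial output size of max pooling with stride == kernel_size, no padding."""
    return (size - kernel_size) // kernel_size + 1


# (out_channels, kernel_size, padding) for conv1, conv2, conv2, conv3,
# in the order they appear in the forward list.
BLOCKS = [(32, 5, 2), (32, 3, 1), (32, 3, 1), (16, 3, 1)]


def trace_shapes(image_size=150, in_channels=3, pool_kernel=2):
    """Return (channels, height, width) before block 1 and after each block."""
    shapes = [(in_channels, image_size, image_size)]
    size = image_size
    for out_channels, kernel_size, padding in BLOCKS:
        size = conv2d_out(size, kernel_size, padding)  # padding keeps the size
        size = maxpool2d_out(size, pool_kernel)  # pooling halves it, rounding down
        shapes.append((out_channels, size, size))
    return shapes


channels, height, width = trace_shapes()[-1]
flattened = channels * height * width  # the value used for in_features
```

The trace gives 150, 75, 37, 18 and 9 for the spatial size, so the flattened size is 16 * 9 * 9 = 1296. Changing imageSize or adding a block means recomputing in_features the same way.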

See also

Sequential layers can also be created by indenting the layers under a key. A layer defined inside a Sequential layer can be reused when defining the forward method by referring to it with the following syntax: - block1.conv1. More information about PyTorch's Sequential layers and the other layers used here can be found in the official PyTorch documentation for the torch.nn module.

An example of creating Sequential Layers would be:

example_model:
    block1:
        ...
    # One of the many Sequential layers in example_model
    block2:
        conv2:
            layer_type: Conv2d
            in_channels: 32
            out_channels: 16
            kernel_size: 3
            padding: 1
        relu:
            layer_type: ReLU
        maxpooling:
            layer_type: MaxPool2d
            kernel_size: 2
    block3:
        ...
    linear:
        ...
forward:
    example_model:
        - block1
        - block2
        - block3
        - block1.conv1 # Re-using a single layer inside another `Sequential Layer`
        - block2.maxpooling
        - view:
            - -1
            - 1296
        - linear
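To make the dotted syntax concrete, here is a minimal sketch of how a reference such as block2.maxpooling can be looked up in the nested dictionary that a YAML parser would produce. It only illustrates the idea; resolve_layer is a hypothetical helper and not DFFML's implementation, and the dictionary holds layer parameters only.

```python
def resolve_layer(network, reference):
    """Follow a dotted reference like 'block2.maxpooling' through nested dicts."""
    node = network
    for part in reference.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"unknown layer reference: {reference!r}")
        node = node[part]
    return node


# Nested layout mirroring block2 from the example above (parameters only).
example_model = {
    "block2": {
        "conv2": {"in_channels": 32, "out_channels": 16, "kernel_size": 3, "padding": 1},
        "maxpooling": {"kernel_size": 2},
    },
}

whole_block = resolve_layer(example_model, "block2")  # the Sequential block
single_layer = resolve_layer(example_model, "block2.maxpooling")  # one layer in it
```

A plain name selects a whole top-level entry, while a dotted name selects one layer inside a Sequential block.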

Note

If the forward method is not specified in the YAML file, it is automatically created by appending the top level layers (Sequential or Single) sequentially in the order they were defined in the file.
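The fallback described in the note can be pictured as follows. This is a sketch under the assumption that the parsed file is a dictionary with the model name as one key and an optional forward key; forward_order is a hypothetical name, not a DFFML function.

```python
def forward_order(document, model_name):
    """Return the explicit forward list if present, else the top-level layer names."""
    explicit = document.get("forward", {}).get(model_name)
    if explicit is not None:
        return list(explicit)
    # dicts preserve insertion order, so this is the order of definition in the file
    return list(document[model_name])


# No forward section: the layers run in the order they were defined.
parsed = {"model": {"block1": {}, "block2": {}, "linear": {}}}
implicit = forward_order(parsed, "model")

# With a forward section, the explicit list wins and may reuse layers.
parsed_explicit = {
    "model": {"conv1": {}, "relu": {}},
    "forward": {"model": ["conv1", "relu", "conv1", "relu"]},
}
explicit = forward_order(parsed_explicit, "model")
```

Note that the implicit order cannot reuse a layer or insert a view step, so a network like the one in model.yaml needs an explicit forward section.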

Train the model.

dffml train \
  -model pytorchnet \
  -model-features image:int:$((300*300*3)) \
  -model-clstype str \
  -model-classifications rock paper scissors \
  -model-predict label:int:1 \
  -model-network @model.yaml \
  -model-location rps_model \
  -model-loss crossentropyloss \
  -model-optimizer Adam \
  -model-validation_split 0.2 \
  -model-epochs 10 \
  -model-batch_size 32 \
  -model-imageSize 150 \
  -model-enableGPU \
  -model-patience 2 \
  -sources f=dir \
    -source-foldername rps \
    -source-feature image \
    -source-labels rock paper scissors \
  -log debug

INFO:dffml.PyTorchNeuralNetworkContext:Training complete in 1m 42s
INFO:dffml.PyTorchNeuralNetworkContext:Best Validation Accuracy: 1.000000
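Two of the training flags are easier to follow with a small example: -model-validation_split sets aside a fraction of the records for validation, and -model-patience bounds how many epochs may pass without a better validation loss. The sketch below is plain Python based on the log messages in this tutorial, not DFFML's code. The function names are hypothetical, and the validation numbers are the ones logged by the Python API run later on this page.

```python
def split_sizes(num_records, validation_split):
    """Return (training, validation) sample counts for a given split fraction."""
    validation = round(num_records * validation_split)
    return num_records - validation, validation


def epochs_run(history, patience, max_epochs):
    """Count epochs until early stopping triggers.

    history holds one (validation_loss, validation_accuracy) pair per epoch.
    Training stops once the loss has not improved for `patience` consecutive
    epochs or the accuracy reaches 1.0 (assumed from the log message).
    """
    best_loss = float("inf")
    stale_epochs = 0
    epoch = 0
    for epoch, (val_loss, val_acc) in enumerate(history[:max_epochs], start=1):
        if val_loss < best_loss:
            best_loss = val_loss
            stale_epochs = 0
        else:
            stale_epochs += 1
        if stale_epochs >= patience or val_acc >= 1.0:
            break
    return epoch


# Validation loss and accuracy per epoch, from the Python API log on this page.
logged = [(0.6922, 0.7778), (0.1446, 0.9722), (0.0531, 0.9901), (0.0261, 0.9940), (0.0060, 1.0)]
stopped_after = epochs_run(logged, patience=2, max_epochs=10)
train_count, validation_count = split_sizes(2520, 0.2)
```

With these numbers the run stops after epoch 5 because validation accuracy reaches 1.0, and 2520 records split into 2016 training and 504 validation samples, matching the log.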

Assess the model’s accuracy.

dffml accuracy \
  -model pytorchnet \
  -model-features image:int:$((300*300*3)) \
  -model-clstype str \
  -model-classifications rock paper scissors \
  -model-predict label:int:1 \
  -model-network @model.yaml \
  -model-location rps_model \
  -model-imageSize 150 \
  -model-enableGPU \
  -features label:int:1 \
  -sources f=dir \
    -source-foldername rps-test-set \
    -source-feature image \
    -source-labels rock paper scissors \
  -scorer pytorchscore

The output is:

0.8763440860215054
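The reported number reads as the fraction of test images whose predicted label matches the true label. That interpretation is an assumption based on the scorer's name. A minimal sketch of the computation, with made-up labels:

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty set of records")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical labels, for illustration only (not taken from the dataset).
actual = ["rock", "paper", "scissors", "rock"]
predicted = ["rock", "paper", "paper", "rock"]
score = accuracy(predicted, actual)  # 3 of 4 correct
```

Under this reading, a test accuracy noticeably lower than the validation accuracy seen in training means the model generalizes less well to the separately collected test images.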

Predict with the trained model.

dffml predict all \
  -model pytorchnet \
  -model-features image:int:$((300*300*3)) \
  -model-clstype str \
  -model-classifications rock paper scissors \
  -model-predict label:int:1 \
  -model-network @model.yaml \
  -model-location rps_model \
  -model-imageSize 150 \
  -model-enableGPU \
  -sources f=dir \
    -source-foldername rps-predict \
    -source-feature image \
  -pretty

Some of the predictions:

	Key:	scissors7.png
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                     [[253, 253, 253], [254, 254, 254], [254, 254,  ... (length:300)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|          Value:  scissors         |                                     Confidence:   0.9904084205627441                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	rock8.png
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                     [[254, 254, 254], [253, 253, 253], [253, 253,  ... (length:300)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|            Value:  rock           |                                            Confidence:   1.0                                             |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	paper4.png
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                     [[254, 254, 254], [254, 254, 254], [253, 253,  ... (length:300)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|           Value:  paper           |                                     Confidence:   0.9904376864433289                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	paper-hires1.png
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                     [[253, 253, 253], [253, 253, 253], [252, 252,  ... (length:900)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|           Value:  paper           |                                     Confidence:   0.8567885756492615                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	scissors-hires2.png
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                     [[253, 253, 253], [253, 253, 253], [253, 253,  ... (length:900)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|          Value:  scissors         |                                     Confidence:   0.9906457662582397                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	rock-hires1.png
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                     [[254, 254, 254], [253, 253, 253], [251, 251,  ... (length:900)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|            Value:  rock           |                                            Confidence:   1.0                                             |
+----------------------------------------------------------------------------------------------------------------------------------------------+
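Each prediction pairs a label with a confidence value. One common way to obtain such a pair from a classifier's three raw outputs is to apply softmax and take the largest probability. The sketch below assumes that convention, which this tutorial does not confirm for PyTorchNeuralNetwork. The logits are made up, and the index-to-label mapping is the one printed in the debug log of the Python API run.

```python
import math

# Index-to-label mapping as printed in the debug log of the Python API run.
CIDS = {0: "paper", 1: "rock", 2: "scissors"}


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def classify(logits, cids=CIDS):
    """Return (label, confidence) for the highest-probability class."""
    probabilities = softmax(logits)
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return cids[index], probabilities[index]


# Hypothetical raw outputs for one image, for illustration only.
label, confidence = classify([0.0, 0.0, 5.0])
```

Under this assumption, a confidence of 1.0 in the table means the other two classes received probabilities too small to show up in floating point.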

Python API

import torch.nn as nn
import asyncio

from dffml import train, score, predict, DirectorySource, Features, Feature
from dffml_model_pytorch import PyTorchNeuralNetwork, CrossEntropyLossFunction
from dffml_model_pytorch.pytorch_accuracy_scorer import PytorchAccuracy


# Define the Neural Network
class ConvNet(nn.Module):
    """
    Convolutional Neural Network to classify hand gestures in an image as rock, paper or scissors
    """

    def __init__(self):
        super().__init__()

        self.conv1 = nn.Conv2d(
            in_channels=3, out_channels=32, kernel_size=5, padding=2
        )
        self.conv2 = nn.Conv2d(
            in_channels=32, out_channels=32, kernel_size=3, padding=1
        )
        self.conv3 = nn.Conv2d(
            in_channels=32, out_channels=16, kernel_size=3, padding=1
        )

        self.relu = nn.ReLU()
        self.pooling = nn.MaxPool2d(kernel_size=2)

        self.linear = nn.Linear(in_features=16 * 9 * 9, out_features=3)

    def forward(self, x):
        # block 1
        x = self.pooling(self.relu(self.conv1(x)))

        # block 2
        x = self.pooling(self.relu(self.conv2(x)))

        # block 3
        x = self.pooling(self.relu(self.conv2(x)))

        # block 4
        x = self.pooling(self.relu(self.conv3(x)))

        # fully connected layer
        x = self.linear(x.view(-1, 16 * 9 * 9))
        return x


RockPaperScissorsModel = ConvNet()
Loss = CrossEntropyLossFunction()

# Define the dffml model config
model = PyTorchNeuralNetwork(
    classifications=["rock", "paper", "scissors"],
    features=Features(Feature("image", int, 300 * 300 * 3)),
    predict=Feature("label", int, 1),
    location="rps_model",
    network=RockPaperScissorsModel,
    epochs=10,
    batch_size=32,
    imageSize=150,
    validation_split=0.2,
    loss=Loss,
    optimizer="Adam",
    enableGPU=True,
    patience=2,
)

# Define source for training image dataset
train_source = DirectorySource(
    foldername="rps", feature="image", labels=["rock", "paper", "scissors"],
)

# Define source for testing image dataset
test_source = DirectorySource(
    foldername="rps-test-set",
    feature="image",
    labels=["rock", "paper", "scissors"],
)

# Define source for prediction image dataset
predict_source = DirectorySource(foldername="rps-predict", feature="image")


async def main():
    # Train the model
    await train(model, train_source)

    # Assess the accuracy
    scorer = PytorchAccuracy()
    acc = await score(model, scorer, test_source)
    print("\nTesting Accuracy: ", acc)

    # Make Predictions
    print(
        "\n{:>40} \t {:>10} \t {:>10}\n".format(
            "Image filename", "Prediction", "Confidence"
        )
    )
    async for key, features, prediction in predict(model, predict_source):
        print(
            "{:>40} \t {:>10} \t {:>10}".format(
                "rps-predict/" + key,
                prediction["label"]["value"],
                prediction["label"]["confidence"],
            )
        )


if __name__ == "__main__":
    asyncio.run(main())

The output will be as follows:

DEBUG:dffml.util.AsyncContextManagerList.Sources:Entering: DirectorySource(DirectorySourceConfig(foldername=PosixPath('rps'), feature='image', labels=['rock', 'paper', 'scissors'], save=None))
DEBUG:dffml.PNGConfigLoader:BaseConfig()
DEBUG:dffml.DirectorySource:DirectorySource(DirectorySourceConfig(foldername=PosixPath('rps'), feature='image', labels=['rock', 'paper', 'scissors'], save=None)) loaded 2520 records
DEBUG:dffml.util.AsyncContextManagerListContext.SourcesContext:Entering context: <dffml.source.memory.MemorySourceContext object at 0x7f0fe9170c50>
DEBUG:dffml.PyTorchNeuralNetworkContext:cids(3): {0: 'paper', 1: 'rock', 2: 'scissors'}
DEBUG:dffml.PyTorchNeuralNetworkContext:classifications(3): {'paper': 0, 'rock': 1, 'scissors': 2}
DEBUG:dffml.PyTorchNeuralNetworkContext:Loading model with classifications(3): {'paper': 0, 'rock': 1, 'scissors': 2}
DEBUG:dffml.PyTorchNeuralNetworkContext:Model Summary
ConvNet(
  (conv1): Conv2d(3, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv3): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu): ReLU()
  (pooling): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (linear): Linear(in_features=1296, out_features=3, bias=True)
)
DEBUG:dffml.PyTorchNeuralNetworkContext:Training on features: ['image']
INFO:dffml.PyTorchNeuralNetworkContext:------ Record Data ------
INFO:dffml.PyTorchNeuralNetworkContext:x_cols:    2520
INFO:dffml.PyTorchNeuralNetworkContext:y_cols:    2520
INFO:dffml.PyTorchNeuralNetworkContext:-----------------------
INFO:dffml.PyTorchNeuralNetworkContext:Data split into Training samples: 2016 and Validation samples: 504
DEBUG:dffml.PyTorchNeuralNetworkContext:Epoch 1/10
DEBUG:dffml.PyTorchNeuralNetworkContext:----------
DEBUG:dffml.PyTorchNeuralNetworkContext:Training Loss: 1.0265 Acc: 0.4484
DEBUG:dffml.PyTorchNeuralNetworkContext:Validation Loss: 0.6922 Acc: 0.7778
DEBUG:dffml.PyTorchNeuralNetworkContext:
DEBUG:dffml.PyTorchNeuralNetworkContext:Epoch 2/10
DEBUG:dffml.PyTorchNeuralNetworkContext:----------
DEBUG:dffml.PyTorchNeuralNetworkContext:Training Loss: 0.3382 Acc: 0.8988
DEBUG:dffml.PyTorchNeuralNetworkContext:Validation Loss: 0.1446 Acc: 0.9722
DEBUG:dffml.PyTorchNeuralNetworkContext:
DEBUG:dffml.PyTorchNeuralNetworkContext:Epoch 3/10
DEBUG:dffml.PyTorchNeuralNetworkContext:----------
DEBUG:dffml.PyTorchNeuralNetworkContext:Training Loss: 0.0663 Acc: 0.9851
DEBUG:dffml.PyTorchNeuralNetworkContext:Validation Loss: 0.0531 Acc: 0.9901
DEBUG:dffml.PyTorchNeuralNetworkContext:
DEBUG:dffml.PyTorchNeuralNetworkContext:Epoch 4/10
DEBUG:dffml.PyTorchNeuralNetworkContext:----------
DEBUG:dffml.PyTorchNeuralNetworkContext:Training Loss: 0.0332 Acc: 0.9940
DEBUG:dffml.PyTorchNeuralNetworkContext:Validation Loss: 0.0261 Acc: 0.9940
DEBUG:dffml.PyTorchNeuralNetworkContext:
DEBUG:dffml.PyTorchNeuralNetworkContext:Epoch 5/10
DEBUG:dffml.PyTorchNeuralNetworkContext:----------
DEBUG:dffml.PyTorchNeuralNetworkContext:Training Loss: 0.0139 Acc: 0.9955
DEBUG:dffml.PyTorchNeuralNetworkContext:Validation Loss: 0.0060 Acc: 1.0000
DEBUG:dffml.PyTorchNeuralNetworkContext:
INFO:dffml.PyTorchNeuralNetworkContext:Early stopping: Validation Loss didn't improve for 2 consecutive epochs OR maximum accuracy attained.
INFO:dffml.PyTorchNeuralNetworkContext:Training complete in 1m 24s
INFO:dffml.PyTorchNeuralNetworkContext:Best Validation Accuracy: 1.000000

Testing Accuracy:  0.9220430107526881

                          Image filename 	 Prediction 	 Confidence

               rps-predict/scissors7.png 	   scissors 	 0.9990881681442261
                   rps-predict/rock8.png 	       rock 	        1.0
                  rps-predict/paper4.png 	      paper 	 0.7040390372276306
                  rps-predict/paper9.png 	      paper 	 0.5414079427719116
                  rps-predict/paper7.png 	      paper 	 0.9999395608901978
            rps-predict/paper-hires1.png 	      paper 	 0.6969484090805054
         rps-predict/scissors-hires2.png 	   scissors 	 0.9760923385620117
               rps-predict/scissors4.png 	   scissors 	 0.9901471734046936
               rps-predict/scissors8.png 	   scissors 	 0.9570472240447998
             rps-predict/rock-hires1.png 	       rock 	        1.0
            rps-predict/paper-hires2.png 	      paper 	 0.9593355059623718
                  rps-predict/paper8.png 	      paper 	 0.9998843669891357
               rps-predict/scissors2.png 	   scissors 	 0.9512919783592224
                   rps-predict/rock4.png 	       rock 	 0.9999837875366211
                  rps-predict/paper1.png 	      paper 	 0.9846717715263367
               rps-predict/scissors5.png 	   scissors 	 0.9992364645004272
         rps-predict/scissors-hires1.png 	   scissors 	 0.9997430443763733
                  rps-predict/paper2.png 	      paper 	 0.9999908208847046
                   rps-predict/rock1.png 	       rock 	 0.9999294281005859
               rps-predict/scissors3.png 	      paper 	 0.6644117832183838
               rps-predict/scissors1.png 	   scissors 	 0.9635807275772095
               rps-predict/scissors6.png 	   scissors 	 0.9999761581420898
                   rps-predict/rock9.png 	       rock 	 0.9018403887748718
               rps-predict/scissors9.png 	   scissors 	 0.9989670515060425
                  rps-predict/paper6.png 	      paper 	 0.9996334314346313
                  rps-predict/paper3.png 	      paper 	 0.8676062226295471
                  rps-predict/paper5.png 	      paper 	 0.9100719094276428
                   rps-predict/rock7.png 	       rock 	 0.9990749359130859
                   rps-predict/rock5.png 	       rock 	 0.8894972205162048
                   rps-predict/rock3.png 	       rock 	 0.9995730519294739
                   rps-predict/rock2.png 	       rock 	 0.991766631603241
             rps-predict/rock-hires2.png 	       rock 	 0.9999849796295166
                   rps-predict/rock6.png 	       rock 	 0.8994896411895752

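The model predicts most of the hand poses correctly and with high confidence. In this run, the only misclassified image is scissors3.png, which was predicted as paper with a confidence of about 0.66.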
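The prediction images have no label folders, but their filenames appear to start with the pose they show (an assumption based on names such as scissors7.png and rock-hires1.png). Under that assumption, the output can be checked automatically. The helpers below are hypothetical, and the rows are a few lines copied from the output above.

```python
import re


def expected_label(filename):
    """'scissors-hires2.png' -> 'scissors' (assumes the name starts with the true label)."""
    match = re.match(r"(rock|paper|scissors)", filename)
    if match is None:
        raise ValueError(f"cannot infer a label from {filename!r}")
    return match.group(1)


def misclassified(rows):
    """Return the (filename, prediction, confidence) rows whose prediction is wrong."""
    return [row for row in rows if expected_label(row[0]) != row[1]]


# A few rows copied from the prediction output above.
ROWS = [
    ("scissors7.png", "scissors", 0.9990881681442261),
    ("rock8.png", "rock", 1.0),
    ("paper9.png", "paper", 0.5414079427719116),
    ("scissors3.png", "paper", 0.6644117832183838),
    ("rock-hires2.png", "rock", 0.9999849796295166),
]

wrong = misclassified(ROWS)
```

Here wrong contains only the scissors3.png row. The same check can be run inside the async for loop of the script by comparing expected_label(key) with prediction["label"]["value"].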