FLOWER17 Species classification using OpenCV and Scikit-Learn

The model we'll be using is the RandomForestClassifier, which is part of dffml-model-scikit, a DFFML plugin that lets you use Scikit-Learn via DFFML. We'll also use operations from dffml-operations-image, a DFFML plugin for image processing, along with dffml-config-yaml and dffml-config-image for loading YAML and image files. All of them can be installed with pip.

$ pip install -U dffml-model-scikit dffml-operations-image dffml-config-yaml dffml-config-image

Create a dataflow config file which will be used by the DataFlowPreprocessSource to preprocess the data before feeding it to the model.

Our dataflow extracts features from the data which will then be used by the machine learning model. Since the dataset contains flowers, we extract features that quantify the color, shape and texture of each flower image.

The dataflow runs the following operations from the dffml_operations_image plugin to extract features. These operations wrap functions from image processing libraries such as OpenCV.

resize

This operation resizes the images up or down to the specified size.

convert_color

This operation uses the OpenCV function cv2.cvtColor and converts the image from one colorspace to another.

calcHist

This operation extracts the histogram from the image array and is used in this tutorial to quantify the colors present in a flower image.

normalize

This operation normalizes the value range of the histogram.

HuMoments

This operation calculates the seven Hu invariant moments, which quantify the shape of the object in the image. See also: Image Moments.

Haralick

This operation extracts the texture features from the image.

flatten

This operation returns a copy of the NumPy array collapsed into one dimension.
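To make the calcHist, normalize and flatten steps more concrete, here is a minimal pure-Python sketch of the same idea on four made-up pixels. The helper names are hypothetical and are not part of DFFML or OpenCV, and min-max normalization is an assumption; the actual normalize operation may scale the histogram differently depending on how it is configured.

```python
# Pure-Python sketch of a 3-channel color histogram with 8 bins per channel.
# Helper names are hypothetical; they are not DFFML or OpenCV functions.

def color_histogram(pixels, bins=8, value_range=256):
    """Count pixels into a bins x bins x bins grid, one axis per channel."""
    bin_width = value_range / bins
    hist = [[[0] * bins for _ in range(bins)] for _ in range(bins)]
    for pixel in pixels:
        i, j, k = (min(int(channel / bin_width), bins - 1) for channel in pixel)
        hist[i][j][k] += 1
    return hist


def min_max_normalize(hist):
    """Scale every count into [0, 1] (assumed normalization scheme)."""
    counts = [c for plane in hist for row in plane for c in row]
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid dividing by zero for a constant histogram
    return [[[(c - lo) / span for c in row] for row in plane] for plane in hist]


def flatten(hist):
    """Collapse the 3-D grid into a single list, like the flatten operation."""
    return [value for plane in hist for row in plane for value in row]


# Four made-up pixels standing in for an image
pixels = [(0, 0, 0), (0, 0, 0), (255, 255, 255), (128, 10, 250)]
features = flatten(min_max_normalize(color_histogram(pixels)))
print(len(features))  # 512, i.e. 8 * 8 * 8
print(features[0])    # 1.0, the bin holding the two black pixels
```

With 8 bins per channel the flattened histogram always has 8 * 8 * 8 = 512 entries, regardless of the image size.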

The dataflow visualised:

[Figure: diagram of the feature extraction dataflow (dataflow_diagram1.svg)]

We then create the dataflow config file, using the dataflow create command.

dffml dataflow \
    create \
        resize \
        calcHist \
        Haralick \
        HuMoments \
        gray=convert_color \
        hsv=convert_color \
        flatten \
        normalize \
        get_single \
    -configloader yaml \
    -inputs \
        '[500,500,3]'=resize.inputs.dsize \
        '[0,1,2]'=calcHist.inputs.channels \
        'None'=calcHist.inputs.mask \
        '[8,8,8]'=calcHist.inputs.histSize \
        '[0,256,0,256,0,256]'=calcHist.inputs.ranges \
        BGR2GRAY=convert_color.inputs.code=gray \
        BGR2HSV=convert_color.inputs.code=hsv \
        '[{"Histogram": "flatten.outputs.result"},{"HuMoments": "HuMoments.outputs.result"},{"Haralick": "Haralick.outputs.result"}]'=get_single_spec \
    -flow \
        '[{"seed": ["image"]}]'=resize.inputs.src \
        '[{"resize": "result"}]'=hsv.inputs.src \
        '["hsv"]'=hsv.inputs.code \
        '[{"hsv": "result"}]'=calcHist.inputs.images \
        '[{"calcHist": "result"}]'=normalize.inputs.src \
        '[{"normalize": "result"}]'=flatten.inputs.array \
        '[{"resize": "result"}]'=gray.inputs.src \
        '["gray"]'=gray.inputs.code \
        '[{"gray": "result"}]'=HuMoments.inputs.m \
        '[{"resize": "result"}]'=Haralick.inputs.f |
    tee features.yaml
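The -flow arguments above describe which operation consumes which output. As a rough illustration only, the sketch below restates those connections as a dependency graph and asks Python's standard graphlib module for one valid execution order. The node name seed:image is a label made up for this sketch, and DFFML's orchestrator does not necessarily schedule operations this way.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each operation maps to the set of nodes whose output it consumes,
# mirroring the -flow arguments of the command above.
dependencies = {
    "resize": {"seed:image"},   # "seed:image" is a made-up label for the input
    "hsv": {"resize"},
    "calcHist": {"hsv"},
    "normalize": {"calcHist"},
    "flatten": {"normalize"},
    "gray": {"resize"},
    "HuMoments": {"gray"},
    "Haralick": {"resize"},
}

# One possible order in which every operation runs after its inputs exist
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The three branches after resize (color histogram, Hu moments, Haralick texture) do not depend on each other, so their relative order in the printed list may vary.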

To re-create the visualization of the dataflow above, run:

dffml dataflow diagram features.yaml -simple -stages processing -configloader yaml

Copying and pasting the output of the above command into the Mermaid live editor reproduces the graph.

Train the model.

dffml train \
  -model scikitrfc \
  -model-features \
    Histogram:int:$((8*8*8)) \
    HuMoments:int:7 \
    Haralick:int:13 \
  -model-predict label:str:1 \
  -model-location tempdir \
  -sources images=dfpreprocess \
    -source-images-source dir \
    -source-images-source-foldername flower_dataset/train \
    -source-images-source-labels \
      crocus windflower fritillary tulip pansy dandelion tigerlily sunflower \
      bluebell cowslip coltsfoot snowdrop daffodil lilyvalley iris buttercup daisy \
    -source-images-source-feature image \
    -source-images-dataflow features.yaml \
    -source-images-features image:int:$((500*500)) \
  -log critical
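The feature lengths passed to -model-features follow from the dataflow: the shell expands $((8*8*8)) before dffml sees it. The short sketch below just redoes that arithmetic; the variable names are illustrative and are not DFFML identifiers.

```python
# Lengths declared in the train command; $((8*8*8)) is shell arithmetic
feature_lengths = {
    "Histogram": 8 * 8 * 8,  # 8 bins for each of the 3 color channels
    "HuMoments": 7,          # seven Hu invariants
    "Haralick": 13,          # length declared for the texture features
}

total_length = sum(feature_lengths.values())
print(feature_lengths["Histogram"])  # 512
print(total_length)                  # 532

# The image feature length in the command is declared as $((500*500))
print(500 * 500)                     # 250000
```

So each image is reduced to 532 numbers in total before it reaches the classifier.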

Assess the model’s accuracy.

dffml accuracy \
  -scorer skmodelscore \
  -model scikitrfc \
  -model-features \
    Histogram:int:$((8*8*8)) \
    HuMoments:int:7 \
    Haralick:int:13 \
  -model-predict label:str:1 \
  -model-location tempdir \
  -features label:str:1 \
  -sources images=dfpreprocess \
    -source-images-source dir \
    -source-images-source-foldername flower_dataset/test \
    -source-images-source-labels \
      crocus windflower fritillary tulip pansy dandelion tigerlily sunflower \
      bluebell cowslip coltsfoot snowdrop daffodil lilyvalley iris buttercup daisy \
    -source-images-source-feature image \
    -source-images-dataflow features.yaml \
    -source-images-features image:int:$((500*500)) \
  -log critical

The output is:

0.27450980392156865
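For context, it can help to compare this value against random guessing. The sketch below assumes a classifier that picks uniformly among the 17 labels listed in the commands above; the variable names are illustrative only.

```python
# The 17 labels passed to -source-images-source-labels
labels = (
    "crocus windflower fritillary tulip pansy dandelion tigerlily sunflower "
    "bluebell cowslip coltsfoot snowdrop daffodil lilyvalley iris buttercup daisy"
).split()

reported_accuracy = 0.27450980392156865  # value printed by dffml accuracy

# Expected accuracy of uniform random guessing over the label set
chance = 1 / len(labels)
ratio = reported_accuracy / chance

print(len(labels))       # 17
print(round(chance, 4))  # 0.0588
print(round(ratio, 2))   # 4.67
```

Under that assumption the model does several times better than chance, while still misclassifying most test images.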

Create an unknown_images.csv file which contains the filenames of the images to predict on.

cat > unknown_images.csv << EOF
key,image
daisy,daisy.jpg
pansy,pansy.jpg
tigerlily,tigerlily.jpg
buttercup,buttercup.jpg
EOF

In this example, the unknown_images.csv file lists the following images:

../examples/flower17/daisy.jpg
../examples/flower17/pansy.jpg
../examples/flower17/tigerlily.jpg
../examples/flower17/buttercup.jpg

Predict with the trained model.

dffml predict all \
  -model scikitrfc \
  -model-features \
    Histogram:int:$((8*8*8)) \
    HuMoments:int:7 \
    Haralick:int:13 \
  -model-predict label:str:1 \
  -model-location tempdir \
  -sources images=dfpreprocess \
    -source-images-source csv \
    -source-images-source-filename unknown_images.csv \
    -source-images-source-loadfiles image \
    -source-images-dataflow features.yaml \
    -source-images-features image:int:$((500*500)) \
  -log critical \
  -pretty

Output


    Key:	daisy
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                     [[111, 126, 128], [110, 125, 127], [109, 124,  ... (length:442)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|             Histogram             |                     0.45565221302156844, 0.005917561208072317, 0.0 ... (length:512)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|             HuMoments             |                      0.0009669592943914033, 1.0846416009082036e-09, ... (length:7)                       |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|              Haralick             |                      0.0002911835961447346, 805.9875298101111, 0.89 ... (length:13)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|           Value:  daisy           |                                    Confidence:   0.27450980392156865                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	pansy
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                     [[153, 92, 136], [120, 59, 103], [112, 51, 95] ... (length:480)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|             Histogram             |                     0.07277950023229314, 0.0036389750116146567, 0. ... (length:512)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|             HuMoments             |                      0.0012999255874069954, 6.64033598556376e-09, 5 ... (length:7)                       |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|              Haralick             |                      0.00023571813835279273, 5094.983351862911, 0.5 ... (length:13)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|         Value:  sunflower         |                                    Confidence:   0.27450980392156865                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	tigerlily
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                     [[0, 15, 1], [0, 15, 1], [0, 15, 1], [0, 15, 1 ... (length:426)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|             Histogram             |                     0.2058240085942461, 0.0, 0.0, 0.0, 0.0, 0.0, 0 ... (length:512)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|             HuMoments             |                      0.0017531570284714357, 1.2844091501399027e-08, ... (length:7)                       |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|              Haralick             |                      0.000361688295517474, 2520.9790741688075, 0.64 ... (length:13)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|         Value:  dandelion         |                                    Confidence:   0.27450980392156865                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

	Key:	buttercup
                                                               Record Features
+----------------------------------------------------------------------------------------------------------------------------------------------+
|               image               |                     [[10, 18, 17], [10, 18, 17], [10, 18, 17], [10 ... (length:424)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|             Histogram             |                     0.03706228906355396, 0.0030885240886294966, 0. ... (length:512)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|             HuMoments             |                      0.0013087594154773934, 9.914047463163542e-08,  ... (length:7)                       |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|              Haralick             |                      0.0019601722890134747, 6277.732845406246, 0.51 ... (length:13)                      |
+----------------------------------------------------------------------------------------------------------------------------------------------+

                                                                  Prediction
+----------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                    label                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+
|         Value:  buttercup         |                                    Confidence:   0.27450980392156865                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------+

The model predicts 2 of the 4 images correctly!
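The count above can be checked by comparing each record's key with its predicted label. This small sketch assumes, as the filenames suggest, that each key names the true species; the dictionary is copied by hand from the output above.

```python
# key (assumed true species) -> label predicted in the output above
predictions = {
    "daisy": "daisy",
    "pansy": "sunflower",
    "tigerlily": "dandelion",
    "buttercup": "buttercup",
}

correct = [key for key, predicted in predictions.items() if key == predicted]
fraction_correct = len(correct) / len(predictions)

print(correct)           # ['daisy', 'buttercup']
print(fraction_correct)  # 0.5
```

Four images are far too few to estimate accuracy reliably, which is why the earlier accuracy command uses the full test set.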

The accuracy of the model may be improved with additional preprocessing, for example scaling all the feature vectors to the range [0, 1].
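As a minimal sketch of what scaling to [0, 1] means, the function below applies min-max scaling to each feature column across a set of records. The function name and the numbers are made up for illustration; in practice a library scaler would normally be used, and it should be fitted on the training data only.

```python
def min_max_scale_columns(rows):
    """Scale each column of a list of equal-length rows into [0, 1]."""
    scaled_columns = []
    for column in zip(*rows):  # iterate over features, not records
        lo, hi = min(column), max(column)
        span = hi - lo
        # A constant column carries no information; map it to 0.0
        scaled_columns.append(
            [(value - lo) / span if span else 0.0 for value in column]
        )
    return [list(row) for row in zip(*scaled_columns)]


# Made-up feature vectors: two features on very different scales
rows = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
scaled = min_max_scale_columns(rows)
print(scaled)  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

Scaling tends to matter most for distance-based or gradient-based models; tree-based models such as random forests are largely insensitive to it, so whether it helps here would need to be measured.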

Alternatively, using Convolutional Neural Networks vastly improves the accuracy! See: The Transfer Learning approach with Convolution Neural Networks.