Intel® Extension for Scikit-learn KNN for MNIST dataset

[1]:
from timeit import default_timer as timer
from IPython.display import HTML
from sklearn import metrics
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

Download the data

[2]:
x, y = fetch_openml(name="mnist_784", return_X_y=True)

Split the data into train and test sets

[3]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=72)
x_train.shape, x_test.shape, y_train.shape, y_test.shape
[3]:
((56000, 784), (14000, 784), (56000,), (14000,))
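
The shapes above follow from applying test_size=0.2 to the 70,000 MNIST images. A minimal sketch of that arithmetic, assuming the test-set size is rounded up and the training set takes the remainder (the helper name `split_sizes` is hypothetical and not part of Scikit-learn):

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size."""
    # Round the test share up; the training set gets whatever is left.
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_train, n_test = split_sizes(70000, 0.2)
print(n_train, n_test)
```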

Patch original Scikit-learn with Intel® Extension for Scikit-learn

Intel® Extension for Scikit-learn (previously known as daal4py) contains drop-in replacement functionality for the stock Scikit-learn package. You can take advantage of the performance optimizations of Intel® Extension for Scikit-learn by adding just two lines of code before the usual Scikit-learn imports:

[4]:
from sklearnex import patch_sklearn

patch_sklearn()
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)

Patching with Intel® Extension for Scikit-learn affects the performance of specific Scikit-learn functionality. Refer to the list of supported algorithms and parameters for details. When unsupported parameters are used, the package falls back to the original Scikit-learn. If the patching does not cover your scenarios, submit an issue on GitHub.
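
One way to see whether a call ran on the optimized path or fell back is verbose logging. A minimal sketch, assuming the extension reports its dispatch decisions through a standard Python logger named "sklearnex", as its verbose-mode documentation describes (the variable name `sklearnex_logger` is only illustrative):

```python
import logging

# Assumption: the extension logs through a logger named "sklearnex".
# Raising the level to INFO asks it to report which implementation ran.
sklearnex_logger = logging.getLogger("sklearnex")
sklearnex_logger.setLevel(logging.INFO)
```

With the level set to INFO, later calls to patched estimators are expected to log whether the accelerated version or the original Scikit-learn handled them. The exact message text depends on the installed version.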

Train and predict with the KNN algorithm using Intel® Extension for Scikit-learn on the MNIST dataset

[5]:
from sklearn.neighbors import KNeighborsClassifier

params = {"n_neighbors": 40, "weights": "distance", "n_jobs": -1}
start = timer()
knn = KNeighborsClassifier(**params).fit(x_train, y_train)
predicted = knn.predict(x_test)
time_opt = timer() - start
f"Intel® Extension for Scikit-learn time: {time_opt:.2f} s"
[5]:
'Intel® Extension for Scikit-learn time: 1.45 s'
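
The parameters above ask for 40 neighbors with "weights": "distance", so closer neighbors count more in the vote. A minimal pure-Python sketch of that idea on toy 2-D points, assuming Euclidean distance and inverse-distance weights; the function `knn_predict` is hypothetical, handles exact matches in a simplified way, and is not how either library implements the search:

```python
import math
from collections import defaultdict


def knn_predict(train_points, train_labels, query, n_neighbors):
    """Classify `query` by a distance-weighted vote of its nearest neighbors."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    nearest = sorted(distances, key=lambda pair: pair[0])[:n_neighbors]

    # Simplification: an exact match (distance 0) decides the result alone,
    # which also avoids dividing by zero below.
    exact = [label for dist, label in nearest if dist == 0.0]
    if exact:
        return exact[0]

    # Each neighbor votes with weight 1 / distance.
    votes = defaultdict(float)
    for dist, label in nearest:
        votes[label] += 1.0 / dist
    return max(votes, key=votes.get)


points = [(0, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "b", "b", "b"]
result = knn_predict(points, labels, query=(1, 1), n_neighbors=5)
print(result)
```

In this toy data a plain majority vote over all five points would pick "b" (three votes to two), while the distance-weighted vote picks "a" because the two "a" points are much closer to the query.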
[6]:
report = metrics.classification_report(y_test, predicted)
print(f"Classification report for Intel® Extension for Scikit-learn KNN:\n{report}\n")
Classification report for Intel® Extension for Scikit-learn KNN:
              precision    recall  f1-score   support

           0       0.97      0.99      0.98      1365
           1       0.93      0.99      0.96      1637
           2       0.99      0.94      0.96      1401
           3       0.96      0.95      0.96      1455
           4       0.98      0.96      0.97      1380
           5       0.95      0.95      0.95      1219
           6       0.96      0.99      0.97      1317
           7       0.94      0.95      0.95      1420
           8       0.99      0.90      0.94      1379
           9       0.92      0.94      0.93      1427

    accuracy                           0.96     14000
   macro avg       0.96      0.96      0.96     14000
weighted avg       0.96      0.96      0.96     14000


The first column of the classification report above lists the class labels (the digits 0 to 9).
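
The remaining columns can be reproduced by hand for any single class. A minimal sketch on a toy label list, assuming the standard definitions of precision, recall, F1, and support; the helper `per_class_metrics` is hypothetical and only mirrors what one row of the report means:

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1, support) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    support = tp + fn  # number of true samples of this class
    return precision, recall, f1, support


y_true_toy = ["0", "0", "1", "1", "1"]
y_pred_toy = ["0", "1", "1", "1", "0"]
precision, recall, f1, support = per_class_metrics(y_true_toy, y_pred_toy, "1")
print(f"{precision:.2f} {recall:.2f} {f1:.2f} {support}")
```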

Train the same algorithm with original Scikit-learn

To cancel the optimizations, call unpatch_sklearn and re-import the KNeighborsClassifier class.

[7]:
from sklearnex import unpatch_sklearn

unpatch_sklearn()
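
The re-import matters because a name imported earlier keeps pointing at whatever class it was bound to at import time. A conceptual sketch of that behavior, assuming patching works by rebinding a class name inside a module; `toy_module`, `StockEstimator`, `FastEstimator`, `patch`, and `unpatch` are all hypothetical stand-ins, not the extension's actual code:

```python
import types

# Hypothetical stand-in for a library module.
toy_module = types.ModuleType("toy_module")


class StockEstimator:
    backend = "stock"


class FastEstimator:
    backend = "optimized"


toy_module.Estimator = StockEstimator


def patch():
    # Rebind the module attribute to the optimized class.
    toy_module.Estimator = FastEstimator


def unpatch():
    # Restore the original class.
    toy_module.Estimator = StockEstimator


patch()
Estimator = toy_module.Estimator  # plays the role of "from toy_module import Estimator"
unpatch()
stale = Estimator.backend  # the old binding still points at the optimized class

Estimator = toy_module.Estimator  # re-import after unpatching
fresh = Estimator.backend
print(stale, fresh)
```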

Train and predict with the KNN algorithm using the original Scikit-learn library on the MNIST dataset

[8]:
from sklearn.neighbors import KNeighborsClassifier


start = timer()
knn = KNeighborsClassifier(**params).fit(x_train, y_train)
predicted = knn.predict(x_test)
time_original = timer() - start
f"Original Scikit-learn time: {time_original:.2f} s"
[8]:
'Original Scikit-learn time: 36.15 s'
[9]:
report = metrics.classification_report(y_test, predicted)
print(f"Classification report for original Scikit-learn KNN:\n{report}\n")
Classification report for original Scikit-learn KNN:
              precision    recall  f1-score   support

           0       0.97      0.99      0.98      1365
           1       0.93      0.99      0.96      1637
           2       0.99      0.94      0.96      1401
           3       0.96      0.95      0.96      1455
           4       0.98      0.96      0.97      1380
           5       0.95      0.95      0.95      1219
           6       0.96      0.99      0.97      1317
           7       0.94      0.95      0.95      1420
           8       0.99      0.90      0.94      1379
           9       0.92      0.94      0.93      1427

    accuracy                           0.96     14000
   macro avg       0.96      0.96      0.96     14000
weighted avg       0.96      0.96      0.96     14000


[10]:
HTML(
    f"<h2>With scikit-learn-intelex patching you can:</h2>"
    f"<ul>"
    f"<li>Use your Scikit-learn code for training and prediction with minimal changes (a couple of lines of code);</li>"
    f"<li>Run training and prediction of Scikit-learn models faster;</li>"
    f"<li>Get similar quality;</li>"
    f"<li>Get a speedup of <strong>{(time_original/time_opt):.1f}</strong> times.</li>"
    f"</ul>"
)
[10]:

With scikit-learn-intelex patching you can:

  • Use your Scikit-learn code for training and prediction with minimal changes (a couple of lines of code);
  • Run training and prediction of Scikit-learn models faster;
  • Get similar quality;
  • Get a speedup of 24.9 times.
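
The speedup figure is the ratio of the two wall-clock times measured above, each taken from a single run. A minimal sketch that recomputes it from the reported times and shows one possible way to make such timings less noisy, assuming the shortest of several repeats is an acceptable summary; the helper `best_time` is hypothetical:

```python
from timeit import default_timer as timer


def best_time(func, repeats=3):
    """Return the shortest wall-clock time over `repeats` calls of func()."""
    times = []
    for _ in range(repeats):
        start = timer()
        func()
        times.append(timer() - start)
    return min(times)


# Times reported in this notebook, in seconds.
time_original = 36.15
time_opt = 1.45
speedup = time_original / time_opt
print(f"Speedup: {speedup:.1f}x")

# Example use of the helper on a cheap stand-in workload.
elapsed = best_time(lambda: sum(range(1000)))
```

Measured times depend on the hardware and library versions, so the ratio will differ from machine to machine.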