Intel® Extension for Scikit-learn Logistic Regression for the CIFAR-100 dataset

[1]:
from timeit import default_timer as timer
from sklearn import metrics
from sklearn.model_selection import train_test_split
import warnings
from IPython.display import HTML

warnings.filterwarnings("ignore")

Download the data

[2]:
from sklearn.datasets import fetch_openml

x, y = fetch_openml(name="CIFAR-100", return_X_y=True)

Split the data into train and test sets

[3]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=43)
x_train.shape, x_test.shape, y_train.shape
[3]:
((54000, 3072), (6000, 3072), (54000,))
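
The shapes above follow from simple split arithmetic. The sketch below is only an illustration: the helper name `split_sizes` is hypothetical, and it assumes that for a fractional `test_size` with no `train_size` given, the test set takes the ceiling of `test_size * n_samples` and the training set takes the remainder.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size (assumed rounding: ceiling)."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# 60,000 rows in total (54,000 + 6,000 in the output above), 10% held out.
n_train, n_test = split_sizes(60_000, 0.1)
print(n_train, n_test)
```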

Patch original Scikit-learn with Intel® Extension for Scikit-learn

Intel® Extension for Scikit-learn (previously known as daal4py) contains drop-in replacement functionality for the stock Scikit-learn package. You can take advantage of the performance optimizations of Intel® Extension for Scikit-learn by adding just two lines of code before the usual Scikit-learn imports:

[4]:
from sklearnex import patch_sklearn

patch_sklearn()
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)

Intel® Extension for Scikit-learn patching affects the performance of specific Scikit-learn functionality. Refer to the list of supported algorithms and parameters for details. When unsupported parameters are used, the package falls back to the original Scikit-learn. If the patching does not cover your scenarios, submit an issue on GitHub.
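
Because patching only covers specific estimators and parameters, it can help to patch selectively and to see which code path actually ran. The following is a minimal sketch, not part of the original notebook: the helper `patch_only` is hypothetical, and it assumes an installed `sklearnex` whose `patch_sklearn` accepts a list of estimator names and which reports the chosen code path through the `sklearnex` logger at INFO level. Check both assumptions against the documentation of your installed version.

```python
import logging


def patch_only(names=("LogisticRegression",)):
    """Patch just the listed estimators; return False if sklearnex is not installed."""
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        return False
    # Assumption: INFO-level records on the "sklearnex" logger say whether a call
    # ran the accelerated implementation or fell back to stock Scikit-learn.
    # basicConfig() only attaches a handler; if the package installs its own
    # handler, messages may appear twice.
    logging.basicConfig()
    logging.getLogger("sklearnex").setLevel(logging.INFO)
    patch_sklearn(list(names))
    return True


patched = patch_only()
print("sklearnex patching applied:", patched)
```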

Train the Logistic Regression algorithm with Intel® Extension for Scikit-learn on the CIFAR dataset

[5]:
from sklearn.linear_model import LogisticRegression

params = {
    "C": 0.1,
    "solver": "lbfgs",
    "multi_class": "multinomial",
    "n_jobs": -1,
}
start = timer()
classifier = LogisticRegression(**params).fit(x_train, y_train)
train_patched = timer() - start
f"Intel® extension for Scikit-learn time: {train_patched:.2f} s"
[5]:
'Intel® extension for Scikit-learn time: 24.82 s'
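
The timing pattern in the cell above (read the timer, run the call, subtract) is repeated later for the unpatched run. As a hedged sketch, it could be wrapped in a small helper; `timed` is a hypothetical name, and the example times a trivial built-in rather than a model fit.

```python
from timeit import default_timer as timer


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) once and return (result, elapsed_seconds)."""
    start = timer()
    result = fn(*args, **kwargs)
    return result, timer() - start


# Stand-in workload; in the notebook this would be the estimator's fit call.
total, elapsed = timed(sum, range(1_000_000))
print(f"sum took {elapsed:.4f} s")
```

Since `fit` returns the fitted estimator, the same helper could be used as `classifier, train_patched = timed(LogisticRegression(**params).fit, x_train, y_train)`. A single run is still a rough measurement; repeated runs would give a steadier figure.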

Predict class probabilities and compute the log loss of the Logistic Regression algorithm with Intel® Extension for Scikit-learn

[6]:
y_predict = classifier.predict_proba(x_test)
log_loss_opt = metrics.log_loss(y_test, y_predict)
f"Intel® extension for Scikit-learn Log Loss: {log_loss_opt}"
[6]:
'Intel® extension for Scikit-learn Log Loss: 3.7073530800931587'
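
To put the log loss value in context, the sketch below computes the metric by hand for a uniform guess over 100 classes. It is an illustration under stated assumptions: `multiclass_log_loss` is a hypothetical helper, labels are assumed to be integer class indices, and the clipping constant may differ from what `metrics.log_loss` uses internally.

```python
import numpy as np


def multiclass_log_loss(y_true_idx, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class of each sample."""
    proba = np.clip(proba, eps, 1.0)  # avoid log(0)
    rows = np.arange(len(y_true_idx))
    return float(-np.mean(np.log(proba[rows, y_true_idx])))


n_classes = 100  # CIFAR-100 has 100 classes
y_true_idx = np.array([0, 17, 42, 99])  # made-up labels for illustration
uniform = np.full((len(y_true_idx), n_classes), 1.0 / n_classes)

baseline = multiclass_log_loss(y_true_idx, uniform)
print(f"uniform-guess log loss: {baseline:.4f}")
```

A uniform guess scores ln(100) ≈ 4.605, so a log loss near 3.7 is better than chance, while still reflecting low confidence in the correct class on average.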

Train the same algorithm with original Scikit-learn

To cancel the optimizations, we call unpatch_sklearn and re-import the LogisticRegression class.

[7]:
from sklearnex import unpatch_sklearn

unpatch_sklearn()
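
Why the re-import matters can be shown with a toy model of monkeypatching. This sketch uses a made-up module and made-up classes; it assumes only that patching swaps the class object a module exposes, and it is not a description of the extension's actual internals.

```python
import types

# Toy stand-in for a library module; nothing here is sklearnex code.
toy = types.ModuleType("toy_linear_model")


class StockModel:
    backend = "stock"


class FastModel:
    backend = "accelerated"


toy.Model = FastModel          # "patched" state: the module exposes the fast class
Model = toy.Model              # like `from toy_linear_model import Model`

toy.Model = StockModel         # "unpatched": the module attribute is swapped back
stale_backend = Model.backend  # the earlier binding still points at FastModel

Model = toy.Model              # re-importing picks up the current attribute
fresh_backend = Model.backend

print(stale_backend, fresh_backend)
```

The name bound before the swap keeps referring to the old class, which is why the next cell imports LogisticRegression again before training.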

Train the Logistic Regression algorithm with the original Scikit-learn library on the CIFAR dataset

[8]:
from sklearn.linear_model import LogisticRegression

start = timer()
classifier = LogisticRegression(**params).fit(x_train, y_train)
train_unpatched = timer() - start
f"Original Scikit-learn time: {train_unpatched:.2f} s"
[8]:
'Original Scikit-learn time: 395.03 s'

Predict class probabilities and compute the log loss of the Logistic Regression algorithm with original Scikit-learn

[9]:
y_predict = classifier.predict_proba(x_test)
log_loss_original = metrics.log_loss(y_test, y_predict)
f"Original Scikit-learn Log Loss: {log_loss_original}"
[9]:
'Original Scikit-learn Log Loss: 3.7140870590578428'
[10]:
HTML(
    f"<h3>Compare Log Loss metric of patched Scikit-learn and original</h3>"
    f"Log Loss metric of patched Scikit-learn: {log_loss_opt} <br>"
    f"Log Loss metric of unpatched Scikit-learn: {log_loss_original} <br>"
    f"Metrics ratio: {log_loss_opt/log_loss_original} <br>"
    f"<h3>With Scikit-learn-intelex patching you can:</h3>"
    f"<ul>"
    f"<li>Use your Scikit-learn code for training and prediction with minimal changes (a couple of lines of code);</li>"
    f"<li>Get fast execution of training and prediction for Scikit-learn models;</li>"
    f"<li>Get similar quality;</li>"
    f"<li>Get a speedup of <strong>{(train_unpatched/train_patched):.1f}</strong> times.</li>"
    f"</ul>"
)
[10]:

Compare Log Loss metric of patched Scikit-learn and original

Log Loss metric of patched Scikit-learn: 3.7073530800931587
Log Loss metric of unpatched Scikit-learn: 3.7140870590578428
Metrics ratio: 0.9981869086917978

With Scikit-learn-intelex patching you can:

  • Use your Scikit-learn code for training and prediction with minimal changes (a couple of lines of code);
  • Get fast execution of training and prediction for Scikit-learn models;
  • Get similar quality;
  • Get a speedup of 15.9 times.
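
The summary figures can be reproduced from the values printed earlier in this notebook. The sketch below hard-codes those reported numbers (the timings are the rounded values shown in the cell outputs) and adds a relative difference in percent, which is not in the original output.

```python
# Values copied from the cell outputs above.
train_patched = 24.82      # seconds, patched run
train_unpatched = 395.03   # seconds, original Scikit-learn run
log_loss_opt = 3.7073530800931587
log_loss_original = 3.7140870590578428

speedup = train_unpatched / train_patched
ratio = log_loss_opt / log_loss_original
rel_diff_pct = abs(1.0 - ratio) * 100  # how far apart the two metrics are

print(f"speedup: {speedup:.1f}x")
print(f"log loss ratio: {ratio}")
print(f"log loss relative difference: {rel_diff_pct:.2f}%")
```

The two log loss values differ by about 0.18%, which is the sense in which the quality is similar. Timings depend on hardware and will vary between runs.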