Intel® Extension for Scikit-learn NuSVR for Medical Charges dataset

[1]:

from timeit import default_timer as timer
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from IPython.display import HTML
import warnings

warnings.filterwarnings("ignore")

Download the data

[2]:

x, y = fetch_openml(name="medical_charges_nominal", return_X_y=True)

Preprocessing

Encode categorical features

[3]:

cat_columns = x.select_dtypes(["category"]).columns
# Replace each categorical column with its integer category codes.
x[cat_columns] = x[cat_columns].apply(lambda col: col.cat.codes)
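
To make the encoding step concrete, here is a minimal sketch on a made-up frame. The column names and values are hypothetical and are not taken from the Medical Charges dataset; the sketch only shows what `cat.codes` produces.

```python
import pandas as pd

# Toy frame; the column names and values are illustrative only.
toy = pd.DataFrame(
    {
        "state": pd.Categorical(["TX", "CA", "TX", "NY"]),
        "payments": [10.0, 20.0, 30.0, 40.0],
    }
)

# Select the categorical columns and replace each with its integer codes.
cat_columns = toy.select_dtypes(["category"]).columns
toy[cat_columns] = toy[cat_columns].apply(lambda col: col.cat.codes)

# Categories are sorted as CA, NY, TX, so the codes are 0, 1, 2.
encoded_state = toy["state"].tolist()  # [2, 0, 2, 1]
```

The codes are plain integer labels, so the model treats them as ordered numbers even when the categories have no natural order. Missing values, if any, are encoded as -1.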

Split the data into train and test sets

[4]:

x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.3, random_state=42)
x_train.shape, x_test.shape, y_train.shape, y_test.shape

[4]:

((48919, 11), (114146, 11), (48919,), (114146,))
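
The shapes above follow from `train_size=0.3`: roughly 30% of the rows go to training and the rest to testing. The arithmetic can be sketched as below, assuming that a fractional `train_size` is rounded down and the test set receives the remaining rows. That rounding rule is an assumption about `train_test_split`, not something shown in this notebook.

```python
from math import floor

n_samples = 48919 + 114146  # total rows, taken from the shapes shown above
train_size = 0.3

# Assumption: the training count is rounded down, the test set takes the rest.
n_train = floor(train_size * n_samples)
n_test = n_samples - n_train
```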

Patch original Scikit-learn with Intel® Extension for Scikit-learn

Intel® Extension for Scikit-learn (previously known as daal4py) contains drop-in replacement functionality for the stock Scikit-learn package. You can take advantage of the performance optimizations of Intel® Extension for Scikit-learn by adding just two lines of code before the usual Scikit-learn imports:

[5]:

from sklearnex import patch_sklearn

patch_sklearn()

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)

Intel® Extension for Scikit-learn patching affects the performance of specific Scikit-learn functionality. Refer to the list of supported algorithms and parameters for details. When unsupported parameters are used, the package falls back to the original Scikit-learn. If the patching does not cover your scenarios, submit an issue on GitHub.

Training of the NuSVR algorithm with Intel® Extension for Scikit-learn for Medical Charges dataset

[6]:

from sklearn.svm import NuSVR

params = {
    "nu": 0.4,
    "C": y_train.mean(),
    "degree": 2,
    "kernel": "poly",
}
start = timer()
nusvr = NuSVR(**params).fit(x_train, y_train)
train_patched = timer() - start
f"Intel® Extension for Scikit-learn time: {train_patched:.2f} s"

[6]:

'Intel® Extension for Scikit-learn time: 24.69 s'
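
The start/stop timing pattern used in this notebook can be wrapped in a small helper so that the patched and unpatched runs are measured the same way. This is only a sketch; `time_call` is a hypothetical name introduced here, not part of Scikit-learn or the extension.

```python
from timeit import default_timer as timer


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = timer()
    result = func(*args, **kwargs)
    return result, timer() - start


# Demonstrated on a cheap built-in so the sketch runs quickly.
total, elapsed = time_call(sum, range(1_000_000))
```

With the objects from this notebook, the equivalent call would be `time_call(NuSVR(**params).fit, x_train, y_train)`. A single run is a wall-clock measurement, so results vary between machines and runs.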

Predict and get a result of the NuSVR algorithm with Intel® Extension for Scikit-learn

[7]:

score_opt = nusvr.score(x_test, y_test)
f"Intel® Extension for Scikit-learn R2 score: {score_opt}"

[7]:

'Intel® Extension for Scikit-learn R2 score: 0.8635974264586637'
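
For a regressor, `score` reports the coefficient of determination, R2 = 1 - SS_res / SS_tot. The sketch below computes it by hand on toy values to show what the number means; `r2_score_manual` is a hypothetical helper and does not handle the degenerate case where all true values are equal.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score_manual(y_true, y_true)        # 1.0
baseline = r2_score_manual(y_true, [2.5] * 4)    # 0.0
```

A score of 1.0 means perfect predictions, and 0.0 means the model does no better than predicting the mean of the targets.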

Train the same algorithm with original Scikit-learn

To disable the optimizations, call unpatch_sklearn and re-import the NuSVR class.

[8]:

from sklearnex import unpatch_sklearn

unpatch_sklearn()
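
The re-import matters because the name `NuSVR` bound earlier still refers to the class that was imported while the patch was active. One way to check which implementation a name points to is to look at the module that defines the class. The helper below is a hypothetical sketch, and it rests on an assumption not confirmed by this notebook: that the patched class is defined outside the `sklearn` package (for example under `sklearnex` or `daal4py`), while the stock class is defined under `sklearn`.

```python
import json


def top_level_package(cls):
    """Return the first component of the module that defines cls."""
    return cls.__module__.split(".")[0]


# Demonstrated on a standard-library class so the sketch runs anywhere:
# json.JSONDecoder is defined in the module "json.decoder".
pkg = top_level_package(json.JSONDecoder)
```

In this notebook, `top_level_package(NuSVR)` could be evaluated before and after the re-import to see whether the name changed from the optimized class to the stock one.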

Training of the NuSVR algorithm with original Scikit-learn library for Medical Charges dataset

[9]:

from sklearn.svm import NuSVR

start = timer()
nusvr = NuSVR(**params).fit(x_train, y_train)
train_unpatched = timer() - start
f"Original Scikit-learn time: {train_unpatched:.2f} s"

[9]:

'Original Scikit-learn time: 331.85 s'

Predict and get a result of the NuSVR algorithm with original Scikit-learn

[10]:

score_original = nusvr.score(x_test, y_test)
f"Original Scikit-learn R2 score: {score_original}"

[10]:

'Original Scikit-learn R2 score: 0.8636031741516902'

[11]:

HTML(
    f"<h3>Compare R2 scores of patched and original Scikit-learn</h3>"
    f"R2 score of patched Scikit-learn: {score_opt} <br>"
    f"R2 score of unpatched Scikit-learn: {score_original} <br>"
    f"Metrics ratio: {score_opt/score_original} <br>"
    f"<h3>With Scikit-learn-intelex patching you can:</h3>"
    f"<ul>"
    f"<li>Use your Scikit-learn code for training and prediction with minimal changes (a couple of lines of code);</li>"
    f"<li>Train and predict with Scikit-learn models faster;</li>"
    f"<li>Get similar quality;</li>"
    f"<li>Get a speedup of <strong>{(train_unpatched/train_patched):.1f}</strong> times.</li>"
    f"</ul>"
)

[11]:

Compare R2 scores of patched and original Scikit-learn

R2 score of patched Scikit-learn: 0.8635974264586637
R2 score of unpatched Scikit-learn: 0.8636031741516902
Metrics ratio: 0.999993344520726

With Scikit-learn-intelex patching you can:

Use your Scikit-learn code for training and prediction with minimal changes (a couple of lines of code);
Train and predict with Scikit-learn models faster;
Get similar quality;
Get a speedup of 13.4 times.
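
The summary figures can be reproduced from the values printed earlier. The sketch below uses the rounded times and the scores shown in the outputs above, so the speedup is only as precise as those two-decimal timings; `relative_gap` is an extra quantity introduced here for illustration.

```python
# Values copied from the recorded outputs in this notebook.
train_patched = 24.69      # seconds, patched run
train_unpatched = 331.85   # seconds, original Scikit-learn run
score_opt = 0.8635974264586637
score_original = 0.8636031741516902

speedup = train_unpatched / train_patched
metrics_ratio = score_opt / score_original
# Relative difference between the two R2 scores.
relative_gap = abs(score_opt - score_original) / abs(score_original)

summary = f"{speedup:.1f}x faster, R2 relative difference {relative_gap:.1e}"
```

The two scores differ only in the sixth decimal place, which is why the metrics ratio is so close to 1.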