Intel® Extension for Scikit-learn Linear Regression for YearPredictionMSD dataset
[1]:
from timeit import default_timer as timer
from sklearn import metrics
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import os
import requests
import warnings
from IPython.display import HTML
warnings.filterwarnings("ignore")
Download the data
[2]:
dataset_dir = "data"
dataset_name = "year_prediction_msd"
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00203/YearPredictionMSD.txt.zip"
os.makedirs(dataset_dir, exist_ok=True)
local_url = os.path.join(dataset_dir, os.path.basename(url))
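if not os.path.isfile(local_url):
    response = requests.get(url, stream=True)
    # Fail early on an HTTP error instead of saving an error page as the archive.
    response.raise_for_status()
    with open(local_url, "wb+") as file:
        for data in response.iter_content(8192):
            file.write(data)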
year = pd.read_csv(local_url, header=None)
x = year.iloc[:, 1:].to_numpy(dtype=np.float32)
y = year.iloc[:, 0].to_numpy(dtype=np.float32)
Split the data into train and test sets
[3]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=0)
x_train.shape, x_test.shape, y_train.shape, y_test.shape
[3]:
((463810, 90), (51535, 90), (463810,), (51535,))
Normalize the data
[4]:
from sklearn.preprocessing import MinMaxScaler, StandardScaler
scaler_x = MinMaxScaler()
scaler_y = StandardScaler()
[5]:
scaler_x.fit(x_train)
x_train = scaler_x.transform(x_train)
x_test = scaler_x.transform(x_test)
scaler_y.fit(y_train.reshape(-1, 1))
y_train = scaler_y.transform(y_train.reshape(-1, 1)).ravel()
y_test = scaler_y.transform(y_test.reshape(-1, 1)).ravel()
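Because the target is standardized with `StandardScaler`, any MSE computed on `y_test` and the model predictions is expressed relative to the training-set variance, not in squared years. The following is a minimal sketch of converting such a value back to the original units. The helper name `mse_in_original_units` and the synthetic arrays are illustrative assumptions and are not part of the original notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def mse_in_original_units(mse_scaled, scaler_y):
    """Convert an MSE computed on standardized targets back to original units.

    StandardScaler maps y to (y - mean_) / scale_, so residuals shrink by
    scale_ and squared residuals shrink by scale_ ** 2.
    """
    return float(mse_scaled * scaler_y.scale_[0] ** 2)


# Synthetic stand-in for the year targets (hypothetical data, arbitrary range).
rng = np.random.default_rng(0)
y_true = rng.uniform(1950, 2010, size=1000)
y_pred = y_true + rng.normal(0, 8, size=1000)

scaler_y = StandardScaler().fit(y_true.reshape(-1, 1))
z_true = scaler_y.transform(y_true.reshape(-1, 1)).ravel()
z_pred = scaler_y.transform(y_pred.reshape(-1, 1)).ravel()

mse_scaled = np.mean((z_true - z_pred) ** 2)
mse_years = mse_in_original_units(mse_scaled, scaler_y)
print(f"MSE (standardized): {mse_scaled:.4f}, MSE (years^2): {mse_years:.2f}")
```

With the notebook's own `scaler_y`, the same helper could be applied to the MSE values computed in the prediction cells.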
Patch original Scikit-learn with Intel® Extension for Scikit-learn
Intel® Extension for Scikit-learn (previously known as daal4py) contains drop-in replacement functionality for the stock Scikit-learn package. You can take advantage of the performance optimizations of Intel® Extension for Scikit-learn by adding just two lines of code before the usual Scikit-learn imports:
[6]:
from sklearnex import patch_sklearn
patch_sklearn()
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
Intel® Extension for Scikit-learn patching affects performance of specific Scikit-learn functionality. Refer to the list of supported algorithms and parameters for details. When unsupported parameters are used, the package falls back to the original Scikit-learn. If the patching does not cover your scenarios, submit an issue on GitHub.
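One way to see whether patching is in effect is to look at which module defines the estimator class. The sketch below assumes that patching replaces the `LinearRegression` attribute on `sklearn.linear_model`; the helper `linear_regression_module` is a hypothetical name, and the module path printed for the patched class depends on the installed version, so no expected output is shown. The `try`/`except` lets the example run even when the extension is not installed.

```python
import sklearn.linear_model

try:
    # Requires the scikit-learn-intelex package.
    from sklearnex import patch_sklearn, unpatch_sklearn
    have_sklearnex = True
except ImportError:
    have_sklearnex = False


def linear_regression_module():
    """Return the name of the module that defines the current LinearRegression."""
    return sklearn.linear_model.LinearRegression.__module__


if have_sklearnex:
    patch_sklearn()
    print("patched:  ", linear_regression_module())
    unpatch_sklearn()

# After unpatching (or without the extension) the stock class is reported.
print("unpatched:", linear_regression_module())
```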
Training of the Linear Regression algorithm with Intel® Extension for Scikit-learn for YearPredictionMSD dataset
[7]:
from sklearn.linear_model import LinearRegression
params = {"n_jobs": -1, "copy_X": False}
start = timer()
model = LinearRegression(**params).fit(x_train, y_train)
train_patched = timer() - start
f"Intel® extension for Scikit-learn time: {train_patched:.2f} s"
[7]:
'Intel® extension for Scikit-learn time: 0.03 s'
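A single timed run of a few hundredths of a second can be noisy. The sketch below shows one common way to get a steadier figure by repeating the call and keeping the fastest run. The helper `best_of` and the stand-in workload are assumptions for illustration; they are not part of the original notebook.

```python
from time import perf_counter


def best_of(func, repeats=5):
    """Run func `repeats` times and return the shortest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        func()
        timings.append(perf_counter() - start)
    return min(timings)


# Stand-in workload; in the notebook this could be something like
# lambda: LinearRegression(**params).fit(x_train, y_train)
elapsed = best_of(lambda: sum(i * i for i in range(100_000)))
print(f"best of 5: {elapsed:.4f} s")
```

Note that with `copy_X=False` the estimator is allowed to overwrite its input, so repeated fits on the same array may need a fresh copy of `x_train` for each run.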
Predict and get a result of the Linear Regression algorithm with Intel® Extension for Scikit-learn
[8]:
y_predict = model.predict(x_test)
mse_metric_opt = metrics.mean_squared_error(y_test, y_predict)
f"Patched Scikit-learn MSE: {mse_metric_opt}"
[8]:
'Patched Scikit-learn MSE: 0.7716818451881409'
Train the same algorithm with original Scikit-learn
To disable the optimizations, call unpatch_sklearn and re-import the LinearRegression class.
[9]:
from sklearnex import unpatch_sklearn
unpatch_sklearn()
Training of the Linear Regression algorithm with original Scikit-learn library for YearPredictionMSD dataset
[10]:
from sklearn.linear_model import LinearRegression
start = timer()
model = LinearRegression(**params).fit(x_train, y_train)
train_unpatched = timer() - start
f"Original Scikit-learn time: {train_unpatched:.2f} s"
[10]:
'Original Scikit-learn time: 0.53 s'
Predict and get a result of the Linear Regression algorithm with original Scikit-learn
[11]:
y_predict = model.predict(x_test)
mse_metric_original = metrics.mean_squared_error(y_test, y_predict)
f"Original Scikit-learn MSE: {mse_metric_original}"
[11]:
'Original Scikit-learn MSE: 0.7716856598854065'
[12]:
HTML(
    f"<h3>Compare MSE metric of patched Scikit-learn and original</h3>"
    f"MSE metric of patched Scikit-learn: {mse_metric_opt} <br>"
    f"MSE metric of unpatched Scikit-learn: {mse_metric_original} <br>"
    f"Metrics ratio: {mse_metric_opt/mse_metric_original} <br>"
    f"<h3>With Scikit-learn-intelex patching you can:</h3>"
    f"<ul>"
    f"<li>Use your Scikit-learn code for training and prediction with minimal changes (a couple of lines of code);</li>"
    f"<li>Run training and prediction of Scikit-learn models faster;</li>"
    f"<li>Get similar quality;</li>"
    f"<li>Get a speedup of <strong>{(train_unpatched/train_patched):.1f}</strong> times.</li>"
    f"</ul>"
)
[12]:
Compare MSE metric of patched Scikit-learn and original
MSE metric of patched Scikit-learn: 0.7716818451881409
MSE metric of unpatched Scikit-learn: 0.7716856598854065
Metrics ratio: 0.9999950528144836
With Scikit-learn-intelex patching you can:
- Use your Scikit-learn code for training and prediction with minimal changes (a couple of lines of code);
- Run training and prediction of Scikit-learn models faster;
- Get similar quality;
- Get a speedup of 18.4 times.
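The claim that quality is similar can be made explicit with a tolerance check on the two MSE values. This is a minimal sketch using the numbers reported in the outputs above; the threshold `rel_tol=1e-5` is an illustrative assumption, not a value defined by either library.

```python
import math

# Values copied from the notebook outputs above.
mse_patched = 0.7716818451881409
mse_original = 0.7716856598854065

ratio = mse_patched / mse_original
# rel_tol is an illustrative threshold chosen for this example.
similar = math.isclose(mse_patched, mse_original, rel_tol=1e-5)
print(f"ratio = {ratio:.7f}, within tolerance: {similar}")
```

A small difference of this size is plausible for two implementations working on `float32` data, though the notebook itself does not state the cause.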