# Intel® Extension for Scikit-learn KMeans for the Spoken Arabic Digit dataset

```
[1]:
```

```
from timeit import default_timer as timer
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_openml
from IPython.display import HTML
import warnings
warnings.filterwarnings("ignore")
```

## Download the data

```
[2]:
```

```
x, y = fetch_openml(name="spoken-arabic-digit", return_X_y=True)
```

## Preprocessing

Split the data into train and test sets

```
[3]:
```

```
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=123)
x_train.shape, x_test.shape, y_train.shape, y_test.shape
```

```
[3]:
```

```
((236930, 14), (26326, 14), (236930,), (26326,))
```
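With `test_size=0.1`, roughly 10% of the samples are reserved for the test set, which matches the shapes above (26,326 of 263,256 rows). A minimal sketch on synthetic data (array sizes are illustrative, not the real dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 500 synthetic samples with 2 features each (sizes are illustrative)
X = np.arange(1000).reshape(500, 2)
y = np.arange(500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=123)
# test_size=0.1 reserves 10% of the rows for the test set
print(X_tr.shape, X_te.shape)  # (450, 2) (50, 2)
```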

Normalize the data

```
[4]:
```

```
from sklearn.preprocessing import MinMaxScaler
scaler_x = MinMaxScaler()
```

```
[5]:
```

```
scaler_x.fit(x_train)
x_train = scaler_x.transform(x_train)
x_test = scaler_x.transform(x_test)
```
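Note that the scaler is fit on the training split only, so the test set is transformed with the training minima and maxima. A small sketch on synthetic data showing that the scaled training features land exactly in [0, 1], while test values may fall slightly outside:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.RandomState(0)
x_tr = rng.uniform(-5.0, 5.0, size=(200, 3))
x_te = rng.uniform(-6.0, 6.0, size=(50, 3))  # wider range than train

scaler = MinMaxScaler().fit(x_tr)  # learn per-feature min/max from train only
x_tr_s = scaler.transform(x_tr)
x_te_s = scaler.transform(x_te)

# Training columns span exactly [0, 1]; test columns can exceed that range
print(x_tr_s.min(), x_tr_s.max())
```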

## Patch original Scikit-learn with Intel® Extension for Scikit-learn

Intel® Extension for Scikit-learn (previously known as daal4py) contains drop-in replacement functionality for the stock Scikit-learn package. You can take advantage of the performance optimizations of Intel® Extension for Scikit-learn by adding just two lines of code before the usual Scikit-learn imports:

```
[6]:
```

```
from sklearnex import patch_sklearn
patch_sklearn()
```

```
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
```

Intel® Extension for Scikit-learn patching affects the performance of specific Scikit-learn functionality. Refer to the list of supported algorithms and parameters for details. When unsupported parameters are used, the package falls back to the original Scikit-learn implementation. If the patching does not cover your scenarios, submit an issue on GitHub.
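To check which calls were accelerated and where the package fell back, the extension reports dispatch decisions through the standard `sklearnex` logger (logger name per the scikit-learn-intelex documentation). A sketch, guarded so it degrades gracefully when the extension is not installed:

```python
import logging

try:
    from sklearnex import patch_sklearn, unpatch_sklearn

    # Per-call dispatch messages ("running accelerated version" vs a
    # fallback notice) are emitted at INFO level on this logger.
    logging.getLogger("sklearnex").setLevel(logging.INFO)

    patch_sklearn()
    # ... fit estimators here and watch the log output ...
    unpatch_sklearn()
    extension_available = True
except ImportError:
    extension_available = False  # stock Scikit-learn only
```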

Train the KMeans algorithm with Intel® Extension for Scikit-learn on the Spoken Arabic Digit dataset:

```
[7]:
```

```
from sklearn.cluster import KMeans
params = {
"n_clusters": 128,
"random_state": 123,
"copy_x": False,
}
start = timer()
model = KMeans(**params).fit(x_train, y_train)
train_patched = timer() - start
f"Intel® extension for Scikit-learn time: {train_patched:.2f} s"
```

```
[7]:
```

```
'Intel® extension for Scikit-learn time: 7.36 s'
```

Let’s take a look at the inertia and the number of iterations of the KMeans algorithm with Intel® Extension for Scikit-learn:

```
[8]:
```

```
inertia_opt = model.inertia_
n_iter_opt = model.n_iter_
print(f"Intel® extension for Scikit-learn inertia: {inertia_opt}")
print(f"Intel® extension for Scikit-learn number of iterations: {n_iter_opt}")
```

```
Intel® extension for Scikit-learn inertia: 13346.641333761074
Intel® extension for Scikit-learn number of iterations: 274
```
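Inertia is the sum of squared distances from each sample to its nearest cluster center, so the two runs can be compared directly on this number. A quick sanity check on synthetic data (names and sizes are illustrative) that recomputes `inertia_` by hand:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(123)
X = rng.rand(300, 14)  # synthetic stand-in for the scaled features

km = KMeans(n_clusters=8, random_state=123, n_init=10).fit(X)

# Sum of squared distances of each sample to its assigned center
manual_inertia = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
assert np.isclose(manual_inertia, km.inertia_)
```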

## Train the same algorithm with original Scikit-learn

To cancel the optimizations, we use *unpatch_sklearn* and reimport the KMeans class:

```
[9]:
```

```
from sklearnex import unpatch_sklearn
unpatch_sklearn()
```

Train the KMeans algorithm with the original Scikit-learn library on the Spoken Arabic Digit dataset:

```
[10]:
```

```
from sklearn.cluster import KMeans
start = timer()
model = KMeans(**params).fit(x_train, y_train)
train_unpatched = timer() - start
f"Original Scikit-learn time: {train_unpatched:.2f} s"
```

```
[10]:
```

```
'Original Scikit-learn time: 192.14 s'
```

Let’s take a look at the inertia and the number of iterations of the KMeans algorithm with the original Scikit-learn:

```
[11]:
```

```
inertia_original = model.inertia_
n_iter_original = model.n_iter_
print(f"Original Scikit-learn inertia: {inertia_original}")
print(f"Original Scikit-learn number of iterations: {n_iter_original}")
```

```
Original Scikit-learn inertia: 13352.813785961785
Original Scikit-learn number of iterations: 212
```

```
[12]:
```

```
HTML(
f"<h3>Compare inertia and number of iterations of patched Scikit-learn and original</h3><br>"
f"<strong>Inertia:</strong><br>"
f"Patched Scikit-learn: {inertia_opt} <br>"
f"Unpatched Scikit-learn: {inertia_original} <br>"
f"Ratio: {inertia_opt/inertia_original} <br><br>"
f"<strong>Number of iterations:</strong><br>"
f"Patched Scikit-learn: {n_iter_opt} <br>"
f"Unpatched Scikit-learn: {n_iter_original} <br>"
f"Ratio: {(n_iter_opt/n_iter_original):.2f} <br><br>"
f"The number of iterations is bigger, but the algorithm is much faster and the inertia is lower."
f"<h3>With Scikit-learn-intelex patching you can:</h3>"
f"<ul>"
f"<li>Use your Scikit-learn code for training and prediction with minimal changes (a couple of lines of code);</li>"
f"<li>Get fast execution of training and prediction of Scikit-learn models;</li>"
f"<li>Get a <strong>{(train_unpatched/train_patched):.1f}</strong>x speedup.</li>"
f"</ul>"
)
```

```
[12]:
```

### Compare inertia and number of iterations of patched Scikit-learn and original

**Inertia:**

Patched Scikit-learn: 13346.641333761074

Unpatched Scikit-learn: 13352.813785961785

Ratio: 0.9995377414603653

**Number of iterations:**

Patched Scikit-learn: 274

Unpatched Scikit-learn: 212

Ratio: 1.29

The number of iterations is bigger, but the algorithm is much faster and the inertia is lower.

### With Scikit-learn-intelex patching you can:

- Use your Scikit-learn code for training and prediction with minimal changes (a couple of lines of code);
- Get fast execution of training and prediction of Scikit-learn models;
- Get a **26.1**x speedup.