Kaggle Kernels for Regression Tasks

The following Kaggle kernels show how to patch scikit-learn with Intel® Extension for Scikit-learn* for various regression tasks. Most of them also compare the performance of stock scikit-learn with that of scikit-learn patched with Intel® Extension for Scikit-learn*.
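
As a rough illustration of what patching looks like, here is a minimal sketch. It is not taken from any of the kernels listed on this page: the synthetic data, the sample sizes, and the choice of NuSVR are assumptions made for the example. The `try`/`except` fallback is also an addition so that the snippet still runs with stock scikit-learn when scikit-learn-intelex is not installed.

```python
# Minimal sketch: patch scikit-learn, then train a regressor as usual.
# Assumption: scikit-learn-intelex may or may not be installed, so the
# import is guarded and the code falls back to stock scikit-learn.
try:
    from sklearnex import patch_sklearn

    patch_sklearn()  # call this before importing the estimators below
    patched = True
except ImportError:
    patched = False

from sklearn.datasets import make_regression
from sklearn.svm import NuSVR

# Synthetic data stands in for a Kaggle dataset (illustrative only).
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

model = NuSVR(kernel="rbf")
model.fit(X, y)
predictions = model.predict(X)

print("patched:", patched, "| predictions:", predictions.shape)
```

The estimator code itself does not change; only the two patching lines at the top differ from a stock scikit-learn script.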

TPS stands for Tabular Playground Series, which is a series of beginner-friendly Kaggle competitions.

Using a Single Regressor

Each entry below lists the kernel and its data, followed by its goal and its content.

Baseline Nu Support Vector Regression (nuSVR) with RBF Kernel

Data: [TPS Jul 2021] Synthetic pollution data

Predict air pollution measurements over time based on weather and input values from multiple sensors

  • data preprocessing

  • search for optimal parameters using Optuna

  • training and prediction using scikit-learn-intelex

Nu Support Vector Regression (nuSVR)

Data: [TPS Aug 2021] Synthetic loan data

Calculate the loss associated with loan defaults

  • data preprocessing

  • feature engineering

  • training and prediction using scikit-learn-intelex

  • performance comparison to scikit-learn

Nu Support Vector Regression (nuSVR)

Data: House Prices dataset

Predict sale prices for a property based on its characteristics

  • data preprocessing

  • exploring outliers

  • feature engineering

  • filling missing values

  • search for optimal parameters using Optuna

  • training and prediction using scikit-learn-intelex

  • performance comparison to scikit-learn

Random Forest Regression

Data: [TPS Jul 2021] Synthetic pollution data

Predict air pollution measurements over time based on weather and input values from multiple sensors

  • checking correlation between features

  • search for the best parameters using GridSearchCV

  • training and prediction using scikit-learn-intelex

  • performance comparison to scikit-learn

Random Forest Regression with Feature Engineering

Data: [TPS Jul 2021] Synthetic pollution data

Predict air pollution measurements over time based on weather and input values from multiple sensors

  • data preprocessing

  • feature engineering

  • search for optimal parameters using Optuna

  • training and prediction using scikit-learn-intelex

  • performance comparison to scikit-learn

Random Forest Regression with Feature Importance Computation

Data: [TPS Mar 2022] Spatio-temporal traffic data

Forecast twelve hours of traffic flow in a major U.S. metropolitan area

  • feature engineering

  • computing feature importance with ELI5

  • training and prediction using scikit-learn-intelex

  • performance comparison to scikit-learn

Ridge Regression

Data: [TPS Sep 2021] Synthetic insurance data

Predict the probability of a customer making a claim on an insurance policy

  • data preprocessing

  • filling missing values

  • search for optimal parameters using Optuna

  • training and prediction using scikit-learn-intelex

  • performance comparison to scikit-learn
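
The "performance comparison to scikit-learn" step that recurs in the entries above can be sketched roughly as follows. This is an assumed pattern, not code copied from the kernels: the helper `time_fit_predict`, the synthetic data, and the Random Forest settings are all hypothetical. The patched run is skipped when scikit-learn-intelex is not installed.

```python
# Sketch of a patched-vs-stock timing comparison (illustrative assumptions:
# synthetic data, hypothetical helper name, arbitrary hyperparameters).
import time

from sklearn.datasets import make_regression


def time_fit_predict(estimator, X, y):
    """Return wall-clock seconds spent on fit plus predict."""
    start = time.perf_counter()
    estimator.fit(X, y)
    estimator.predict(X)
    return time.perf_counter() - start


X, y = make_regression(n_samples=2000, n_features=20, noise=0.5, random_state=0)
timings = {}

try:
    from sklearnex import patch_sklearn, unpatch_sklearn

    patch_sklearn()
    from sklearn.ensemble import RandomForestRegressor  # patched class

    timings["patched"] = time_fit_predict(
        RandomForestRegressor(n_estimators=50, random_state=0), X, y
    )
    unpatch_sklearn()  # restore stock scikit-learn
except ImportError:
    pass  # scikit-learn-intelex not available; only the stock run is timed

# Re-import after unpatching so the name refers to the stock class.
from sklearn.ensemble import RandomForestRegressor

timings["stock"] = time_fit_predict(
    RandomForestRegressor(n_estimators=50, random_state=0), X, y
)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```

Timings on a small synthetic dataset are not representative of the Kaggle datasets, so treat the output as a check that the comparison runs rather than as a benchmark.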

Stacking Regressors

Each entry below lists the kernel and its data, followed by its goal and its content.

Stacking Regressor with Random Forest, SVR, and LASSO

Data: [TPS Jul 2021] Synthetic pollution data

Predict air pollution measurements over time based on weather and input values from multiple sensors

  • feature engineering

  • creating a stacking regressor

  • search for optimal parameters using Optuna

  • training and prediction using scikit-learn-intelex

  • performance comparison to scikit-learn

Stacking Regressor with ElasticNet, LASSO, and Ridge Regression for Time-series data

Data: Predict Future Sales dataset

Predict total sales for every product and store in the next month based on daily sales data

  • data preprocessing

  • creating a stacking regressor

  • search for optimal parameters using Optuna

  • training and prediction using scikit-learn-intelex

  • performance comparison to scikit-learn
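
To make "creating a stacking regressor" concrete, the sketch below combines Random Forest, SVR, and LASSO base models with scikit-learn's `StackingRegressor`. The hyperparameters, the `RidgeCV` final estimator, the scaling step for SVR, and the synthetic data are assumptions chosen for illustration; the kernels above tune their own settings with Optuna.

```python
# Sketch of a stacking regressor (illustrative assumptions throughout).
# Patching is optional here: the code falls back to stock scikit-learn.
try:
    from sklearnex import patch_sklearn

    patch_sklearn()
except ImportError:
    pass

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Lasso, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=400, n_features=8, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        # SVR is sensitive to feature scale, so it gets its own scaler.
        ("svr", make_pipeline(StandardScaler(), SVR(kernel="rbf"))),
        ("lasso", Lasso(alpha=0.1)),
    ],
    final_estimator=RidgeCV(),  # combines the base models' predictions
    cv=3,  # out-of-fold predictions are used to train the final estimator
)
stack.fit(X_train, y_train)

test_predictions = stack.predict(X_test)
print("R^2 on held-out data:", stack.score(X_test, y_test))
```

The base models are fitted on the full training set, while the final estimator is trained on their cross-validated predictions, which reduces the risk of the stack overfitting to any single base model.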