Kaggle Kernels for Classification Tasks
The following Kaggle kernels show how to patch scikit-learn with Intel® Extension for Scikit-learn* for various classification tasks. These kernels usually include a performance comparison between stock scikit-learn and scikit-learn patched with Intel® Extension for Scikit-learn*.
TPS stands for Tabular Playground Series, which is a series of beginner-friendly Kaggle competitions.
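The patching pattern these kernels rely on can be sketched as follows. This is a minimal illustration, not code from any of the listed kernels. It assumes the extension is installed as the `scikit-learn-intelex` package (imported as `sklearnex`), and it falls back to stock scikit-learn if that import fails. The synthetic dataset and the `fit_and_score` helper are hypothetical stand-ins for a kernel's competition data and model code.

```python
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Assumption: the extension is available as the `sklearnex` module.
# If it is not installed, only the stock scikit-learn run is performed.
try:
    from sklearnex import patch_sklearn, unpatch_sklearn
    HAVE_SKLEARNEX = True
except ImportError:
    HAVE_SKLEARNEX = False


def fit_and_score(X_train, X_test, y_train, y_test):
    """Hypothetical helper: fit a classifier, return (accuracy, fit seconds)."""
    # Import inside the function so the estimator class is looked up
    # after any patching or unpatching has taken effect.
    from sklearn.linear_model import LogisticRegression

    start = time.perf_counter()
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    return model.score(X_test, y_test), elapsed


# Synthetic stand-in for a Kaggle dataset (not real competition data).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline run with stock scikit-learn.
results = {"stock": fit_and_score(X_train, X_test, y_train, y_test)}

if HAVE_SKLEARNEX:
    patch_sklearn()  # swap in the accelerated implementations
    results["patched"] = fit_and_score(X_train, X_test, y_train, y_test)
    unpatch_sklearn()  # restore stock scikit-learn

for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.3f}, fit time={seconds:.3f}s")
```

On a dataset this small the timing difference may be negligible or even reversed. The kernels listed below make the comparison on full competition data.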
Binary Classification
Kernel | Goal | Content |
---|---|---|
Logistic Regression for Binary Classification. Data: [TPS Nov 2021] Synthetic spam emails data | Identify spam emails via features extracted from the email | |
Feature Importance in Random Forest for Binary Classification. Data: [TPS Nov 2021] Synthetic spam emails data | Identify spam emails via features extracted from the email | |
Random Forest for Binary Classification. Data: [TPS Apr 2021] Synthetic data based on Titanic dataset | Predict whether a passenger survives | |
Support Vector Classification (SVC) for Binary Classification. Data: [TPS Apr 2021] Synthetic data based on Titanic dataset | Predict whether a passenger survives | |
Support Vector Classification (SVC) with Feature Preprocessing for Binary Classification. Data: [TPS Apr 2021] Synthetic data based on Titanic dataset | Predict whether a passenger survives | |
MultiClass Classification
Kernel | Goal | Content |
---|---|---|
Logistic Regression for MultiClass Classification with Quantile Transformer. Data: [TPS Jun 2021] Synthetic eCommerce data | Predict the category of an eCommerce product | |
Support Vector Classification (SVC) for MultiClass Classification. Data: [TPS May 2021] Synthetic eCommerce data | Predict the category of an eCommerce product | |
Stacking Classifier with Logistic Regression, kNN, Random Forest, and Quantile Transformer. Data: [TPS Jun 2021] Synthetic eCommerce data | Predict the category of an eCommerce product | |
Support Vector Classification (SVC) for MultiClass Classification. Data: [TPS Dec 2021] Synthetic Forest Cover Type data | Predict the forest cover type | |
Feature Importance in Random Forest for MultiClass Classification. Data: [TPS Dec 2021] Synthetic Forest Cover Type data | Predict the forest cover type | |
k-Nearest Neighbors (kNN) for MultiClass Classification. Data: [TPS Feb 2022] Bacteria DNA | Predict bacteria species based on repeated lossy measurements of DNA snippets | |
Classification Tasks in Computer Vision
Kernel | Goal | Content |
---|---|---|
Support Vector Classification (SVC) for MultiClass Classification (CV task). Data: Digit Recognizer (MNIST) | Recognize handwritten digits | |
k-Nearest Neighbors (kNN) for MultiClass Classification (CV task). Data: Digit Recognizer (MNIST) | Recognize handwritten digits | |
Classification Tasks in Natural Language Processing
Kernel | Goal | Content |
---|---|---|
Support Vector Classification (SVC) for a Binary Classification (NLP task). Data: Natural Language Processing with Disaster Tweets | Predict which tweets are about real disasters and which ones are not | |
One-vs-Rest Support Vector Machine (SVM) with Text Data for MultiClass Classification. Data: What’s Cooking | Use recipe ingredients to predict the cuisine | |
Support Vector Classification (SVC) for Binary Classification with Sparse Data (NLP task). Data: Stack Overflow questions | Predict the binary quality rating for Stack Overflow questions | |