User Guide
Intel® Neural Compressor aims to provide popular model compression techniques such as quantization, pruning (sparsity), distillation, and neural architecture search to help users optimize their models. The documents below will help you get familiar with the concepts and modules in Intel® Neural Compressor and learn how to use its APIs to conduct quantization, pruning (sparsity), distillation, and neural architecture search on mainstream frameworks.
Overview
This part gives you a quick overview of the design, structure, and workflow of Intel® Neural Compressor. We also provide a broad set of examples to help users get started.
Architecture | Workflow | APIs
Notebook | Examples | Results | Intel oneAPI AI Analytics Toolkit
Python-based APIs
Python-based APIs covers the functional APIs in Intel® Neural Compressor in more detail,
explaining the mechanism of each function and providing tutorials to help users apply them to their own cases.
Please note that support for the Intel Neural Compressor 1.X API will be discontinued in the future,
so we provide a comprehensive migration guide in Code Migration to help users update their code from the previous 1.X API to the new 2.X API.
In the 2.X API, it is essential to create the DataLoader
and Metrics
for your examples, so we provide detailed introductions to both; a minimal usage sketch follows this paragraph.
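The sketch below shows how these pieces fit together in a basic 2.X post-training quantization call. It is a minimal illustration rather than a canonical recipe: it assumes the TensorFlow framework, uses the library's built-in dummy dataset as stand-in calibration data, and treats `./model.pb` as a placeholder path for your own FP32 model.

```python
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.quantization import fit

# Build a calibration DataLoader. A built-in dummy dataset stands in for
# real calibration data here; the shape must match the model's input.
dataset = Datasets("tensorflow")["dummy"](shape=(1, 224, 224, 3))
calib_dataloader = DataLoader(framework="tensorflow", dataset=dataset)

# Post-training quantization with the default configuration.
# "./model.pb" is a placeholder path to your FP32 model.
q_model = fit(
    model="./model.pb",
    conf=PostTrainingQuantConfig(),
    calib_dataloader=calib_dataloader,
)
q_model.save("./quantized_model")
```

For accuracy-driven tuning, `fit` also accepts an evaluation dataloader and metric, which is where the DataLoader and Metrics documents above come in.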
Advanced Topics
This part covers advanced topics that help users dive deeper into Intel® Neural Compressor.
Adaptor | Strategy | Objective | Calibration
Add New Data Type | Add New Adaptor
Distillation for Quantization | SmoothQuant | Weight-Only Quantization | Layer-Wise Quantization