Developer Documentation

Read the following material as you learn how to use Neural Compressor.

Get Started

  • Transform introduces how to utilize Neural Compressor’s built-in data processing and how to develop a custom data processing method.

  • Dataset introduces how to utilize Neural Compressor’s built-in datasets and how to develop a custom dataset; a minimal custom dataset is sketched after this list.

  • Metrics introduces how to utilize Neural Compressor’s built-in metrics and how to develop a custom metric; a minimal custom metric is sketched after this list.

  • UX is a web-based system used to simplify Neural Compressor usage.

  • Intel oneAPI AI Analytics Toolkit Get Started Guide explains the AI Kit components, installation and configuration guides, and instructions for building and running sample apps.

  • AI and Analytics Samples includes code samples for Intel oneAPI libraries.
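
Since Dataset and Metrics both reduce to small Python interfaces, a minimal sketch may help. It assumes the 1.x neural_compressor.experimental API, and the class names MyDataset and MyMetric are hypothetical:

    class MyDataset:
        """Hypothetical custom dataset: any class exposing __getitem__
        and __len__ can be wrapped by Neural Compressor's dataloader."""

        def __init__(self, samples, labels):
            self.samples = samples
            self.labels = labels

        def __getitem__(self, index):
            # Return one (input, label) pair per call.
            return self.samples[index], self.labels[index]

        def __len__(self):
            return len(self.samples)

    class MyMetric:
        """Hypothetical custom accuracy metric: update() accumulates
        per-batch results, result() reports the final score, and
        reset() clears the accumulated state."""

        def __init__(self):
            self.correct = 0
            self.total = 0

        def update(self, preds, labels):
            self.correct += sum(int(p == l) for p, l in zip(preds, labels))
            self.total += len(labels)

        def reset(self):
            self.correct = 0
            self.total = 0

        def result(self):
            return self.correct / max(self.total, 1)

In the 1.x API, objects like these are expected to plug into a run via common.DataLoader(MyDataset(...)) and common.Metric(MyMetric) from neural_compressor.experimental.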

Deep Dive

  • Quantization is a process that enables inference and training to run with low-precision data types, such as fixed-point integers. Neural Compressor supports Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT); note that Dynamic Quantization currently has limited support. A minimal PTQ example follows this list.

  • Pruning provides a common method for introducing sparsity in weights and activations.

  • Benchmarking introduces how to utilize the benchmark interface of Neural Compressor; a minimal benchmark call is sketched after this list.

  • Mixed precision introduces how to enable mixed precision, including BF16, INT8, and FP32, on Intel platforms during tuning.

  • Graph Optimization introduces how to enable graph optimization for FP32 and auto-mixed precision.

  • Model Conversion introduces how to convert a TensorFlow QAT model into a quantized model that runs on Intel platforms.

  • TensorBoard provides tensor histograms and execution graphs for debugging during tuning.
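
As a concrete anchor for the quantization flow above, here is a minimal post-training quantization sketch, assuming the 1.x YAML-driven neural_compressor.experimental API; conf.yaml and model.pb are placeholder paths:

    from neural_compressor.experimental import Quantization, common

    # Build a quantizer from a YAML configuration that names the
    # framework, calibration settings, and accuracy/tuning criteria.
    quantizer = Quantization('./conf.yaml')

    # Attach the FP32 model; common.Model wraps framework-specific formats.
    quantizer.model = common.Model('./model.pb')

    # Calibrate and tune; fit() returns the best quantized model found
    # within the configured accuracy criteria.
    q_model = quantizer.fit()
    q_model.save('./quantized_model')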
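
The benchmark interface follows the same pattern. A minimal sketch, again assuming the 1.x experimental API, where the 'performance' mode name comes from the YAML configuration:

    from neural_compressor.experimental import Benchmark, common

    # Benchmark reads batch size, warmup, and iteration counts from the
    # same style of YAML configuration used for quantization.
    evaluator = Benchmark('./conf.yaml')
    evaluator.model = common.Model('./quantized_model')

    # Run the 'performance' mode and report latency and throughput.
    evaluator('performance')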

Advanced Topics

  • Adaptor is the interface between Neural Compressor and the underlying framework. The method for developing an adaptor extension is introduced, using ONNX Runtime as an example.

  • Tuning strategies automatically optimize low-precision recipes for deep learning models to achieve objectives such as inference performance and memory usage while meeting the expected accuracy criteria. The method for developing a new strategy is also introduced; a skeleton follows below.
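
In outline, a new strategy is a registered subclass of the base tuning strategy. The skeleton below is a sketch, assuming the 1.x module layout (neural_compressor.strategy.strategy); a real implementation would fill in its own search order:

    from neural_compressor.strategy.strategy import strategy_registry, TuneStrategy

    # Hypothetical custom strategy. @strategy_registry makes the class
    # selectable by name from the tuning section of the configuration;
    # by convention the class name ends with "TuneStrategy".
    @strategy_registry
    class ExampleTuneStrategy(TuneStrategy):
        def next_tune_cfg(self):
            # Extension point: a generator yielding the next candidate
            # low-precision configuration for the tuner to evaluate.
            # A real strategy walks the framework's reported quantization
            # capability in its own order (e.g. random, greedy, Bayesian).
            yield from ()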