# Getting Started

1. [Quick Samples](#quick-samples)
2. [Feature Matrix](#feature-matrix)

## Quick Samples

```shell
# Install Intel Neural Compressor
pip install neural-compressor-pt
```

```python
from transformers import AutoModelForCausalLM
from neural_compressor.torch.quantization import RTNConfig, prepare, convert

# Load the pretrained model to be quantized
user_model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")

# Use the default RTN (round-to-nearest) quantization configuration
quant_config = RTNConfig()

# Prepare the model with the configuration, then convert it to a quantized model
prepared_model = prepare(model=user_model, quant_config=quant_config)
quantized_model = convert(model=prepared_model)
```

## Feature Matrix

Intel Neural Compressor 3.X extends the PyTorch and TensorFlow APIs to support compression techniques. The table below gives a quick overview of the APIs available in Intel Neural Compressor 3.X. The 3.X release focuses mainly on quantization-related features, especially algorithms that benefit LLM accuracy and inference. It also provides common modules shared across frameworks: for example, Auto Tune supports accuracy-driven quantization and mixed precision, and Benchmark measures the multi-instance performance of the quantized model.
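
To make "accuracy-driven" concrete, here is a minimal sketch of the general idea: try candidate configurations in order and keep the first one whose accuracy drop stays within a tolerance. It is not Intel Neural Compressor's implementation. The function `accuracy_driven_search`, the candidate labels, and the accuracy numbers are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, Optional, Tuple


def accuracy_driven_search(
    candidates: Iterable[str],
    evaluate: Callable[[str], float],
    baseline: float,
    tolerable_loss: float = 0.01,
) -> Optional[Tuple[str, float]]:
    """Return the first candidate whose relative accuracy drop is within tolerance.

    Hypothetical helper: `candidates` are labels for quantization settings and
    `evaluate` maps a label to a measured accuracy.
    """
    for name in candidates:
        accuracy = evaluate(name)
        relative_loss = (baseline - accuracy) / baseline
        if relative_loss <= tolerable_loss:
            return name, accuracy
    # No candidate met the accuracy goal
    return None


# Made-up accuracy numbers, for illustration only
fake_scores = {"rtn_4bit": 0.70, "rtn_8bit": 0.795}

best = accuracy_driven_search(
    ["rtn_4bit", "rtn_8bit"], fake_scores.__getitem__, baseline=0.80
)
print(best)  # ('rtn_8bit', 0.795)
```

In this toy run the 4-bit candidate loses 12.5% relative accuracy and is rejected, while the 8-bit candidate loses about 0.6% and is accepted. In practice `evaluate` would run the quantized model on a validation set.
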
| Category | Topics |
|---|---|
| Overview | Architecture, Workflow, APIs, LLMs Recipes, Examples |
| PyTorch Extension APIs | Overview, Static Quantization, Dynamic Quantization, Smooth Quantization, Weight-Only Quantization, MX Quantization, Mixed Precision |
| TensorFlow Extension APIs | Overview, Static Quantization, Smooth Quantization |
| Other Modules | Auto Tune, Benchmark |
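
The `RTNConfig` used in Quick Samples selects round-to-nearest (RTN) quantization, a weight-only method. As a rough illustration of the underlying arithmetic, the sketch below quantizes one small group of weights, assuming symmetric scaling and a signed 4-bit range. The helpers `rtn_quantize_group` and `dequantize` are hypothetical and are not part of the Intel Neural Compressor API; the library's actual algorithm and defaults may differ.

```python
from typing import List, Tuple


def rtn_quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one group of weights.

    Hypothetical helper for illustration; returns the integer codes and the scale.
    """
    qmax = 2 ** (bits - 1) - 1  # largest positive code, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each scaled weight to the nearest integer and clamp to [-qmax - 1, qmax]
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


group = [0.7, -0.3, 0.12, 0.0]
codes, scale = rtn_quantize_group(group, bits=4)
restored = dequantize(codes, scale)

print(codes)  # [7, -3, 1, 0]
```

Here the scale is roughly 0.1, so `0.12` is stored as code `1` and restored as about `0.1`. The rounding error per weight is at most half the scale, which is why smaller groups (and therefore tighter scales) tend to preserve accuracy better.
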