## LLMs Quantization Recipes

Intel® Neural Compressor supports advanced quantization technologies for large language models (LLMs), including SmoothQuant (SQ) and Weight-Only Quantization (WOQ), and has verified a list of LLMs on the 4th Gen Intel® Xeon® Scalable Processor (codenamed Sapphire Rapids) with [PyTorch](https://pytorch.org/), [Intel® Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch), and [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers). This document publishes the specific recipes we achieved for popular LLMs, helping users quickly obtain an optimized LLM with accuracy loss limited to 1%.

> Notes:
>
> - The quantization algorithms are provided by [Intel® Neural Compressor](https://github.com/intel/neural-compressor), and the evaluation functions are provided by [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers).
> - The model list is continuously updated; expect to find more LLMs here in the future.

## Large Language Models Recipes

| Models | SQ INT8 | WOQ INT8 | WOQ INT4 |
| :-----------------------------: | :-----: | :------: | :------: |
| EleutherAI/gpt-j-6b | ✔ | ✔ | ✔ |
| facebook/opt-1.3b | ✔ | ✔ | ✔ |
| facebook/opt-30b | ✔ | ✔ | ✔ |
| meta-llama/Llama-2-7b-hf | WIP | ✔ | ✔ |
| meta-llama/Llama-2-13b-hf | WIP | ✔ | ✔ |
| meta-llama/Llama-2-70b-hf | ✔ | ✔ | ✔ |
| tiiuae/falcon-7b | ✔ | ✔ | ✔ |
| tiiuae/falcon-40b | ✔ | ✔ | ✔ |
| baichuan-inc/Baichuan-13B-Chat | ✔ | ✔ | ✔ |
| baichuan-inc/Baichuan2-13B-Chat | ✔ | ✔ | ✔ |
| baichuan-inc/Baichuan2-7B-Chat | ✔ | ✔ | ✔ |
| bigscience/bloom-1b7 | ✔ | ✔ | ✔ |
| databricks/dolly-v2-12b | ✖ | ✔ | ✖ |
| EleutherAI/gpt-neox-20b | ✖ | ✔ | ✔ |
| mistralai/Mistral-7B-v0.1 | ✖ | ✔ | ✔ |
| THUDM/chatglm2-6b | WIP | ✔ | WIP |
| THUDM/chatglm3-6b | WIP | ✔ | ✔ |

**Detailed recipes can be found [HERE](https://github.com/intel/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-generation/quantization/llm_quantization_recipes.html).**

> Notes:
>
> - This model list comes from [IPEX](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/llm.html).
> - The WIP recipes will be published soon.
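Although each model uses tuned settings from the linked recipes, the overall flow is shared. Below is a minimal, hedged SmoothQuant sketch against the Intel® Neural Compressor 2.x PyTorch API; the model name, calibration text, `alpha` value, and `backend` choice are illustrative assumptions rather than the published per-model recipe.

```python
# A minimal SmoothQuant sketch with the Intel Neural Compressor 2.x PyTorch API.
# The model name, calibration text, and alpha value below are illustrative
# placeholders; the published recipes tune these settings per model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from neural_compressor import PostTrainingQuantConfig, quantization

model_name = "EleutherAI/gpt-j-6b"  # example model from the table above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torchscript=True)
model.eval()

class CalibDataloader:
    """Tiny calibration loader; real recipes calibrate on a larger text corpus."""
    batch_size = 1

    def __iter__(self):
        for text in ["Intel Neural Compressor quantizes large language models."]:
            yield tokenizer(text, return_tensors="pt")["input_ids"]

conf = PostTrainingQuantConfig(
    backend="ipex",  # run the quantized graph through Intel Extension for PyTorch
    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}},
)
q_model = quantization.fit(model, conf, calib_dataloader=CalibDataloader())
q_model.save("./saved_results")
```

WOQ goes through the same `quantization.fit` entry point with a weight-only configuration (`approach="weight_only"` in `PostTrainingQuantConfig`); refer to the linked recipes for the exact per-model settings.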
## Large Language Models Accuracy

Accuracy (ACC) is measured on the lambada_openai task; each Ratio column is the quantized accuracy divided by the FP32 accuracy.

| Model | FP32 ACC | SQ INT8 ACC | SQ INT8 Ratio | WOQ INT8 ACC | WOQ INT8 Ratio | WOQ INT4 GPTQ ACC | WOQ INT4 GPTQ Ratio | WOQ INT4 AutoRound ACC | WOQ INT4 AutoRound Ratio |
|---|---|---|---|---|---|---|---|---|---|
| baichuan-inc/Baichuan-13B-Chat | 67.57% | 67.86% | 1.0043 | 67.55% | 0.9997 | 67.46% | 0.9984 | N/A | N/A |
| baichuan-inc/Baichuan2-13B-Chat | 71.51% | 75.51% | 1.0559 | 71.57% | 1.0008 | 71.45% | 0.9992 | 70.87% | 0.9911 |
| baichuan-inc/Baichuan2-7B-Chat | 67.67% | 67.51% | 0.9976 | 67.61% | 0.9991 | 68.08% | 1.0061 | 67.18% | 0.9928 |
| bigscience/bloom-1b7 | 46.34% | 47.97% | 1.0352 | 46.21% | 0.9972 | 47.00% | 1.0142 | N/A | N/A |
| databricks/dolly-v2-12b | 64.35% | N/A | N/A | 63.92% | 0.9933 | N/A | N/A | N/A | N/A |
| EleutherAI/gpt-j-6b | 68.31% | 68.00% | 0.9955 | 68.27% | 0.9994 | 68.23% | 0.9988 | 67.40% | 0.9867 |
| EleutherAI/gpt-neox-20b | 72.33% | N/A | N/A | 72.29% | 0.9994 | 72.15% | 0.9975 | N/A | N/A |
| facebook/opt-1.3b | 57.89% | 57.35% | 0.9907 | 58.12% | 1.0040 | 58.01% | 1.0021 | N/A | N/A |
| facebook/opt-30b | 71.49% | 71.51% | 1.0003 | 71.53% | 1.0006 | 71.82% | 1.0046 | 71.43% | 0.9992 |
| meta-llama/Llama-2-13b-hf | 76.77% | N/A | N/A | 76.89% | 1.0016 | 76.96% | 1.0025 | N/A | N/A |
| meta-llama/Llama-2-70b-hf | 79.64% | 79.53% | 0.9986 | 79.62% | 0.9997 | 80.05% | 1.0051 | N/A | N/A |
| meta-llama/Llama-2-7b-hf | 73.92% | N/A | N/A | 73.90% | 0.9997 | 73.51% | 0.9945 | N/A | N/A |
| mistralai/Mistral-7B-v0.1 | 75.90% | N/A | N/A | 75.80% | 0.9987 | 75.37% | 0.9930 | 75.82% | 0.9989 |
| THUDM/chatglm2-6b | 53.23% | N/A | N/A | 53.00% | 0.9957 | N/A | N/A | N/A | N/A |
| THUDM/chatglm3-6b | 59.09% | N/A | N/A | 59.03% | 0.9990 | N/A | N/A | 58.59% | 0.9915 |
| tiiuae/falcon-40b | 77.22% | 77.26% | 1.0005 | 77.18% | 0.9995 | 77.97% | 1.0097 | N/A | N/A |
| tiiuae/falcon-7b | 74.67% | 76.17% | 1.0201 | 74.73% | 1.0008 | 74.79% | 1.0016 | N/A | N/A |
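The accuracy numbers above come from the evaluation functions in Intel® Extension for Transformers noted earlier. As a rough, standalone cross-check, the same lambada_openai accuracy can be measured with lm-evaluation-harness; the sketch below assumes the lm-eval 0.4.x Python API and an example FP32 model, and a quantized model is scored the same way.

```python
# Rough lambada_openai accuracy check with lm-evaluation-harness (0.4.x API).
# This is a generic cross-check, not the exact evaluation harness used for the
# table above; the model name and batch size are illustrative.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/gpt-j-6b,dtype=float32",
    tasks=["lambada_openai"],
    batch_size=8,
)
fp32_acc = results["results"]["lambada_openai"]["acc,none"]
print(f"lambada_openai FP32 accuracy: {fp32_acc:.2%}")

# The Ratio column is quantized accuracy / FP32 accuracy,
# e.g. for gpt-j-6b SQ INT8: 0.6800 / 0.6831 ≈ 0.9955.
```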