neural_compressor.torch.quantization.save_load_entry
Intel Neural Compressor PyTorch save/load entry for all algorithms.
Functions
- save: Save quantized model.
- load: Load quantized model.
Module Contents
- neural_compressor.torch.quantization.save_load_entry.save(model, checkpoint_dir='saved_results', format='default')[source]
Save quantized model.
- Parameters:
model (torch.nn.Module, or a TorchScript model with IPEX, or an FX graph with pt2e, optional) – Quantized model.
checkpoint_dir (str, optional) – Checkpoint directory. Defaults to “saved_results”.
format (str, optional) – ‘default’ for saving an INC quantized model; ‘huggingface’ for saving a HuggingFace WOQ causal language model. Defaults to “default”.
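As a minimal sketch of the save API above (assuming `q_model` is a model that has already been quantized with INC's PyTorch quantization flow; the name `q_model` is illustrative, not from this page):

```python
# Sketch only: q_model is assumed to be a model already quantized
# with Intel Neural Compressor's PyTorch quantization flow.
from neural_compressor.torch.quantization import save

# Persist the quantized model to the default checkpoint directory.
save(q_model, checkpoint_dir="saved_results")
```

The same directory name can then be passed as `model_name_or_path` to `load` to restore the model.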
- neural_compressor.torch.quantization.save_load_entry.load(model_name_or_path, original_model=None, format='default', device='cpu', **kwargs)[source]
Load quantized model.
- Load an INC quantized model from local storage.
  - case 1: WOQ

    ```python
    from neural_compressor.torch.quantization import load
    load(model_name_or_path="saved_results", original_model=fp32_model)
    ```

  - case 2: INT8/FP8

    ```python
    from neural_compressor.torch.quantization import load
    load(model_name_or_path="saved_result", original_model=fp32_model)
    ```

  - case 3: TorchScript (IPEX)

    ```python
    from neural_compressor.torch.quantization import load
    load(model_name_or_path="saved_result")
    ```

- Load a HuggingFace quantized model, including GPTQ models and INC quantized models upstreamed to the HF model hub.
  - case 1: WOQ

    ```python
    from neural_compressor.torch.quantization import load
    load(model_name_or_path=model_name_or_path, format="huggingface")
    ```
- Parameters:
model_name_or_path (str) – Torch checkpoint directory or HuggingFace model_name_or_path. If ‘format’ is set to ‘huggingface’, it is the HuggingFace model_name_or_path; if ‘format’ is set to ‘default’, it is the checkpoint directory (‘checkpoint_dir’). This parameter must not be None. It works together with the ‘original_model’ parameter to load an INC quantized model from local storage.
original_model (torch.nn.Module, or a TorchScript model with IPEX, or an FX graph with pt2e, optional) – Original model before quantization. Required if ‘format’ is set to ‘default’ and the model is not a TorchScript model. Defaults to None.
format (str, optional) – ‘default’ for loading an INC quantized model; ‘huggingface’ for loading a HuggingFace WOQ causal language model. Defaults to “default”.
device (str, optional) – ‘cpu’ or ‘hpu’. Specifies the device the model will be loaded onto. Currently only used for weight-only quantization.
kwargs (remaining dictionary of keyword arguments, optional) – Remaining keyword arguments for loading HuggingFace models, passed to the HuggingFace model’s __init__ method, such as ‘trust_remote_code’ and ‘revision’.
- Returns:
The quantized model.