neural_compressor.torch.utils.utility
Intel Neural Compressor PyTorch utilities.
Functions
- register_algo: Decorator function to register algorithms in the algos_mapping dictionary.
- fetch_module: Get module with a given op name.
- set_module: Set module with a given op name.
- get_model_info: Get model info according to white_module_list.
- get_double_quant_config_dict: Query config dict of double_quant according to double_quant_type.
- get_quantizer: Get the quantizer.
- postprocess_model: Process the quantizer attribute of the model according to the current phase.
- dump_model_op_stats: Dump quantizable-op statistics of the model to the user.
- get_model_device: Get the device of the model.
- get_processor_type_from_user_config: Get the processor type.
- dowload_hf_model: Download a Hugging Face model from the HF hub.
- load_empty_model: Load an empty model.
- get_module: Get module from model by key name.
- get_layer_names_in_block: Retrieve the names of layers within each block of the model.
- to_dtype: Move input data to the specified data type.
- to_device: Move input data to the specified device.
- get_block_names: Get the block names for transformers-like networks.
- validate_modules: Test a list of modules' validity.
- get_multimodal_block_names: Get the multimodal model block names for transformers-like networks.
- detect_device: Detect the device to use for model execution (GPU, HPU, or CPU).
- run_fn_for_vlm_autoround: Run a model on a provided dataset with automatic device detection for vision-language models.
Module Contents
- neural_compressor.torch.utils.utility.register_algo(name)[source]
Decorator function to register algorithms in the algos_mapping dictionary.
- Usage example:
    @register_algo(name=example_algo)
    def example_algo(model: torch.nn.Module, quant_config: RTNConfig) -> torch.nn.Module:
        ...
- Parameters:
name (str) – The name under which the algorithm function will be registered.
- Returns:
The decorator function to be used with algorithm functions.
- Return type:
decorator
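For illustration, a minimal sketch of registering and then looking up an algorithm; the algorithm name and the no-op body are hypothetical, and it assumes algos_mapping (the registry dictionary named above) is importable from this module:

    import torch
    from neural_compressor.torch.utils.utility import algos_mapping, register_algo

    @register_algo(name="noop_algo")  # hypothetical algorithm name
    def noop_algo(model: torch.nn.Module, quant_config=None) -> torch.nn.Module:
        # A do-nothing algorithm; real entries return a quantized model.
        return model

    fn = algos_mapping["noop_algo"]  # the decorator stored the function here
    assert fn(torch.nn.Linear(4, 4), None) is not None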
- neural_compressor.torch.utils.utility.fetch_module(model, op_name)[source]
Get module with a given op name.
- Parameters:
model (object) – the input model.
op_name (str) – name of op.
- Returns:
module (object).
- neural_compressor.torch.utils.utility.set_module(model, op_name, new_module)[source]
Set module with a given op name.
- Parameters:
model (object) – the input model.
op_name (str) – name of op.
new_module (object) – the new module to set at op_name.
- Returns:
module (object).
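A minimal sketch of fetching and replacing a submodule by op name; the toy model and names are assumptions, and the op name is taken to be the dotted path reported by model.named_modules():

    import torch
    from neural_compressor.torch.utils.utility import fetch_module, set_module

    model = torch.nn.Sequential()
    model.add_module("block", torch.nn.Sequential())
    model.block.add_module("fc", torch.nn.Linear(8, 8))

    fc = fetch_module(model, "block.fc")                # look up by dotted name
    set_module(model, "block.fc", torch.nn.Identity())  # swap the module in place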
- neural_compressor.torch.utils.utility.get_model_info(model: torch.nn.Module, white_module_list: List[Callable]) → List[Tuple[str, str]][source]
Get model info according to white_module_list.
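For example (a sketch: the white list filters modules by type, and the documented return type is a list of (name, type) string pairs):

    import torch
    from neural_compressor.torch.utils.utility import get_model_info

    model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())
    info = get_model_info(model, white_module_list=[torch.nn.Linear])
    print(info)  # expected shape: [("0", "Linear")]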
- neural_compressor.torch.utils.utility.get_double_quant_config_dict(double_quant_type='BNB_NF4')[source]
Query config dict of double_quant according to double_quant_type.
- Parameters:
double_quant_type (str, optional) – double_quant type. Defaults to “BNB_NF4”.
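A quick sketch of querying the preset; the exact keys of the returned dict are implementation details, so they are not spelled out here:

    from neural_compressor.torch.utils.utility import get_double_quant_config_dict

    cfg = get_double_quant_config_dict("BNB_NF4")  # documented default type
    print(cfg)  # a dict of double-quant parameters for the chosen preset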
- neural_compressor.torch.utils.utility.get_quantizer(model, quantizer_cls, quant_config=None, *args, **kwargs)[source]
Get the quantizer.
Initialize a quantizer or get quantizer attribute from model.
- Parameters:
model (torch.nn.Module) – pytorch model.
quantizer_cls (Quantizer) – quantizer class of a specific algorithm.
quant_config (dict, optional) – Specifies how to apply the algorithm on the given model. Defaults to None.
- Returns:
quantizer object.
- neural_compressor.torch.utils.utility.postprocess_model(model, mode, quantizer)[source]
Process the quantizer attribute of the model according to the current phase.
In the 'prepare' phase, the quantizer is set as an attribute of the model to avoid redundant initialization during the 'convert' phase. In the 'convert' or 'quantize' phase, the now-unused quantizer attribute is removed.
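A sketch of how the two helpers pair up across phases; it assumes the quantizer class exposes prepare()/convert() and that the phase is passed as the framework's Mode enum (import path assumed):

    from neural_compressor.torch.utils import Mode  # assumed import path
    from neural_compressor.torch.utils.utility import get_quantizer, postprocess_model

    def prepare_then_convert(model, quantizer_cls, quant_config=None):
        # Prepare: initialize the quantizer and cache it on the model.
        quantizer = get_quantizer(model, quantizer_cls, quant_config)
        model = quantizer.prepare(model)
        postprocess_model(model, Mode.PREPARE, quantizer)

        # Convert: reuse the cached quantizer, then drop the attribute.
        quantizer = get_quantizer(model, quantizer_cls, quant_config)
        model = quantizer.convert(model)
        postprocess_model(model, Mode.CONVERT, quantizer)
        return model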
- neural_compressor.torch.utils.utility.dump_model_op_stats(mode, tune_cfg)[source]
Dump quantizable-op statistics of the model to the user.
- Parameters:
mode (object) – quantization mode.
tune_cfg (dict) – quantization config.
- neural_compressor.torch.utils.utility.get_model_device(model: torch.nn.Module)[source]
Get the device of the model.
- Parameters:
model (torch.nn.Module) – the input model.
- Returns:
The device on which the model resides, as a string (for example, 'cpu' or 'cuda:0').
- Return type:
str
- neural_compressor.torch.utils.utility.get_processor_type_from_user_config(user_processor_type: str | neural_compressor.common.utils.ProcessorType | None = None)[source]
Get the processor type.
Get the processor type based on the user configuration or automatically detect it based on the hardware.
- Parameters:
user_processor_type (Optional[Union[str, ProcessorType]]) – The user-specified processor type. Defaults to None.
- Returns:
The detected or user-specified processor type.
- Return type:
ProcessorType
- Raises:
AssertionError – If the user-specified processor type is not supported.
NotImplementedError – If the processor type is not recognized.
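For example (the ProcessorType member name used here, Client, is an assumption about the enum):

    from neural_compressor.common.utils import ProcessorType  # path from the signature
    from neural_compressor.torch.utils.utility import get_processor_type_from_user_config

    auto_detected = get_processor_type_from_user_config()  # detect from hardware
    explicit = get_processor_type_from_user_config(ProcessorType.Client)  # assumed member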
- neural_compressor.torch.utils.utility.dowload_hf_model(repo_id, cache_dir=None, repo_type=None, revision=None)[source]
Download a Hugging Face model from the HF hub.
- neural_compressor.torch.utils.utility.load_empty_model(pretrained_model_name_or_path, cls=None, **kwargs)[source]
Load an empty model.
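A sketch of the two helpers together; the repo id is only an example, and load_empty_model is assumed (per its name) to build the architecture without materializing the full weights:

    from neural_compressor.torch.utils.utility import dowload_hf_model, load_empty_model

    local_path = dowload_hf_model("facebook/opt-125m")   # example repo id
    empty_model = load_empty_model("facebook/opt-125m")  # architecture only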
- neural_compressor.torch.utils.utility.get_module(module, key)[source]
Get module from model by key name.
- Parameters:
module (torch.nn.Module) – original model
key (str) – name of the module to retrieve.
- neural_compressor.torch.utils.utility.get_layer_names_in_block(model, supported_types=SUPPORTED_LAYERS, quant_block_list=None)[source]
Retrieves the names of layers within each block of the model.
- Returns:
A list of strings, where each string is the name of a layer within a block of the model.
- Return type:
list
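For example (the checkpoint id is an assumption; any transformers-like network works the same way):

    from transformers import AutoModelForCausalLM
    from neural_compressor.torch.utils.utility import get_layer_names_in_block

    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
    layer_names = get_layer_names_in_block(model)  # flat list of in-block layer names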
- neural_compressor.torch.utils.utility.to_dtype(input, dtype=torch.float32)[source]
Moves input data to the specified data type.
- Parameters:
input – The input data to be moved.
dtype – The target data type. Defaults to torch.float32.
- Returns:
The input data converted to the specified data type.
- neural_compressor.torch.utils.utility.to_device(input, device=torch.device('cpu'))[source]
Moves input data to the specified device.
- Parameters:
input – The input data to be moved.
device – The target device. Defaults to torch.device('cpu').
- Returns:
The input data on the specified device.
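A sketch of both movers on a batch dict; it assumes, as their use in calibration suggests, that nested containers of tensors are traversed recursively:

    import torch
    from neural_compressor.torch.utils.utility import to_device, to_dtype

    batch = {"pixel_values": torch.randn(1, 3, 8, 8),
             "attention_mask": torch.ones(1, 8)}
    batch = to_device(batch, torch.device("cpu"))  # move every tensor in the dict
    batch = to_dtype(batch, torch.float16)         # convert every tensor in the dict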
- neural_compressor.torch.utils.utility.get_block_names(model)[source]
Get the block names for transformers-like networks.
- Parameters:
model – The model.
- Returns:
block_names – a list whose elements are lists of the layer names in each block.
- neural_compressor.torch.utils.utility.validate_modules(module_names)[source]
Test a list of modules’ validity.
- Parameters:
module_names (list of str) – the module names to be validated.
- Returns:
bool – True if all modules have equal length or are not dependent, otherwise False.
- neural_compressor.torch.utils.utility.get_multimodal_block_names(model, quant_vision=False)[source]
Get the multimodal model block names for transformers-like networks.
- Parameters:
model – The model.
quant_vision (bool, optional) – whether blocks from the vision component are included as well. Defaults to False.
- Returns:
block_names – a list whose elements are lists of the layer names in each block.
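For example (the checkpoint id is an assumption; validate_modules is shown on the flattened name list):

    from transformers import AutoModelForCausalLM
    from neural_compressor.torch.utils.utility import (
        get_block_names, get_multimodal_block_names, validate_modules)

    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
    blocks = get_block_names(model)  # list of lists of block names
    all_blocks = get_multimodal_block_names(model, quant_vision=True)
    validate_modules([name for group in blocks for name in group])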
- neural_compressor.torch.utils.utility.detect_device(device=None)[source]
Detects the device to use for model execution (GPU, HPU, or CPU).
- Parameters:
device (str, int, torch.device, optional) –
If a string (‘cuda’, ‘cpu’, or ‘hpu’) or torch.device is provided, that device is selected.
If an integer is provided, it is treated as a GPU device index.
If None or ‘auto’, it automatically selects ‘cuda’ if available, ‘hpu’ if Habana is available, or falls back to ‘cpu’.
- Returns:
The selected device in string format (‘cuda:X’, ‘hpu’, or ‘cpu’).
- Return type:
str
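For example, following the selection rules above:

    from neural_compressor.torch.utils.utility import detect_device

    best = detect_device()      # same as "auto": cuda, then hpu, then cpu
    gpu0 = detect_device(0)     # integer index selects a GPU, e.g. "cuda:0"
    cpu = detect_device("cpu")  # explicit choices are respected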
- neural_compressor.torch.utils.utility.run_fn_for_vlm_autoround(model, dataloader, seqlen=512, nsamples=512)[source]
Runs a model on a provided dataset with automatic device detection for vision-language models.
- Parameters:
model – The model to run.
dataloader – A PyTorch dataloader providing the input data for the model.
seqlen (int, optional) – The minimum sequence length of input data to process. Defaults to 512.
nsamples (int, optional) – The number of samples to process before stopping. Defaults to 512.
- Returns:
None
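A schematic sketch of the call shape only; the stand-in model and batch format are assumptions, and real usage passes a Hugging Face vision-language model with batches produced by its processor:

    import torch
    from torch.utils.data import DataLoader
    from neural_compressor.torch.utils.utility import run_fn_for_vlm_autoround

    class TinyVLM(torch.nn.Module):  # stand-in for a real vision-language model
        def forward(self, input_ids=None, **kwargs):
            return torch.zeros(1)

    model = TinyVLM()
    calib = [{"input_ids": torch.randint(0, 1000, (1, 512))} for _ in range(4)]
    dataloader = DataLoader(calib, batch_size=None)

    run_fn_for_vlm_autoround(model, dataloader, seqlen=512, nsamples=4)  # returns None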