neural_compressor.torch.utils.utility

Intel Neural Compressor PyTorch utilities.

Functions

`register_algo`(name)	Decorator function to register algorithms in the algos_mapping dictionary.
`fetch_module`(model, op_name)	Get module with a given op name.
`set_module`(model, op_name, new_module)	Set module with a given op name.
`get_model_info`(→ List[Tuple[str, str]])	Get model info according to white_module_list.
`get_double_quant_config_dict`([double_quant_type])	Query config dict of double_quant according to double_quant_type.
`get_quantizer`(model, quantizer_cls[, quant_config])	Get the quantizer.
`postprocess_model`(model, mode, quantizer)	Process quantizer attribute of model according to current phase.
`dump_model_op_stats`(mode, tune_cfg)	Dump quantizable ops stats of model to user.
`get_model_device`(model)	Get the device.
`get_processor_type_from_user_config`([user_processor_type])	Get the processor type.
`dowload_hf_model`(repo_id[, cache_dir, repo_type, revision])	Download a Hugging Face model from the Hugging Face Hub.
`load_empty_model`(pretrained_model_name_or_path[, cls])	Load an empty model.
`get_module`(module, key)	Get module from model by key name.
`get_layer_names_in_block`(model[, supported_types, ...])	Retrieves the names of layers within each block of the model.
`to_dtype`(input[, dtype])	Moves input data to the specified data type.
`to_device`(input[, device])	Moves input data to the specified device.
`get_block_names`(model)	Get the block names for transformers-like networks.
`validate_modules`(module_names)	Test a list of modules' validity.
`get_multimodal_block_names`(model[, quant_vision])	Get the multimodal model block names for transformers-like networks.
`detect_device`([device])	Detects the device to use for model execution (GPU, HPU, or CPU).
`find_matching_blocks`(model, all_blocks[, ...])	Find and return matching blocks in the model based on to_quant_block_names.
`get_non_persistent_buffers`(model)	Get all non-persistent buffers in the model.
`load_non_persistent_buffers`(model, non_persistent_buffers)	Load all non-persistent buffers into the model.
`move_input_device`(input[, device])	Auto mapping input to device for all kinds of format.
`forward_wrapper`(model, input)	Model forward with device auto mapping.

Module Contents

neural_compressor.torch.utils.utility.register_algo(name)[source]

Decorator function to register algorithms in the algos_mapping dictionary.

Usage example::

    @register_algo(name=example_algo)
    def example_algo(model: torch.nn.Module, quant_config: RTNConfig) -> torch.nn.Module:
        …

Parameters:: name (str) – The name under which the algorithm function will be registered.
Returns:: The decorator function to be used with algorithm functions.
Return type:: decorator
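
The registration pattern described above can be illustrated with a minimal, standalone sketch. It does not import Intel Neural Compressor: `algos_mapping` here is a local stand-in dictionary, and `register_algo_sketch` is a hypothetical name, so this shows only the general decorator-registry idea rather than the library's implementation.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the library's algos_mapping dictionary.
algos_mapping: Dict[str, Callable] = {}


def register_algo_sketch(name: str) -> Callable[[Callable], Callable]:
    """Return a decorator that stores the decorated function under `name`."""

    def decorator(algo_func: Callable) -> Callable:
        algos_mapping[name] = algo_func
        return algo_func  # hand the function back unchanged so it stays callable

    return decorator


@register_algo_sketch(name="example_algo")
def example_algo(model, quant_config):
    # Placeholder body: a real algorithm would return a quantized model.
    return model


# Look the algorithm up by its registered name and call it.
result = algos_mapping["example_algo"]("my_model", quant_config=None)
```

Returning the original function from the decorator keeps it usable under its own name as well as through the mapping.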

neural_compressor.torch.utils.utility.fetch_module(model, op_name)[source]

Get module with a given op name.

Parameters:

model (object) – the input model.
op_name (str) – name of op.

Returns:

module (object).

neural_compressor.torch.utils.utility.set_module(model, op_name, new_module)[source]

Set module with a given op name.

Parameters:

model (object) – the input model.
op_name (str) – name of op.
new_module (object) – the new module to set at op_name.

Returns:

module (object).
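
As a rough illustration of looking up and replacing a submodule by a dotted op name, the sketch below walks attribute names on plain Python objects. The helpers `fetch_by_name` and `set_by_name` are hypothetical, `SimpleNamespace` stands in for a real model, and the handling of missing names (returning `None` or `False`) is an assumption rather than documented behavior of `fetch_module` or `set_module`.

```python
from types import SimpleNamespace


def fetch_by_name(root, op_name):
    """Walk a dotted name such as "encoder.fc"; return None if any part is missing."""
    current = root
    for part in op_name.split("."):
        if not hasattr(current, part):
            return None
        current = getattr(current, part)
    return current


def set_by_name(root, op_name, new_obj):
    """Replace the object at a dotted name; return False when the parent is missing."""
    *parent_parts, leaf = op_name.split(".")
    parent = fetch_by_name(root, ".".join(parent_parts)) if parent_parts else root
    if parent is None:
        return False
    setattr(parent, leaf, new_obj)
    return True


# A toy "model" with one nested attribute standing in for a layer.
model = SimpleNamespace(encoder=SimpleNamespace(fc="float_linear"))
set_by_name(model, "encoder.fc", "quantized_linear")
```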

neural_compressor.torch.utils.utility.get_model_info(model: torch.nn.Module, white_module_list: List[Callable]) → List[Tuple[str, str]][source]

Get model info according to white_module_list.

neural_compressor.torch.utils.utility.get_double_quant_config_dict(double_quant_type='BNB_NF4')[source]

Query config dict of double_quant according to double_quant_type.

Parameters:: double_quant_type (str, optional) – double_quant type. Defaults to “BNB_NF4”.

neural_compressor.torch.utils.utility.get_quantizer(model, quantizer_cls, quant_config=None, *args, **kwargs)[source]

Get the quantizer.

Initialize a quantizer or get quantizer attribute from model.

Parameters:

model (torch.nn.Module) – pytorch model.
quantizer_cls (Quantizer) – quantizer class of a specific algorithm.
quant_config (dict, optional) – Specifies how to apply the algorithm on the given model. Defaults to None.

Returns:

quantizer object.

neural_compressor.torch.utils.utility.postprocess_model(model, mode, quantizer)[source]

Process quantizer attribute of model according to current phase.

In the ‘prepare’ phase, the quantizer is set as an attribute of the model to avoid redundant initialization during the ‘convert’ phase.

In the ‘convert’ or ‘quantize’ phase, the unused quantizer attribute is removed.

Parameters:

model (torch.nn.Module) – pytorch model.
mode (Mode) – The mode of current phase, including ‘prepare’, ‘convert’ and ‘quantize’.
quantizer (Quantizer) – quantizer object.
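
The phase-dependent behavior described above might look roughly like the following sketch. It uses plain strings for the mode and an attribute named `quantizer`; both are assumptions made for illustration (the library uses a `Mode` type, and its attribute name is not stated here), and `postprocess_sketch` is a hypothetical function.

```python
from types import SimpleNamespace


def postprocess_sketch(model, mode, quantizer):
    """Attach the quantizer in 'prepare'; drop it in 'convert' or 'quantize'."""
    if mode == "prepare":
        # Keep the quantizer on the model so a later convert step can reuse it.
        model.quantizer = quantizer
    elif mode in ("convert", "quantize"):
        # The quantizer is no longer needed once conversion is done.
        if hasattr(model, "quantizer"):
            del model.quantizer


model = SimpleNamespace()  # stand-in for a torch.nn.Module
quantizer = object()       # stand-in for a Quantizer instance

postprocess_sketch(model, "prepare", quantizer)
attached_after_prepare = getattr(model, "quantizer", None) is quantizer

postprocess_sketch(model, "convert", quantizer)
attached_after_convert = hasattr(model, "quantizer")
```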

neural_compressor.torch.utils.utility.dump_model_op_stats(mode, tune_cfg)[source]

Dump quantizable ops stats of model to user.

Parameters:

mode (object) – quantization mode.
tune_cfg (dict) – quantization config

neural_compressor.torch.utils.utility.get_model_device(model: torch.nn.Module)[source]

Get the device.

Parameters:: model (torch.nn.Module) – the input model.
Returns:: The device of the model, as a string.
Return type:: str

neural_compressor.torch.utils.utility.get_processor_type_from_user_config(user_processor_type: str | neural_compressor.common.utils.ProcessorType | None = None)[source]

Get the processor type.

Get the processor type based on the user configuration or automatically detect it based on the hardware.

Parameters:

user_processor_type (Optional[Union[str, ProcessorType]]) – The user-specified processor type. Defaults to None.

Returns:

The detected or user-specified processor type.

Return type:

ProcessorType

Raises:

AssertionError – If the user-specified processor type is not supported.
NotImplementedError – If the processor type is not recognized.

neural_compressor.torch.utils.utility.dowload_hf_model(repo_id, cache_dir=None, repo_type=None, revision=None)[source]

Download a Hugging Face model from the Hugging Face Hub.

neural_compressor.torch.utils.utility.load_empty_model(pretrained_model_name_or_path, cls=None, **kwargs)[source]

Load an empty model.

neural_compressor.torch.utils.utility.get_module(module, key)[source]

Get module from model by key name.

Parameters:

module (torch.nn.Module) – original model
key (str) – module name to be replaced

neural_compressor.torch.utils.utility.get_layer_names_in_block(model, supported_types=SUPPORTED_LAYERS, to_quant_block_names=None)[source]

Retrieves the names of layers within each block of the model.

Returns:

A list of strings, where each string is the name of a layer within a block of the model.

Return type:

list

neural_compressor.torch.utils.utility.to_dtype(input, dtype=torch.float32)[source]

Moves input data to the specified data type.

Parameters:

input – The input data to be converted.
dtype – The target data type.

Returns:

The input data converted to the specified data type.

neural_compressor.torch.utils.utility.to_device(input, device=torch.device('cpu'))[source]

Moves input data to the specified device.

Parameters:

input – The input data to be moved.
device – The target device.

Returns:

The input data on the specified device.

neural_compressor.torch.utils.utility.get_block_names(model)[source]

Get the block names for transformers-like networks.

Parameters:

model – The model.

Returns:

block_names – A list whose elements are lists of the blocks’ layer names.

neural_compressor.torch.utils.utility.validate_modules(module_names)[source]

Test a list of modules’ validity.

Parameters:

module_names (list of str) – List of strings to be validated.

Returns:

True if all module names have equal length or are not dependent on one another, otherwise False.

Return type:

bool

neural_compressor.torch.utils.utility.get_multimodal_block_names(model, quant_vision=False)[source]

Get the multimodal model block names for transformers-like networks.

Parameters:

model – The model.

Returns:

block_names – A list whose elements are lists of the blocks’ layer names.

neural_compressor.torch.utils.utility.detect_device(device=None)[source]

Detects the device to use for model execution (GPU, HPU, or CPU).

Parameters:

device (str, int, torch.device, optional) –

If a string (‘cuda’, ‘cpu’, or ‘hpu’) or torch.device is provided, that device is selected.
If an integer is provided, it is treated as a GPU device index.
If None or ‘auto’, it automatically selects ‘cuda’ if available, ‘hpu’ if Habana is available, or falls back to ‘cpu’.

Returns:

The selected device in string format (‘cuda:X’, ‘hpu’, or ‘cpu’).

Return type:

str
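
The selection order described above can be sketched without PyTorch by passing hardware availability in as flags. `pick_device`, `cuda_available`, and `hpu_available` are hypothetical names for this illustration, and the exact string returned in the automatic case (for example ‘cuda’ versus ‘cuda:0’) is an assumption.

```python
from typing import Optional, Union


def pick_device(
    device: Optional[Union[str, int]] = None,
    cuda_available: bool = False,
    hpu_available: bool = False,
) -> str:
    """Choose a device string following the documented priority order."""
    if isinstance(device, int):
        # An integer is interpreted as a GPU index.
        return f"cuda:{device}"
    if device is None or device == "auto":
        # Prefer CUDA, then HPU, then fall back to CPU.
        if cuda_available:
            return "cuda"
        if hpu_available:
            return "hpu"
        return "cpu"
    # An explicit device name is used as given.
    return str(device)
```

Injecting the availability flags keeps the sketch testable on any machine; real code would query the runtime instead.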

neural_compressor.torch.utils.utility.find_matching_blocks(model, all_blocks, to_quant_block_names=None)[source]

Find and return matching blocks in the model based on to_quant_block_names.

Parameters:

model – The model (not used in this specific function but kept for completeness).
all_blocks – List of lists, where each inner list contains full block names in the model.
to_quant_block_names – Comma-separated string of target block names to match.

Returns:

List of lists containing full paths of matching blocks in the model.

Return type:

list
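
To make the input and output shapes concrete, here is a minimal sketch of filtering nested block names with a comma-separated target string. The matching rule (exact name or dotted-prefix match) and the behavior when no targets are given (return everything) are assumptions for illustration; `match_blocks` is a hypothetical helper and may not match how `find_matching_blocks` compares names.

```python
from typing import List, Optional


def match_blocks(
    all_blocks: List[List[str]],
    to_quant_block_names: Optional[str] = None,
) -> List[List[str]]:
    """Keep only block names that equal, or sit under, one of the target names."""
    if not to_quant_block_names:
        return all_blocks
    # Split "a, b" into ["a", "b"], ignoring empty entries.
    targets = [name.strip() for name in to_quant_block_names.split(",") if name.strip()]
    matched = []
    for group in all_blocks:
        kept = [
            block
            for block in group
            if any(block == t or block.startswith(t + ".") for t in targets)
        ]
        if kept:  # drop groups with no matching block
            matched.append(kept)
    return matched


all_blocks = [["model.layers.0", "model.layers.1"], ["vision.blocks.0"]]
text_only = match_blocks(all_blocks, "model.layers")
mixed = match_blocks(all_blocks, "vision.blocks, model.layers.1")
```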

neural_compressor.torch.utils.utility.get_non_persistent_buffers(model)[source]

Get all non-persistent buffers in the model.

Parameters:: model (torch.nn.Module) – PyTorch model
Returns:: A dictionary containing all non-persistent buffers, {buffer_names: buffer_tensors}
Return type:: dict

neural_compressor.torch.utils.utility.load_non_persistent_buffers(model, non_persistent_buffers)[source]

Load all non-persistent buffers into the model.

Parameters:

model (torch.nn.Module) – PyTorch model
non_persistent_buffers (dict) – A dictionary containing all non-persistent buffers, {buffer_names: buffer_tensors}

neural_compressor.torch.utils.utility.move_input_device(input, device='cpu')[source]

Auto mapping input to device for all kinds of format.

Parameters:

input (torch.tensor) – input data
device (str, optional) – target device. Defaults to “cpu”.

Returns:

input data on target device

Return type:

input (torch.tensor)
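
One way to map inputs of different formats onto a device is to recurse through containers and call `.to(device)` on anything that supports it. The sketch below assumes the formats of interest are dicts, lists, and tuples, and uses a hypothetical `FakeTensor` class in place of `torch.Tensor` so that it runs without PyTorch; `move_nested` is likewise a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor that only tracks a device label."""

    device: str = "cpu"

    def to(self, device: str) -> "FakeTensor":
        return FakeTensor(device=device)


def move_nested(data, device="cpu"):
    """Recursively move anything with a .to() method; leave other values alone."""
    if isinstance(data, dict):
        return {key: move_nested(value, device) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        # Rebuild the same container type with moved elements.
        return type(data)(move_nested(item, device) for item in data)
    if hasattr(data, "to"):
        return data.to(device)
    return data


batch = {"input_ids": FakeTensor(), "extras": [FakeTensor(), 3], "mask": (FakeTensor(),)}
moved = move_nested(batch, "hpu")
```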

neural_compressor.torch.utils.utility.forward_wrapper(model, input)[source]

Model forward with device auto mapping.

Parameters:

model (torch.nn.Module) – input model
input (torch.tensor) – input data

Returns:

output data

Return type:

output