neural_compressor.torch.utils.utility

Intel Neural Compressor PyTorch utilities.

Functions

register_algo(name)

Decorator function to register algorithms in the algos_mapping dictionary.

fetch_module(model, op_name)

Get module with a given op name.

set_module(model, op_name, new_module)

Set module with a given op name.

get_model_info(→ List[Tuple[str, str]])

Get model info according to white_module_list.

get_double_quant_config_dict([double_quant_type])

Query config dict of double_quant according to double_quant_type.

get_quantizer(model, quantizer_cls[, quant_config])

Get the quantizer.

postprocess_model(model, mode, quantizer)

Process quantizer attribute of model according to current phase.

dump_model_op_stats(mode, tune_cfg)

Dump the quantizable-op statistics of the model for the user.

get_model_device(model)

Get the device of the model.

get_processor_type_from_user_config([user_processor_type])

Get the processor type.

dowload_hf_model(repo_id[, cache_dir, repo_type, revision])

Download a Hugging Face model from the HF Hub.

load_empty_model(pretrained_model_name_or_path[, cls])

Load an empty model.

get_module(module, key)

Get module from model by key name.

get_layer_names_in_block(model[, supported_types, ...])

Retrieves the names of layers within each block of the model.

to_dtype(input[, dtype])

Moves input data to the specified data type.

to_device(input[, device])

Moves input data to the specified device.

get_block_names(model)

Get the block names for transformers-like networks.

validate_modules(module_names)

Test a list of modules' validity.

get_multimodal_block_names(model[, quant_vision])

Get the multimodal model block names for transformers-like networks.

detect_device([device])

Detects the device to use for model execution (GPU, HPU, or CPU).

run_fn_for_vlm_autoround(model, dataloader[, seqlen, ...])

Runs a model on a provided dataset with automatic device detection for vision-language models.

Module Contents

neural_compressor.torch.utils.utility.register_algo(name)[source]

Decorator function to register algorithms in the algos_mapping dictionary.

Usage example:

    @register_algo(name="example_algo")
    def example_algo(model: torch.nn.Module, quant_config: RTNConfig) -> torch.nn.Module:
        ...

Parameters:

name (str) – The name under which the algorithm function will be registered.

Returns:

The decorator function to be used with algorithm functions.

Return type:

decorator

neural_compressor.torch.utils.utility.fetch_module(model, op_name)[source]

Get module with a given op name.

Parameters:
  • model (object) – the input model.

  • op_name (str) – name of op.

Returns:

module (object).

neural_compressor.torch.utils.utility.set_module(model, op_name, new_module)[source]

Set module with a given op name.

Parameters:
  • model (object) – the input model.

  • op_name (str) – name of op.

  • new_module (object) – the new module to set at the given op name.

Returns:

module (object).
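
For illustration, a minimal sketch that fetches a submodule by its dotted op name and then swaps it out (TinyNet and the Identity replacement are just example choices):

    import torch
    from neural_compressor.torch.utils.utility import fetch_module, set_module

    class TinyNet(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = torch.nn.Linear(4, 4)
            self.fc2 = torch.nn.Linear(4, 4)

    model = TinyNet()

    # Fetch the module registered under the op name "fc1";
    # dotted names such as "block.0.attn" are resolved level by level.
    linear = fetch_module(model, "fc1")
    print(type(linear))  # <class 'torch.nn.modules.linear.Linear'>

    # Replace it in place, e.g. with an identity stand-in.
    set_module(model, "fc1", torch.nn.Identity())
    print(type(model.fc1))  # now torch.nn.Identity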

neural_compressor.torch.utils.utility.get_model_info(model: torch.nn.Module, white_module_list: List[Callable]) List[Tuple[str, str]][source]

Get model info according to white_module_list.
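
A minimal usage sketch, with torch.nn.Linear as the white-listed module type:

    import torch
    from neural_compressor.torch.utils.utility import get_model_info

    model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())
    info = get_model_info(model, white_module_list=[torch.nn.Linear])
    print(info)  # e.g. [('0', 'Linear')], i.e. (op name, module type) pairs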

neural_compressor.torch.utils.utility.get_double_quant_config_dict(double_quant_type='BNB_NF4')[source]

Query config dict of double_quant according to double_quant_type.

Parameters:

double_quant_type (str, optional) – double_quant type. Defaults to “BNB_NF4”.
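
A quick usage sketch; the exact keys of the returned dict depend on the library version:

    from neural_compressor.torch.utils.utility import get_double_quant_config_dict

    cfg = get_double_quant_config_dict("BNB_NF4")
    print(cfg)  # the double-quant settings (dtype, bits, group size, ...) for BNB_NF4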

neural_compressor.torch.utils.utility.get_quantizer(model, quantizer_cls, quant_config=None, *args, **kwargs)[source]

Get the quantizer.

Initialize a quantizer or get quantizer attribute from model.

Parameters:
  • model (torch.nn.Module) – pytorch model.

  • quantizer_cls (Quantizer) – quantizer class of a specific algorithm.

  • quant_config (dict, optional) – Specifies how to apply the algorithm on the given model. Defaults to None.

Returns:

quantizer object.

neural_compressor.torch.utils.utility.postprocess_model(model, mode, quantizer)[source]

Process quantizer attribute of model according to current phase.

In the ‘prepare’ phase, the quantizer is set as an attribute of the model to avoid redundant initialization during the ‘convert’ phase.

In the ‘convert’ or ‘quantize’ phase, the now-unused quantizer attribute is removed.

Parameters:
  • model (torch.nn.Module) – pytorch model.

  • mode (Mode) – The mode of current phase, including ‘prepare’, ‘convert’ and ‘quantize’.

  • quantizer (Quantizer) – quantizer object.
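
The two helpers above are typically used together around the prepare/convert flow described here. A minimal sketch, assuming Mode is the phase enum from neural_compressor.torch.utils, that the quantizer class exposes prepare() and convert(), and that model, quant_config, and MyQuantizer are defined elsewhere:

    from neural_compressor.torch.utils import Mode
    from neural_compressor.torch.utils.utility import get_quantizer, postprocess_model

    # 'prepare' phase: build the quantizer and cache it on the model.
    quantizer = get_quantizer(model, quantizer_cls=MyQuantizer, quant_config=quant_config)
    model = quantizer.prepare(model)
    postprocess_model(model, Mode.PREPARE, quantizer)  # quantizer kept as a model attribute

    # 'convert' phase: the cached quantizer is reused, then removed from the model.
    quantizer = get_quantizer(model, quantizer_cls=MyQuantizer, quant_config=quant_config)
    model = quantizer.convert(model)
    postprocess_model(model, Mode.CONVERT, quantizer)  # attribute cleaned up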

neural_compressor.torch.utils.utility.dump_model_op_stats(mode, tune_cfg)[source]

Dump the quantizable-op statistics of the model for the user.

Parameters:
  • mode (object) – quantization mode.

  • tune_cfg (dict) – the quantization configuration.

neural_compressor.torch.utils.utility.get_model_device(model: torch.nn.Module)[source]

Get the device of the model.

Parameters:

model (torch.nn.Module) – the input model.

Returns:

the device of the model, as a string (e.g., ‘cpu’ or ‘cuda’).

Return type:

device (str)
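
For example:

    import torch
    from neural_compressor.torch.utils.utility import get_model_device

    model = torch.nn.Linear(4, 4)
    print(get_model_device(model))  # 'cpu'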

neural_compressor.torch.utils.utility.get_processor_type_from_user_config(user_processor_type: str | neural_compressor.common.utils.ProcessorType | None = None)[source]

Get the processor type.

Get the processor type based on the user configuration or automatically detect it based on the hardware.

Parameters:

user_processor_type (Optional[Union[str, ProcessorType]]) – The user-specified processor type. Defaults to None.

Returns:

The detected or user-specified processor type.

Return type:

ProcessorType

Raises:
  • AssertionError – If the user-specified processor type is not supported.

  • NotImplementedError – If the processor type is not recognized.
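
A minimal sketch; with no argument the processor type is detected from the hardware:

    from neural_compressor.torch.utils.utility import get_processor_type_from_user_config

    ptype = get_processor_type_from_user_config()  # auto-detect
    print(ptype)  # a neural_compressor.common.utils.ProcessorType member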

neural_compressor.torch.utils.utility.dowload_hf_model(repo_id, cache_dir=None, repo_type=None, revision=None)[source]

Download a Hugging Face model from the HF Hub.

neural_compressor.torch.utils.utility.load_empty_model(pretrained_model_name_or_path, cls=None, **kwargs)[source]

Load an empty model.
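
A combined sketch of the two Hugging Face helpers; facebook/opt-125m is only an illustrative repo id, and the "empty" model is expected to carry the architecture without real weights:

    from neural_compressor.torch.utils.utility import dowload_hf_model, load_empty_model

    local_dir = dowload_hf_model("facebook/opt-125m")  # snapshot fetched into the HF cache
    model = load_empty_model("facebook/opt-125m")      # architecture only, weights not materialized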

neural_compressor.torch.utils.utility.get_module(module, key)[source]

Get module from model by key name.

Parameters:
  • module (torch.nn.Module) – the original model.

  • key (str) – name of the module to retrieve.

neural_compressor.torch.utils.utility.get_layer_names_in_block(model, supported_types=SUPPORTED_LAYERS, quant_block_list=None)[source]

Retrieves the names of layers within each block of the model.

Returns:

A list of strings, where each string is the name of a layer within a block of the model.

Return type:

list
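
A usage sketch on a transformers-like checkpoint (reusing load_empty_model from above; the layer names shown are plausible for OPT but depend on the architecture):

    from neural_compressor.torch.utils.utility import get_layer_names_in_block, load_empty_model

    model = load_empty_model("facebook/opt-125m")
    layer_names = get_layer_names_in_block(model)
    print(layer_names[:2])  # e.g. ['model.decoder.layers.0.self_attn.k_proj', ...]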

neural_compressor.torch.utils.utility.to_dtype(input, dtype=torch.float32)[source]

Moves input data to the specified data type.

Parameters:
  • input – The input data to be moved.

  • dtype – The target data type. Defaults to torch.float32.

Returns:

The input data converted to the specified data type.

neural_compressor.torch.utils.utility.to_device(input, device=torch.device('cpu'))[source]

Moves input data to the specified device.

Parameters:
  • input – The input data to be moved.

  • device – The target device. Defaults to torch.device('cpu').

Returns:

The input data on the specified device.

neural_compressor.torch.utils.utility.get_block_names(model)[source]

Get the block names for transformers-like networks.

Parameters:

model – The model.

Returns:

block_names – A list whose elements are lists of the layer names in each block.
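
A usage sketch on a transformers-like model (again reusing load_empty_model; the exact block names depend on the architecture):

    from neural_compressor.torch.utils.utility import get_block_names, load_empty_model

    model = load_empty_model("facebook/opt-125m")
    block_names = get_block_names(model)
    print(block_names)  # e.g. [['model.decoder.layers.0', 'model.decoder.layers.1', ...]]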

neural_compressor.torch.utils.utility.validate_modules(module_names)[source]

Test a list of modules’ validity.

Parameters:

module_names (list of str) – List of module names to be validated.

Returns:

bool – True if all modules have equal length or are not dependent, otherwise False.

neural_compressor.torch.utils.utility.get_multimodal_block_names(model, quant_vision=False)[source]

Get the multimodal model block names for transformers-like networks.

Parameters:

model – The model.

Returns:

block_names – A list whose elements are lists of the layer names in each block.

neural_compressor.torch.utils.utility.detect_device(device=None)[source]

Detects the device to use for model execution (GPU, HPU, or CPU).

Parameters:

device (str, int, torch.device, optional) –

  • If a string (‘cuda’, ‘cpu’, or ‘hpu’) or torch.device is provided, that device is selected.

  • If an integer is provided, it treats it as a GPU device index.

  • If None or ‘auto’, it automatically selects ‘cuda’ if available, ‘hpu’ if Habana is available, or falls back to ‘cpu’.

Returns:

The selected device in string format (‘cuda:X’, ‘hpu’, or ‘cpu’).

Return type:

str
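
For example:

    from neural_compressor.torch.utils.utility import detect_device

    print(detect_device())        # e.g. 'cuda:0' on a GPU machine, else 'hpu' or 'cpu'
    print(detect_device("auto"))  # same auto-selection
    print(detect_device(1))       # an integer is treated as a GPU index when CUDA is available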

neural_compressor.torch.utils.utility.run_fn_for_vlm_autoround(model, dataloader, seqlen=512, nsamples=512)[source]

Runs a model on a provided dataset with automatic device detection for vision-language models.

Parameters:
  • model – The model to run.

  • dataloader – A PyTorch dataloader providing the input data for the model.

  • seqlen (int, optional) – The minimum sequence length of input data to process. Defaults to 512.

  • nsamples (int, optional) – The number of samples to process before stopping. Defaults to 512.

Returns:

None
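
A calibration sketch; model and dataloader are assumed to be a vision-language model and a dataloader yielding its inputs, with the function typically passed as the calibration run_fn of an AutoRound-style flow:

    from neural_compressor.torch.utils.utility import run_fn_for_vlm_autoround

    # model and dataloader are placeholders assumed to exist.
    run_fn_for_vlm_autoround(model, dataloader, seqlen=512, nsamples=128)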