neural_compressor.compression.pruner.pruners

Pruner.

Module Contents

Classes

BasePruner

Pruning Pruner.

BasicPruner

Pruning Pruner.

PatternLockPruner

Pruning Pruner.

BlockMaskPruner

Pruning Pruner.

RetrainFreePruner

Pruning Pruner.

ProgressivePruner

Pruning Pruner.

MultiheadAttentionPruner

Pruning Pruner.

Functions

register_pruner(name)

Class decorator to register a Pruner subclass to the registry.

parse_valid_pruner_types()

Get all valid pruner names.

get_pruner(config, modules[, framework])

Get registered pruner class.

neural_compressor.compression.pruner.pruners.register_pruner(name)[source]

Class decorator to register a Pruner subclass to the registry.

Decorator function used before a Pruner subclass. It ensures that the decorated Pruner class is registered in PRUNERS.

Parameters:
  • cls (class) – The subclass to register.

  • name – A string that defines the pruner type.

Returns:

The registered class.

Return type:

cls

neural_compressor.compression.pruner.pruners.parse_valid_pruner_types()[source]

Get all valid pruner names.

neural_compressor.compression.pruner.pruners.get_pruner(config, modules, framework='pytorch')[source]

Get registered pruner class.

Get a Pruner object from PRUNERS.

Parameters:
  • modules – A dict {“module_name”: Tensor} that stores the pruning modules’ weights.

  • config – A config dict object that contains the pruner information.

Returns:

A Pruner object.

Raises: AssertionError: Currently, only pruners that have been registered in PRUNERS are supported.
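
The three functions above follow a common registry pattern. The sketch below illustrates that pattern in plain Python; it is not the library's source code. The `ToyPruner` class, the pruner name "toy", and the `pruning_type` config key are assumptions made for illustration only.

```python
# Illustrative registry. The function names mirror the documented ones,
# but the bodies are a guess at the pattern, not the library's code.
PRUNERS = {}


def register_pruner(name):
    """Return a class decorator that stores the class in PRUNERS under `name`."""
    def register(cls):
        PRUNERS[name] = cls
        return cls
    return register


def parse_valid_pruner_types():
    """List every pruner name registered so far."""
    return list(PRUNERS.keys())


def get_pruner(config, modules, framework="pytorch"):
    """Look up the registered class named in the config and instantiate it."""
    name = config["pruning_type"]  # hypothetical config key
    assert name in PRUNERS, f"Unsupported pruner type: {name}"
    return PRUNERS[name](config, modules, framework)


@register_pruner("toy")  # hypothetical pruner name
class ToyPruner:
    def __init__(self, config, modules, framework="pytorch"):
        self.config = config
        self.modules = modules
        self.framework = framework


pruner = get_pruner({"pruning_type": "toy"}, {"layer.weight": [1.0, 2.0]})
print(type(pruner).__name__, parse_valid_pruner_types())
```

Registering by name keeps the lookup in `get_pruner` independent of the concrete classes, so a new pruner only needs the decorator.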

class neural_compressor.compression.pruner.pruners.BasePruner(config, modules, framework='pytorch')[source]

Pruning Pruner.

The class which executes pruning process.

Parameters:
  • modules – A dict {“module_name”: Tensor} that stores the pruning modules’ weights.

  • config – A config dict object that contains the pruner information.

modules[source]

A dict {“module_name”: Tensor} that stores the pruning modules’ weights.

config[source]

A config dict object that contains the pruner information.

masks[source]

A dict {“module_name”: Tensor} that stores the masks for modules’ weights.

scores[source]

A dict {“module_name”: Tensor} that stores the scores for modules’ weights. A criterion uses these scores to determine which parts are pruned.

pattern[source]

A Pattern object defined in ./patterns.py

scheduler[source]

A scheduler object defined in ./scheduler.py

current_sparsity_ratio[source]

A float representing the current model’s sparsity ratio; it is initialized to be zero.

global_step[source]

An integer representing the total steps the model has run.

start_step[source]

An integer representing the step at which the pruning process starts.

end_step[source]

An integer representing the step at which the pruning process ends.

pruning_frequency[source]

An integer representing the pruning frequency; it is valid when iterative pruning is enabled.

target_sparsity_ratio[source]

A float showing the final sparsity after pruning.

max_sparsity_ratio_per_op[source]

A float showing the maximum sparsity ratio for every module.
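
As a rough illustration of how `start_step`, `end_step`, `pruning_frequency` and `target_sparsity_ratio` could interact, the sketch below checks whether a step falls on a pruning update and computes a linearly increasing sparsity target. Both helper functions are hypothetical, and the linear ramp is an assumption; the library's scheduler may use a different rule or curve.

```python
def is_pruning_step(global_step, start_step, end_step, pruning_frequency):
    """Return True when a mask update would be due at `global_step`."""
    if global_step < start_step or global_step > end_step:
        return False
    return (global_step - start_step) % pruning_frequency == 0


def scheduled_sparsity(global_step, start_step, end_step, target_sparsity_ratio):
    """Linearly ramp sparsity from 0 at start_step to the target at end_step."""
    if global_step <= start_step:
        return 0.0
    if global_step >= end_step:
        return target_sparsity_ratio
    progress = (global_step - start_step) / (end_step - start_step)
    return target_sparsity_ratio * progress


# With start_step=2, end_step=10 and pruning_frequency=4,
# updates are due at steps 2, 6 and 10.
steps = [s for s in range(12) if is_pruning_step(s, 2, 10, 4)]
print(steps)  # [2, 6, 10]

# Halfway through the window the target is half of the final sparsity.
halfway = scheduled_sparsity(6, 2, 10, 0.8)
print(halfway)
```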

class neural_compressor.compression.pruner.pruners.BasicPruner(config, modules, framework='pytorch')[source]

Pruning Pruner.

The class which executes the pruning process.

  1. Defines pruning functions called at step begin/end and epoch begin/end.

  2. Defines the pruning criterion.

Parameters:
  • modules – A dict {“module_name”: Tensor} that stores the pruning modules’ weights.

  • config – A config dict object that contains the pruner information.

pattern[source]

A Pattern object that defines pruning weights’ arrangements within space.

criterion[source]

A Criterion object that defines which weights are to be pruned.

scheduler[source]

A Scheduler object that defines how the model’s sparsity changes as training/pruning proceeds.

reg[source]

A Reg object that defines regularization terms.
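
To make the relationship between scores, criterion and masks concrete, the sketch below uses weight magnitude as the score and masks out the lowest-scoring entries. Magnitude is only one possible criterion, chosen here as an assumption. The helper names are hypothetical, and plain lists stand in for tensors.

```python
def magnitude_scores(weights):
    """Score each weight by its absolute value (one simple criterion)."""
    return [abs(w) for w in weights]


def mask_from_scores(scores, sparsity_ratio):
    """Zero out the lowest-scoring fraction of entries and keep the rest."""
    num_to_prune = int(len(scores) * sparsity_ratio)
    # Indices ordered from lowest to highest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(order[:num_to_prune])
    return [0 if i in pruned else 1 for i in range(len(scores))]


weights = [0.5, -0.1, 0.9, 0.05, -0.7, 0.2]
mask = mask_from_scores(magnitude_scores(weights), sparsity_ratio=0.5)
print(mask)  # [1, 0, 1, 0, 1, 0]

# Applying the mask removes the three smallest-magnitude weights.
pruned_weights = [w * m for w, m in zip(weights, mask)]
```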

class neural_compressor.compression.pruner.pruners.PatternLockPruner(config, modules, framework='pytorch')[source]

Pruning Pruner.

A Pruner class derived from BasePruner. This pruner keeps the original model’s sparsity pattern fixed during training. It is useful when a user trains a sparse model without changing its original structure.

Parameters:
  • modules – A dict {“module_name”: Tensor} that stores the pruning modules’ weights.

  • config – A config dict object that contains the pruner information.
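
The idea of locking a sparsity pattern can be sketched as follows: the mask is derived once from the weights that are already zero, then re-applied after each update so those positions stay zero. This is a minimal illustration under that assumption. The helper names are hypothetical, and plain lists stand in for tensors.

```python
def lock_pattern(weights):
    """Record which entries are already zero: 1 = trainable, 0 = locked at zero."""
    return [0 if w == 0 else 1 for w in weights]


def apply_mask(weights, mask):
    """Re-apply the fixed mask, for example after an optimizer step."""
    return [w * m for w, m in zip(weights, mask)]


sparse_weights = [0.4, 0.0, -0.3, 0.0]
mask = lock_pattern(sparse_weights)  # [1, 0, 1, 0]

# Pretend an optimizer update nudged every weight, including the zeros.
updated = [w + 0.1 for w in sparse_weights]

# Re-applying the mask restores the original zero positions.
relocked = apply_mask(updated, mask)
print(relocked)
```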

Inherits from the parent class Pruner.

class neural_compressor.compression.pruner.pruners.BlockMaskPruner(config, modules, framework='pytorch')[source]

Pruning Pruner.

The class which executes the pruning process.

  1. Defines pruning functions called at step begin/end, before/after optimize and epoch begin/end.

  2. Defines the pruning criterion.

  3. Obtains block masks and their gradients.

Parameters:
  • modules – A dict {“module_name”: Tensor} that stores the pruning modules’ weights.

  • config – A config dict object that contains the pruner information.

pattern[source]

A Pattern object that defines pruning weights’ arrangements within space.

criterion[source]

A Criterion object that defines which weights are to be pruned.

scheduler[source]

A Scheduler object that defines how the model’s sparsity changes as training/pruning proceeds.

reg[source]

A Reg object that defines regularization terms.

class neural_compressor.compression.pruner.pruners.RetrainFreePruner(config, modules, framework='pytorch')[source]

Pruning Pruner.

The retrain_free pruner class is derived from BasePruner. It adopts the mask search and mask rearrangement strategies of the fast retraining-free method; please refer to "A Fast Post-Training Pruning Framework for Transformers". RetrainFreePruner supports both one-shot pruning (the same effect as the fast retraining-free method) and iterative pruning.

  1. Defines pruning functions called at step begin/end, before/after optimize and epoch begin/end.

  2. Defines the pruning criterion and fixed weight parameters.

  3. Obtains block masks and their gradients.

  4. Rearranges block masks.

Parameters:
  • modules – A dict {“module_name”: Tensor} that stores the pruning modules’ weights.

  • config – A config dict object that contains the pruner information.

pattern[source]

A Pattern object that defines pruning weights’ arrangements within space.

criterion[source]

A Criterion object that defines which weights are to be pruned.

scheduler[source]

A Scheduler object that defines how the model’s sparsity changes as training/pruning proceeds.

reg[source]

A Reg object that defines regularization terms.

class neural_compressor.compression.pruner.pruners.ProgressivePruner(config, modules, framework='pytorch')[source]

Pruning Pruner.

A Pruner class derived from BasicPruner. This pruner applies mask interpolation, a fine-grained improvement for NxM structured pruning that adds interval masks between the masks of two pruning steps.

Parameters:
  • modules – A dict {“module_name”: Tensor} that stores the pruning modules’ weights.

  • config – A config dict object that contains the pruner information.
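
One plausible reading of mask interpolation is sketched below: the positions that the next pruning step would remove are released in several stages, which yields interval masks between the old and the new mask. The helper is hypothetical. The order in which positions are dropped here is arbitrary, whereas the library presumably decides it with its own rules.

```python
def interval_masks(old_mask, new_mask, num_intervals):
    """Build masks that move from old_mask to new_mask in num_intervals stages."""
    # Positions kept by old_mask but removed by new_mask.
    to_prune = [
        i for i, (o, n) in enumerate(zip(old_mask, new_mask)) if o == 1 and n == 0
    ]
    masks = []
    for stage in range(1, num_intervals + 1):
        # Number of newly pruned positions applied up to this stage.
        count = len(to_prune) * stage // num_intervals
        mask = list(old_mask)
        for i in to_prune[:count]:
            mask[i] = 0
        masks.append(mask)
    return masks


old = [1, 1, 1, 1, 1, 1, 1, 1]
new = [1, 0, 0, 1, 1, 0, 0, 1]
stages = interval_masks(old, new, num_intervals=2)
for m in stages:
    print(m)
# [1, 0, 0, 1, 1, 1, 1, 1]
# [1, 0, 0, 1, 1, 0, 0, 1]
```

The last interval mask equals the mask of the next pruning step, so the sparsity increases in smaller increments than a single jump.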

Inherits from the parent class Pruner.

class neural_compressor.compression.pruner.pruners.MultiheadAttentionPruner(config, mha_modules)[source]

Pruning Pruner.

This pruner applies pruning to multi-head attention (MHA) modules. Multi-head attention pruning removes parts of the QKV layers and their corresponding feedforward layers simultaneously.

Parameters:
  • mha_modules – A list of dicts that stores the MHA modules to prune. Each entry has the form:

    [
      {
        'qkv_name': ['query_layer_name', 'key_layer_name', 'value_layer_name'],
        'ffn_name': ['attention_ffn_name'],
        'mha_name': ['mha_name'] (keep unchanged),
        'qkv_module': [torch.nn.Linear, torch.nn.Linear, torch.nn.Linear],
        'ffn_module': [torch.nn.Linear],
        'mha_module': [torch.nn.Module] (keep unchanged),
      }
    ]

  • config – A config dict object that contains the pruner information.

mha_compressions[source]

A dict. {key: MHA module name; value: MHACompression object in .model_slim.weight_slim} The main object that hooks the critical attributes for MHA pruning and modifies them.

linear_layers[source]

A dict. {key: linear layer name; value: torch.nn.Linear object} Stores a look-up table of the individual linear layers, which is used by the criterion object. The length of linear_layers should be 4x that of mha_compressions, because one MHACompression object hooks 4 linear layers: query, key, value and the subsequent ffn layer.

head_masks[source]

A dict. {key: MHA module name; value: torch.Tensor(1, mha_head_size)} Similar to the Hugging Face built-in head_mask attribute.

mha_scores[source]

A dict. {key: MHA module name; value: torch.Tensor(1, mha_head_size)} Store scores for different heads.
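
The sketch below illustrates how per-head scores could be turned into a head mask of shape (1, number of heads), with 0 marking a pruned head. It uses nested lists instead of torch.Tensor so that it runs without dependencies. The helper name, the module name and the scores are hypothetical, and the rule of pruning the lowest-scoring heads is an assumption.

```python
def head_mask_from_scores(head_scores, num_heads_to_prune):
    """Return a (1, num_heads)-shaped nested list in which 0 marks a pruned head."""
    order = sorted(range(len(head_scores)), key=lambda h: head_scores[h])
    pruned = set(order[:num_heads_to_prune])
    return [[0 if h in pruned else 1 for h in range(len(head_scores))]]


# Hypothetical scores for one MHA module with four heads.
mha_scores = {"encoder.layer.0.attention": [0.9, 0.2, 0.6, 0.1]}

head_masks = {
    name: head_mask_from_scores(scores, num_heads_to_prune=2)
    for name, scores in mha_scores.items()
}
print(head_masks)  # {'encoder.layer.0.attention': [[1, 0, 1, 0]]}

# One MHA module hooks query, key, value and the subsequent ffn layer,
# so four linear layers are expected per MHA module.
expected_linear_layers = 4 * len(mha_scores)
```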