neural_compressor.compression.pruner.patterns
Pruning patterns.
Module Contents
Classes
BasePattern(config, modules) | Pruning Pattern.
PatternNxM(config, modules) | Pruning Pattern.
PatternNInM(config, modules) | Pruning Pattern.
Functions
register_pattern(name) | Class decorator used to register a Pattern subclass to the registry.
get_pattern(config, modules) | Get registered pattern class.
- neural_compressor.compression.pruner.patterns.register_pattern(name)[source]
Class decorator used to register a Pattern subclass to the registry.
Apply this decorator to a Pattern subclass so that the class is registered in PATTERNS.
- Parameters:
name – A string defining the pattern type name to be used in a pruning process.
- Returns:
The registered class.
- Return type:
cls
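The registration mechanism described above can be sketched in a few lines. This is a minimal illustration of a registry decorator, not the library's source: the `PATTERNS` dict here is a local stand-in, and `DemoPattern` and the type name `"demo"` are hypothetical.

```python
# Local stand-in for the module-level registry (assumption, not library code).
PATTERNS = {}


def register_pattern(name):
    """Return a class decorator that stores the decorated class under `name`."""

    def register(cls):
        PATTERNS[name] = cls
        return cls  # hand the class back unchanged so it remains usable

    return register


@register_pattern("demo")  # "demo" is an illustrative pattern type name
class DemoPattern:
    """Hypothetical pattern class used only to show registration."""


print(PATTERNS["demo"] is DemoPattern)
```

Because the decorator returns the class itself, registration has no effect on how the class is defined or instantiated; it only makes the class reachable by name.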
- neural_compressor.compression.pruner.patterns.get_pattern(config, modules)[source]
Get registered pattern class.
Get a Pattern object from PATTERNS.
- Parameters:
config – A config dict object that contains the pattern information.
modules – Torch neural network modules to be pruned with the pattern.
- Returns:
A Pattern object.
- Raises:
AssertionError – Raised if the requested pattern has not been registered in PATTERNS; only registered patterns are supported.
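A lookup of this kind might behave as in the sketch below. It assumes the pattern name is stored under a `"pattern"` key in the config and is used directly as the registry key; both are assumptions made for illustration, as are `get_pattern_sketch` and `DemoPattern`. The real function may derive the registry key from the config differently.

```python
# Local stand-in registry and a hypothetical pattern class (assumptions).
PATTERNS = {}


class DemoPattern:
    def __init__(self, config, modules):
        self.config = config
        self.modules = modules


PATTERNS["demo"] = DemoPattern


def get_pattern_sketch(config, modules):
    """Look up a registered pattern class and instantiate it."""
    name = config["pattern"]  # assumed config key
    # Mirror the documented behaviour: unregistered names fail an assertion.
    assert name in PATTERNS, (
        f"Unsupported pattern {name!r}; registered: {sorted(PATTERNS)}"
    )
    return PATTERNS[name](config, modules)


pattern = get_pattern_sketch({"pattern": "demo"}, modules={})
print(type(pattern).__name__)
```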
- class neural_compressor.compression.pruner.patterns.BasePattern(config, modules)[source]
Pruning Pattern.
Defines the basic pruning unit (for example 4x1 or 2:4) and how that unit is pruned.
- Parameters:
config – A config dict object that contains the pattern information.
modules – Torch neural network modules to be pruned with the pattern.
- is_global[source]
A bool determining whether pruning is global. In global pruning, the scores produced by a pruning criterion are ranked across all layers together. In local pruning, by contrast, the scores are ranked within each layer individually.
- max_sparsity_ratio_per_op[source]
A float representing the maximum sparsity that one layer could reach.
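The difference between global and local pruning can be illustrated with plain Python lists standing in for per-layer score tensors. This is a sketch under stated assumptions rather than the library's implementation: `prune_masks`, the layer names, and the scores are hypothetical, and lower scores are assumed to be pruned first.

```python
def prune_masks(scores, sparsity, is_global):
    """Return a 0/1 keep-mask per layer; the lowest-scoring entries get 0.

    scores: dict mapping layer name -> list of float scores (hypothetical).
    """
    masks = {name: [1] * len(vals) for name, vals in scores.items()}
    if is_global:
        # One ranking pool that spans every layer.
        groups = [[(s, name, i)
                   for name, vals in scores.items()
                   for i, s in enumerate(vals)]]
    else:
        # A separate ranking pool for each layer.
        groups = [[(s, name, i) for i, s in enumerate(vals)]
                  for name, vals in scores.items()]
    for group in groups:
        n_prune = int(len(group) * sparsity)
        for _, name, i in sorted(group)[:n_prune]:
            masks[name][i] = 0
    return masks


scores = {"fc1": [0.1, 0.2, 0.3, 0.4], "fc2": [0.5, 0.6, 0.7, 0.8]}
print(prune_masks(scores, 0.5, is_global=True))
print(prune_masks(scores, 0.5, is_global=False))
```

With these scores, the global ranking removes every entry of `fc1` and none of `fc2`, while the local ranking removes half of each layer. A per-layer cap such as max_sparsity_ratio_per_op presumably exists to limit the first outcome; the sketch does not implement that cap.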
- class neural_compressor.compression.pruner.patterns.PatternNxM(config, modules)[source]
Pruning Pattern.
A Pattern class derived from BasePattern. In this pattern, all weights in an NxM block are pruned or kept together during one pruning step.
- Parameters:
config – A config dict object that contains the pattern information.
- Note that the vertical direction of a Linear layer's weight refers to the output channel, because PyTorch’s tensor matmul has a hidden transpose operation.
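As a rough illustration of block-wise pruning, the sketch below treats a weight as a list of rows with shape (out_features, in_features), which is how PyTorch stores a Linear layer's weight, and assumes an NxM block spans N rows (output channels) by M columns. The function name `block_mask` and the sum-of-magnitudes block score are assumptions chosen for the example, not the library's criterion.

```python
def block_mask(weight, n, m, sparsity):
    """Return a 0/1 mask where whole n-by-m blocks are kept or pruned.

    weight: list of rows, shape (out_features, in_features).
    """
    rows, cols = len(weight), len(weight[0])
    assert rows % n == 0 and cols % m == 0, "shape must divide into blocks"

    # Score each block by the sum of absolute values (assumed criterion).
    blocks = []
    for br in range(0, rows, n):
        for bc in range(0, cols, m):
            score = sum(abs(weight[r][c])
                        for r in range(br, br + n)
                        for c in range(bc, bc + m))
            blocks.append((score, br, bc))

    # Zero out every element of the lowest-scoring blocks.
    mask = [[1] * cols for _ in range(rows)]
    n_prune = int(len(blocks) * sparsity)
    for _, br, bc in sorted(blocks)[:n_prune]:
        for r in range(br, br + n):
            for c in range(bc, bc + m):
                mask[r][c] = 0
    return mask


# A 4x2 weight split into two 4x1 blocks (one per input column).
weight = [[1, 5], [2, 6], [3, 7], [4, 8]]
print(block_mask(weight, n=4, m=1, sparsity=0.5))
```

Here the first column has the smaller total magnitude, so the whole 4x1 block covering it is masked out while the second column is kept intact.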
- class neural_compressor.compression.pruner.patterns.PatternNInM(config, modules)[source]
Pruning Pattern.
A Pattern class derived from BasePattern. In this pattern, N out of every M consecutive weights will be pruned. For more information on this pattern, see https://github.com/intel/neural-compressor/blob/master/docs/sparsity.md
- Parameters:
config – A config dict object that contains the pattern information.
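The N-in-M rule can be sketched on a single row of weights. Following the wording above, N is treated here as the number of weights pruned in each group of M; some descriptions of N:M sparsity count N as the number kept instead, and for 2:4 the two readings coincide. The function name `n_in_m_mask` and the magnitude-based selection are assumptions for illustration.

```python
def n_in_m_mask(row, n, m):
    """Return a 0/1 mask that zeroes the n smallest-magnitude weights
    in every group of m consecutive weights."""
    assert len(row) % m == 0, "row length must be a multiple of m"
    assert 0 <= n <= m
    mask = [1] * len(row)
    for start in range(0, len(row), m):
        group = range(start, start + m)
        # Rank the indices of this group by absolute value, smallest first.
        for i in sorted(group, key=lambda i: abs(row[i]))[:n]:
            mask[i] = 0
    return mask


row = [0.1, -0.9, 0.5, 0.2, 0.7, 0.05, -0.3, 0.6]
print(n_in_m_mask(row, n=2, m=4))  # -> [0, 1, 1, 0, 1, 0, 0, 1]
```

Every group of four ends up with exactly two zeros, so the sparsity is uniform along the row, unlike block pruning where entire regions may be removed.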