neural_compressor.compression.pruner.patterns.mha

MHA patterns.

Module Contents

Classes

PatternMHA

Pruning Pattern.

class neural_compressor.compression.pruner.patterns.mha.PatternMHA(config, modules=None)

Pruning Pattern.

A Pattern class derived from BasePattern. In this pattern, head masks are calculated for a multi-head attention (MHA) module. For more information on this pattern, see: https://github.com/intel/neural-compressor/blob/master/docs/sparsity.md

Parameters:

config – A config dict object that contains the pattern information.
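
To make the idea of a head mask concrete, the following is a minimal sketch in plain Python. It is not the PatternMHA implementation: the helper name `head_mask_from_scores`, the per-head importance scores, and the `sparsity_ratio` argument are all assumptions introduced for illustration. It only shows the general technique of keeping the highest-scoring attention heads and masking out the rest.

```python
def head_mask_from_scores(head_scores, sparsity_ratio):
    """Return a 0/1 mask per attention head, pruning the lowest-scoring heads.

    Hypothetical helper for illustration only; not part of neural_compressor.
    """
    if not 0.0 <= sparsity_ratio <= 1.0:
        raise ValueError("sparsity_ratio must be in [0, 1]")
    num_heads = len(head_scores)
    num_pruned = int(num_heads * sparsity_ratio)
    # Head indices sorted by ascending score; the first num_pruned are dropped.
    order = sorted(range(num_heads), key=lambda i: head_scores[i])
    pruned = set(order[:num_pruned])
    return [0 if i in pruned else 1 for i in range(num_heads)]


# Assumed importance scores for a 4-head attention module.
scores = [0.9, 0.1, 0.5, 0.3]
print(head_mask_from_scores(scores, sparsity_ratio=0.5))  # [1, 0, 1, 0]
```

A mask entry of 0 means every weight belonging to that head would be zeroed together, which is what distinguishes head-level pruning from element-wise pruning.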

N

The number of elements to be pruned in a weight sequence.

M

The size of the weight sequence.
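
The N and M attributes follow the usual N:M sparsity convention: within every sequence of M consecutive weights, N are pruned. The sketch below illustrates that convention only. The function name `n_in_m_mask` and the magnitude-based selection criterion are assumptions, and the code does not reflect how PatternMHA itself uses these attributes.

```python
def n_in_m_mask(weights, n, m):
    """Return a 0/1 mask that prunes the n smallest-magnitude weights
    in each consecutive group of m weights.

    Hypothetical helper for illustration only; not part of neural_compressor.
    """
    if m <= 0 or not 0 <= n <= m:
        raise ValueError("require m > 0 and 0 <= n <= m")
    if len(weights) % m != 0:
        raise ValueError("len(weights) must be a multiple of m")
    mask = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Positions within the group, ordered by ascending magnitude.
        order = sorted(range(m), key=lambda i: abs(group[i]))
        pruned = set(order[:n])
        mask.extend(0 if i in pruned else 1 for i in range(m))
    return mask


weights = [0.4, -0.1, 0.05, 0.9, -0.7, 0.2, 0.3, -0.01]
print(n_in_m_mask(weights, n=2, m=4))  # [1, 0, 0, 1, 1, 0, 1, 0]
```

With N = 2 and M = 4, exactly two weights in each group of four are masked out, giving 50% sparsity with a regular structure.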