neural_compressor.compression.pruner.model_slim.weight_slim
Weight Squeezer.
Classes
- PostCompressionUtils – Operations library related to weight compression.
- LinearCompression – Class which automatically compresses two consecutive linear layers.
- Pruner of a sequence of consecutive linear patterns.
Module Contents
- class neural_compressor.compression.pruner.model_slim.weight_slim.PostCompressionUtils[source]
Operations library related to weight compression.
- class neural_compressor.compression.pruner.model_slim.weight_slim.LinearCompression(root_linear, target_linears)[source]
Class which automatically compresses two consecutive linear layers.
For two consecutive linear layers, when an input channel of the second layer is pruned, the corresponding output channel of the first layer can be pruned as well, and the second layer’s output hidden state stays identical. For example, two consecutive linear layers have the following structure:
    x = layer_1(input)
    x = act_fn(x)
    x = layer_2(x)
- Parameters:
layer_1 (torch.nn.Linear) – the first Linear layer.
layer_2 (torch.nn.Linear) – the second Linear layer.
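
The channel-removal idea described above can be illustrated in plain PyTorch. This is a minimal sketch, not the library's implementation: it does not call LinearCompression, and the helper name slim_pair, the layer sizes, and the choice of pruned channels are all assumptions made for illustration. It also assumes act_fn is elementwise, so a channel whose weights in layer_2 are all zero contributes nothing to the output.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two consecutive linear layers with an elementwise activation in between.
layer_1 = nn.Linear(8, 16)
layer_2 = nn.Linear(16, 4)
act_fn = nn.ReLU()

# Simulate pruning: zero some input channels (weight columns) of layer_2.
pruned = torch.tensor([1, 5, 6, 12])  # illustrative choice of channels
with torch.no_grad():
    layer_2.weight[:, pruned] = 0.0


def forward(l1, l2, x):
    """Run the layer_1 -> act_fn -> layer_2 pattern."""
    return l2(act_fn(l1(x)))


def slim_pair(l1, l2):
    """Hypothetical helper (not part of neural_compressor).

    Keeps only the channels whose column in l2.weight is not all zero,
    dropping the matching output rows of l1 and input columns of l2.
    """
    keep = (l2.weight.abs().sum(dim=0) != 0).nonzero().flatten()
    new_l1 = nn.Linear(l1.in_features, keep.numel(), bias=l1.bias is not None)
    new_l2 = nn.Linear(keep.numel(), l2.out_features, bias=l2.bias is not None)
    with torch.no_grad():
        new_l1.weight.copy_(l1.weight[keep])        # drop output rows
        if l1.bias is not None:
            new_l1.bias.copy_(l1.bias[keep])
        new_l2.weight.copy_(l2.weight[:, keep])     # drop input columns
        if l2.bias is not None:
            new_l2.bias.copy_(l2.bias)
    return new_l1, new_l2


slim_1, slim_2 = slim_pair(layer_1, layer_2)

x = torch.randn(3, 8)
with torch.no_grad():
    dense_out = forward(layer_1, layer_2, x)
    slim_out = forward(slim_1, slim_2, x)
```

Comparing dense_out and slim_out with torch.allclose is a quick way to check that the smaller pair reproduces the original output up to floating-point rounding.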