neural_compressor.compression.pruner.model_slim.weight_slim

Weight Squeezer.

Classes

`PostCompressionUtils`	Operations library related to weight compression.
`LinearCompression`	Class which automatically compresses two consecutive linear layers.
`LinearCompressionIterator`	Pruner of a sequence of consecutive linear patterns.

Module Contents

class neural_compressor.compression.pruner.model_slim.weight_slim.PostCompressionUtils[source]: Operations library related to weight compression.

class neural_compressor.compression.pruner.model_slim.weight_slim.LinearCompression(root_linear, target_linears)[source]

Class which automatically compresses two consecutive linear layers.

For two consecutive linear layers, when input channels of the second layer are pruned, the matching output channels of the first layer can be pruned as well, and the second layer’s output hidden states remain identical. For example, two consecutive linear layers have the following structure:

    x = layer_1(input)
    x = act_fn(x)
    x = layer_2(x)
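The equivalence described above can be checked with plain PyTorch. The following is a minimal sketch, not the library's implementation: it does not call `LinearCompression`, and the layer sizes, the pruned channel indices, and the names `slim_1` and `slim_2` are assumptions chosen for illustration. It also assumes `act_fn` is element-wise (here `ReLU`), so each hidden channel is independent of the others.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
layer_1 = nn.Linear(8, 16)
act_fn = nn.ReLU()  # element-wise, so hidden channels do not interact
layer_2 = nn.Linear(16, 4)

# Simulate pruning: zero some input channels (weight columns) of layer_2.
pruned = torch.tensor([1, 5, 6, 11])
with torch.no_grad():
    layer_2.weight[:, pruned] = 0.0

# Indices of the hidden channels that survive pruning.
mask = torch.ones(16, dtype=torch.bool)
mask[pruned] = False
keep = mask.nonzero(as_tuple=True)[0]

# Build smaller layers: drop the matching output rows of layer_1
# and the zeroed input columns of layer_2.
slim_1 = nn.Linear(8, len(keep))
slim_2 = nn.Linear(len(keep), 4)
with torch.no_grad():
    slim_1.weight.copy_(layer_1.weight[keep])
    slim_1.bias.copy_(layer_1.bias[keep])
    slim_2.weight.copy_(layer_2.weight[:, keep])
    slim_2.bias.copy_(layer_2.bias)

# Both stacks should produce the same output up to floating-point error.
x = torch.randn(3, 8)
with torch.no_grad():
    dense_out = layer_2(act_fn(layer_1(x)))
    slim_out = slim_2(act_fn(slim_1(x)))

outputs_match = torch.allclose(dense_out, slim_out, atol=1e-5)
```

`torch.nn.Linear` stores its weight with shape `(out_features, in_features)`, so output channels of the first layer are rows and input channels of the second layer are columns. A zeroed column of `layer_2` multiplies its hidden channel by zero, which is why that channel can be dropped from both layers without changing the result.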

Parameters:

layer_1 (torch.nn.Linear) – the first Linear layer.
layer_2 (torch.nn.Linear) – the second Linear layer.

layer_1[source]

the first Linear layer.

Type: torch.nn.Linear

layer_2[source]

the second Linear layer.

Type: torch.nn.Linear

device[source]: the device of layers’ weights.

class neural_compressor.compression.pruner.model_slim.weight_slim.LinearCompressionIterator(linear_patterns)[source]

Pruner of a sequence of consecutive linear patterns.

linear_patterns[source]

An iterable object of consecutive linear patterns.

Type: dict/list