neural_compressor.compression.pruner.model_slim.weight_slim

Weight Squeezer.

Module Contents

Classes

PostCompressionUtils

Operations library related to weight compression.

LinearCompression

Class which automatically compresses two consecutive linear layers.

LinearCompressionIterator

Pruner of a sequence of consecutive linear patterns.

class neural_compressor.compression.pruner.model_slim.weight_slim.PostCompressionUtils

Operations library related to weight compression.

class neural_compressor.compression.pruner.model_slim.weight_slim.LinearCompression(root_linear, target_linears)

Class which automatically compresses two consecutive linear layers.

For two consecutive linear layers, when an input channel of the second layer is pruned, the corresponding output channel of the first layer can be pruned as well, and the second layer’s output hidden state remains identical. For example, two consecutive linear layers have the following structure:

    x = layer_1(input)
    x = act_fn(x)
    x = layer_2(x)
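The equivalence described above can be checked with a small dependency-free sketch. This is an illustration of the idea only, not the library's implementation: plain Python lists stand in for `torch.nn.Linear` weights, and the helper names `linear`, `relu`, and `slim_pair` are hypothetical. It assumes that "pruned" means the corresponding input column of layer_2 is entirely zero, and it uses ReLU as the activation.

```python
def linear(x, weight, bias):
    """Compute y = W x + b for one input vector.

    weight has shape (out_features, in_features), as in torch.nn.Linear.
    """
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def relu(x):
    """Element-wise ReLU, standing in for act_fn."""
    return [max(0.0, v) for v in x]


def slim_pair(w1, b1, w2):
    """Drop hidden channels whose layer_2 input column is all zeros.

    Returns the slimmed layer_1 weight and bias, the slimmed layer_2
    weight, and the indices of the hidden channels that were kept.
    """
    hidden = len(w1)
    keep = [j for j in range(hidden) if any(row[j] != 0.0 for row in w2)]
    w1_slim = [w1[j] for j in keep]                    # remove layer_1 output rows
    b1_slim = [b1[j] for j in keep]                    # and the matching biases
    w2_slim = [[row[j] for j in keep] for row in w2]   # remove layer_2 input columns
    return w1_slim, b1_slim, w2_slim, keep


# layer_1: 3 inputs -> 4 hidden channels; layer_2: 4 hidden -> 2 outputs.
w1 = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0], [-2.0, 0.0, 1.0], [1.5, 1.0, -0.5]]
b1 = [0.5, -0.25, 1.0, 0.0]
# Input channels 1 and 3 of layer_2 are pruned (zeroed columns).
w2 = [[1.0, 0.0, 2.0, 0.0], [-1.0, 0.0, 0.5, 0.0]]
b2 = [0.25, -0.5]

x = [1.0, 2.0, -1.0]
dense_out = linear(relu(linear(x, w1, b1)), w2, b2)

w1_s, b1_s, w2_s, kept = slim_pair(w1, b1, w2)
slim_out = linear(relu(linear(x, w1_s, b1_s)), w2_s, b2)

print(kept, dense_out, slim_out)
```

Because the removed hidden channels are multiplied by zero in layer_2, their values never reach the output, so the smaller pair of layers produces the same result for any element-wise activation.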

Parameters:
  • layer_1 (torch.nn.Linear) – the first Linear layer.

  • layer_2 (torch.nn.Linear) – the second Linear layer.

layer_1

the first Linear layer.

Type:

torch.nn.Linear

layer_2

the second Linear layer.

Type:

torch.nn.Linear

device

the device of layers’ weights.

class neural_compressor.compression.pruner.model_slim.weight_slim.LinearCompressionIterator(linear_patterns)

Pruner of a sequence of consecutive linear patterns.

linear_patterns

an iterable object of consecutive linear patterns.

Type:

dict/list