neural_compressor.compression.pruner.model_slim.auto_slim

Auto slim.

Functions

model_slim(model[, dataloader, round_multiplier])

Slim the sparse model automatically.

model_slim_ffn2(model[, dataloader, round_multiplier])

Permanently remove some sparse parts of the model and obtain acceleration directly.

model_slim_mha(model[, dataloader])

Permanently remove some sparse parts of the model and obtain acceleration directly.

parse_auto_slim_config(model[, dataloader, ...])

Get model slim pruning configs.

generate_ffn2_pruning_config(model, dataloader, ...)

Get consecutive linear layers pruning configs.

generate_mha_pruning_config(model, dataloader, ...)

Get multi-head attention layers pruning configs.

Module Contents

neural_compressor.compression.pruner.model_slim.auto_slim.model_slim(model, dataloader=None, round_multiplier=32)[source]

Slim the sparse model automatically.

neural_compressor.compression.pruner.model_slim.auto_slim.model_slim_ffn2(model, dataloader=None, round_multiplier=32)[source]

Permanently remove some sparse parts of the model and obtain acceleration directly.

Parameters:
  • model – a sparse model.

  • round_multiplier (int) – the channel number after slimming should be multiple of this number.
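
The idea behind slimming two consecutive linear layers and the role of round_multiplier can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper names (rounded_keep_count, slim_ffn, ffn) are hypothetical, the layers have no bias, and rounding the kept channel count up (capped at the original width) is an assumption, since the parameter description only says the result should be a multiple of round_multiplier.

```python
def rounded_keep_count(nonzero_channels, total_channels, round_multiplier=32):
    """Channel count to keep so that it is a multiple of round_multiplier.

    Assumption: round up (retain a few all-zero channels) and cap at the
    original width; the library may round differently.
    """
    if round_multiplier <= 0:
        return nonzero_channels
    rounded = -(-nonzero_channels // round_multiplier) * round_multiplier
    return min(rounded, total_channels)


def matvec(weight, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def ffn(w1, w2, x):
    """Two bias-free linear layers with a ReLU in between."""
    hidden = [max(0.0, h) for h in matvec(w1, x)]
    return matvec(w2, hidden)


def slim_ffn(w1, w2, round_multiplier=1):
    """Drop hidden channels whose row in w1 is entirely zero.

    The matching columns of w2 are dropped as well, so the output is unchanged.
    """
    hidden = len(w1)
    nonzero = [i for i, row in enumerate(w1) if any(v != 0 for v in row)]
    nonzero_set = set(nonzero)
    zero = [i for i in range(hidden) if i not in nonzero_set]
    keep_count = rounded_keep_count(len(nonzero), hidden, round_multiplier)
    # Pad with zero channels (in index order) to reach the rounded count.
    keep = sorted(nonzero + zero[: keep_count - len(nonzero)])
    slim_w1 = [w1[i] for i in keep]
    slim_w2 = [[row[i] for i in keep] for row in w2]
    return slim_w1, slim_w2


# Hidden channels 1 and 3 are fully pruned (all-zero rows in w1).
w1 = [[1.0, 2.0], [0.0, 0.0], [-1.0, 0.5], [0.0, 0.0]]
w2 = [[1.0, 9.0, 2.0, 9.0], [0.5, 9.0, -1.0, 9.0]]
x = [1.0, 4.0]

slim_w1, slim_w2 = slim_ffn(w1, w2, round_multiplier=1)
print(len(w1), "->", len(slim_w1), "hidden channels")
print(ffn(w1, w2, x), ffn(slim_w1, slim_w2, x))
```

Because a zeroed hidden channel contributes nothing after the activation, removing it leaves the output unchanged while shrinking both weight matrices. With a bias term, the corresponding bias entries would also have to be zero for this to hold.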

neural_compressor.compression.pruner.model_slim.auto_slim.model_slim_mha(model, dataloader=None)[source]

Permanently remove some sparse parts of the model and obtain acceleration directly.

Parameters:

model – a sparse model.
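
For attention layers, a rough picture of what slimming could involve is sketched below in pure Python. It assumes the sparsity is structured per attention head, with each head owning head_size consecutive rows of a projection weight. The helpers fully_sparse_heads and drop_heads are hypothetical and not part of neural_compressor; a real implementation would also have to shrink the key, value, and output projections and update the module's head count.

```python
def fully_sparse_heads(weight, head_size):
    """Return indices of heads whose rows in `weight` are all zero.

    `weight` is a row-major matrix (list of rows); head h is assumed to own
    rows [h * head_size, (h + 1) * head_size).
    """
    num_heads = len(weight) // head_size
    pruned = []
    for head in range(num_heads):
        rows = weight[head * head_size:(head + 1) * head_size]
        if all(value == 0 for row in rows for value in row):
            pruned.append(head)
    return pruned


def drop_heads(weight, head_size, heads):
    """Remove the rows that belong to the given heads."""
    drop = set(heads)
    return [row for i, row in enumerate(weight) if i // head_size not in drop]


# Three heads of size 2; the middle head has been pruned to zero.
query_weight = [
    [0.3, -0.1, 0.2],
    [0.5, 0.4, -0.2],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [-0.7, 0.1, 0.6],
    [0.2, 0.2, 0.9],
]

pruned_heads = fully_sparse_heads(query_weight, head_size=2)
slim_query_weight = drop_heads(query_weight, head_size=2, heads=pruned_heads)
print("pruned heads:", pruned_heads)
print("rows:", len(query_weight), "->", len(slim_query_weight))
```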

neural_compressor.compression.pruner.model_slim.auto_slim.parse_auto_slim_config(model, dataloader=None, ffn2_sparsity=0.0, mha_sparsity=0.0, **kwargs)[source]

Get model slim pruning configs.

neural_compressor.compression.pruner.model_slim.auto_slim.generate_ffn2_pruning_config(model, dataloader, ffn2_sparsity, **kwargs)[source]

Get consecutive linear layers pruning configs.

neural_compressor.compression.pruner.model_slim.auto_slim.generate_mha_pruning_config(model, dataloader, mha_sparsity, **kwargs)[source]

Get multi-head attention layers pruning configs.