neural_compressor.experimental

Intel® Neural Compressor: An open-source Python library supporting popular model compression techniques.

Package Contents

Classes

Component

This is the base class of Neural Compressor components.

Quantization

This class provides an easy-to-use API for quantization.

Pruning

This is the base class of the pruning object.

Benchmark

Benchmark class is used to evaluate the model performance with the objective settings.

Graph_Optimization

Graph_Optimization class.

MixedPrecision

Class used for generating a low precision model.

ModelConversion

ModelConversion class is used to convert one model format to another.

Distillation

Distillation class derived from Component class.

NAS

Create an object of a specified NAS approach.

Attributes

GraphOptimization

Alias of the Graph_Optimization class.

class neural_compressor.experimental.Component(conf_fname_or_obj=None, combination=None)

Bases: object

This is the base class of Neural Compressor components.

This class is inherited by the 'Quantization', 'Pruning' and 'Distillation' classes. The design mainly targets one-shot optimization for pruning/distillation/quantization-aware training. This class applies all hooks for 'Quantization', 'Pruning' and 'Distillation'.

property train_func

Getting train_func is not supported.

property eval_func

Getting eval_func is not supported.

property train_dataloader

Getter of the training dataloader.

property eval_dataloader

Getter of the evaluation dataloader.

property model

Getter of the model (a neural_compressor.model object).

prepare()

Register Quantization Aware Training hooks.

prepare_qat()

Register Quantization Aware Training hooks.

pre_process()

Initialize some attributes, such as the adaptor, the dataloader and train/eval functions from yaml config.

The Component base class provides a default function to initialize dataloaders and functions from the user config. Derived classes (Pruning, Quantization, etc.) are required to override it.

execute()

Execute the processing of this compressor.

The Component base class provides the compression processing. Derived classes (Pruning, Quantization, etc.) are required to override it.

post_process()

Post process after execution.

Derived classes (Pruning, Quantization, etc.) are required to override it.

on_train_begin(dataloader=None)

Called before the beginning of epochs.

on_train_end()

Called after the end of epochs.

pre_epoch_begin(dataloader=None)

Called before the beginning of epochs.

post_epoch_end()

Called after the end of epochs.

on_epoch_begin(epoch)

Called at the beginning of each epoch.

on_step_begin(batch_id)

Called at the beginning of each batch.

on_batch_begin(batch_id)

Called at the beginning of each batch.

on_after_compute_loss(input, student_output, student_loss, teacher_output=None)

Called at the end of loss computation.

on_before_optimizer_step()

Called before the optimizer step.

on_after_optimizer_step()

Called after the optimizer step.

on_before_eval()

Called before evaluation.

on_after_eval()

Called after evaluation.

on_post_grad()

Called before the optimizer step.

on_step_end()

Called at the end of each batch.

on_batch_end()

Called at the end of each batch.

on_epoch_end()

Called at the end of each epoch.

register_hook(scope, hook, input_args=None, input_kwargs=None)

Register a hook for the component.

input_args and input_kwargs are reserved for user-registered hooks.
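
These hooks are designed to be invoked around a user-supplied training loop. The following is a rough, hedged sketch (the function and all of its parameter names are illustrative placeholders, not part of this API) of the order in which a PyTorch-style loop would call them:

    # Hypothetical sketch: where Component hooks fire in a training loop.
    # 'component' is an instance of any Component subclass
    # (Quantization, Pruning, Distillation).
    def train_with_hooks(component, model, criterion, optimizer, loader, num_epochs):
        component.on_train_begin(loader)               # before training starts
        for epoch in range(num_epochs):
            component.on_epoch_begin(epoch)            # beginning of each epoch
            for batch_id, (inputs, labels) in enumerate(loader):
                component.on_step_begin(batch_id)      # beginning of each batch
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                # gives e.g. Distillation a chance to blend in the teacher loss
                loss = component.on_after_compute_loss(inputs, outputs, loss)
                optimizer.zero_grad()
                loss.backward()
                component.on_before_optimizer_step()
                optimizer.step()
                component.on_after_optimizer_step()
                component.on_step_end()                # end of each batch
            component.on_epoch_end()                   # end of each epoch
        component.on_train_end()                       # after training ends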

class neural_compressor.experimental.Quantization(conf_fname_or_obj=None)

Bases: neural_compressor.experimental.component.Component

This class provides an easy-to-use API for quantization.

It automatically searches for optimal quantization recipes for low precision model inference, achieving the best tuning objectives like inference performance within accuracy loss constraints. The tuner abstracts out the differences of quantization APIs across various DL frameworks and brings a unified API for automatic quantization that works on frameworks including TensorFlow, PyTorch and MXNet. Since DL use cases vary in accuracy metrics (Top-1, MAP, ROC, etc.), loss criteria (<1%, <0.1%, etc.) and tuning objectives (performance, memory footprint, etc.), the Tuner class provides a flexible configuration interface via YAML for users to specify these parameters.

Parameters:

conf_fname_or_obj (string or obj) – The path to the YAML configuration file or QuantConf class containing accuracy goal, tuning objective and preferred calibration & quantization tuning space etc.

property calib_dataloader

Get calib_dataloader attribute.

property metric

Get metric attribute.

property objective

Get objective attribute.

property postprocess

Get postprocess attribute.

property q_func

Get q_func attribute.

property model

Override model getter method to handle quantization aware training case.

pre_process()

Prepare dataloaders and q_func for the Component.

execute()

Execute the quantization routine based on the strategy design.

dataset(dataset_type, *args, **kwargs)

Get dataset according to dataset_type.
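
A minimal usage sketch, mirroring the calling convention shown in the ModelConversion example later on this page (the YAML path and model path are placeholders):

    from neural_compressor.experimental import Quantization, common
    quantizer = Quantization('./conf.yaml')   # accuracy goal, tuning space, etc.
    quantizer.model = common.Model('/path/to/model')
    q_model = quantizer()                     # search for the best quantization recipe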

class neural_compressor.experimental.Pruning(conf_fname_or_obj=None)

Bases: neural_compressor.experimental.component.Component

This is the base class of the pruning object.

Since DL use cases vary in accuracy metrics (Top-1, MAP, ROC, etc.), loss criteria (<1%, <0.1%, etc.) and pruning objectives (performance, memory footprint, etc.), the Pruning class provides a flexible configuration interface via YAML for users to specify these parameters.

Parameters:

conf_fname_or_obj (string or obj) – The path to the YAML configuration file or PruningConf class containing accuracy goal, pruning objective and related dataloaders etc.

conf

A config dict object. Contains pruning setting parameters.

pruners

A list of Pruner objects.

property pruning_func

Getting pruning_func is not supported.

property evaluation_distributed

Getter indicating whether a distributed evaluation dataloader is needed.

property train_distributed

Getter indicating whether a distributed training dataloader is needed.

update_items_for_all_pruners(**kwargs)

Add user-defined arguments to the original configuration.

The original pruning config is read from a file, but users can still modify it by passing key-value arguments to this function. Note that the keys of the key-value arguments must be recognizable in the current configuration.

prepare()

Prepare for generate_hooks and generate_pruners.

pre_process()

Called before pruning begins; usually sets up pruners.

execute()

Execute the pruning process.

Workflow: evaluate the dense model -> train/prune the model -> evaluate the sparse model.

generate_hooks()

Register hooks for pruning.

generate_pruners()

Generate pruners and set up self.pruners.

get_sparsity_ratio()

Calculate the sparsity of modules/layers.

Returns:

Three floats: elementwise_over_matmul_gemm_conv is the ratio of zero elements in the pruned layers; elementwise_over_all is the ratio of zero elements across all layers in the model; blockwise_over_matmul_gemm_conv is the ratio of all-zero blocks in the pruned layers.
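
A minimal usage sketch under the same calling convention as the other components on this page (the YAML path and model path are placeholders):

    from neural_compressor.experimental import Pruning, common
    pruner = Pruning('./conf.yaml')           # pruning settings
    pruner.model = common.Model('/path/to/model')
    pruned_model = pruner()                   # dense eval -> train/prune -> sparse eval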

class neural_compressor.experimental.Benchmark(conf_fname_or_obj=None)

Bases: object

Benchmark class is used to evaluate the model performance with the objective settings.

Users can use the data that they configured in YAML.

NOTICE: neural_compressor Benchmark will use the original command to launch sub-processes; this depends on the user's code and may run unnecessary code.

property results

Get the results of benchmarking.

property b_dataloader

Get the dataloader for the benchmarking.

property b_func

Getting b_func is not supported.

property model

Get the model.

property metric

Getting metric is not supported.

property postprocess

Getting postprocess is not supported.

summary_benchmark()

Get the summary of the benchmark.

config_instance()

Configure the multi-instance commands and trigger the benchmark with sub-processes.

generate_prefix(core_list)

Generate the command prefix with numactl.

Parameters:

core_list – a list of core indices bound to specific instances

run_instance(mode)

Run the instance with the configuration.

Parameters:

mode – 'performance' or 'accuracy'. 'performance' mode runs benchmarking with numactl on the specific cores and number of instances set by the user config and returns the model performance; 'accuracy' mode runs benchmarking with full cores and returns the model accuracy.
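
A minimal usage sketch, assuming the benchmark object is invoked with a mode as run_instance() above suggests (the YAML path and model path are placeholders):

    from neural_compressor.experimental import Benchmark, common
    evaluator = Benchmark('./conf.yaml')      # benchmark settings
    evaluator.model = common.Model('/path/to/model')
    evaluator('performance')                  # or 'accuracy'
    print(evaluator.results)                  # results of the benchmarking run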

class neural_compressor.experimental.Graph_Optimization(conf_fname_or_obj=None)

Graph_Optimization class.

It automatically searches for optimal quantization recipes for low precision model inference, achieving the best tuning objectives like inference performance within accuracy loss constraints. The tuner abstracts out the differences of quantization APIs across various DL frameworks and brings a unified API for automatic quantization that works on frameworks including TensorFlow, PyTorch and MXNet. Since DL use cases vary in accuracy metrics (Top-1, MAP, ROC, etc.), loss criteria (<1%, <0.1%, etc.) and tuning objectives (performance, memory footprint, etc.), the Tuner class provides a flexible configuration interface via YAML for users to specify these parameters.

Parameters:

conf_fname_or_obj (string or obj) – The path to the YAML configuration file or Graph_Optimization_Conf class containing accuracy goal, tuning objective and preferred calibration & quantization tuning space etc.

property precisions

Get precision.

property input

Get input.

property output

Get output.

property eval_dataloader

Get eval_dataloader.

property model

Get model.

property metric

Get metric.

property postprocess

Get postprocess.

property eval_func

Get evaluation function.

dataset(dataset_type, *args, **kwargs)

Get dataset.

set_config_by_model(model_obj)

Set model config.
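
A minimal usage sketch (the target precision and model path are placeholders):

    from neural_compressor.experimental import Graph_Optimization
    graph_optimizer = Graph_Optimization()
    graph_optimizer.precisions = 'bf16'       # target low precision
    graph_optimizer.model = '/path/to/model'
    optimized_model = graph_optimizer()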

class neural_compressor.experimental.MixedPrecision(conf_fname_or_obj=None)

Bases: neural_compressor.experimental.graph_optimization.GraphOptimization

Class used for generating low precision model.

The MixedPrecision class automatically generates a low precision model across various DL frameworks including TensorFlow, PyTorch and ONNX Runtime.

property precisions

Get private member variable precisions of MixedPrecision class.

set_config_by_model(model_obj)

Set the member variable conf from an input model object.
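
A minimal usage sketch (the target precision and model path are placeholders):

    from neural_compressor.experimental import MixedPrecision
    converter = MixedPrecision()
    converter.precisions = 'bf16'             # target mixed precision
    converter.model = '/path/to/model'
    converted_model = converter()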

class neural_compressor.experimental.ModelConversion(conf_fname_or_obj=None)

ModelConversion class is used to convert one model format to another.

Currently Neural Compressor only supports converting a quantization-aware training (QAT) TensorFlow model to a default quantized model.

The typical usage is:

    from neural_compressor.experimental import ModelConversion, common
    conversion = ModelConversion()
    conversion.source = 'QAT'
    conversion.destination = 'default'
    conversion.model = '/path/to/saved_model'
    q_model = conversion()

Parameters:

conf_fname_or_obj (string or obj) – Optional. The path to the YAML configuration file or a Conf class containing model conversion and evaluation settings if not specified in code.

property source

Return source.

property destination

Return destination.

property eval_dataloader

Return eval dataloader.

property model

Return model.

property metric

Return metric.

property postprocess

Return postprocess.

property eval_func

Return eval_func.

dataset(dataset_type, *args, **kwargs)

Return dataset.

Parameters:

dataset_type – dataset type

Returns:

dataset class

Return type:

class

class neural_compressor.experimental.Distillation(conf_fname_or_obj=None)

Bases: neural_compressor.experimental.component.Component

Distillation class derived from Component class.

The Distillation class abstracts the pipeline of knowledge distillation, transferring knowledge from the teacher model to the student model.

Parameters:

conf_fname_or_obj (string or obj) – The path to the YAML configuration file or Distillation_Conf containing accuracy goal, distillation objective and related dataloaders etc.

_epoch_ran

An integer indicating how many epochs have run.

eval_frequency

The frequency (in epochs) at which the student model is evaluated.

best_score

The best metric of the student model in the training.

best_model

The best student model found in the training.

property criterion

Getter of criterion.

Returns:

The criterion used in the distillation process.

property optimizer

Getter of optimizer.

Returns:

The optimizer used in the distillation process.

property teacher_model

Getter of the teacher model.

Returns:

The teacher model used in the distillation process.

property student_model

Getter of the student model.

Returns:

The student model used in the distillation process.

property train_cfg

Getter of the train configuration.

Returns:

The train configuration used in the distillation process.

property evaluation_distributed

Getter indicating whether a distributed evaluation dataloader is needed.

property train_distributed

Getter indicating whether a distributed training dataloader is needed.

on_post_forward(input, teacher_output=None)

Set or compute output of teacher model.

Deprecated.

init_train_cfg()

Initialize the training configuration.

create_criterion()

Create the criterion for training.

create_optimizer()

Create the optimizer for training.

prepare()

Prepare hooks.

pre_process()

Preprocessing before the distillation pipeline.

Initialize necessary parts for distillation pipeline.

execute()

Do distillation pipeline.

First train the student model with the teacher model; after training, evaluate the best student model, if any.

Returns:

Best distilled model found.

generate_hooks()

Register hooks for distillation.

Register necessary hooks for distillation pipeline.
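
A minimal usage sketch using the student_model and teacher_model properties documented above (the YAML path and model paths are placeholders):

    from neural_compressor.experimental import Distillation, common
    distiller = Distillation('./conf.yaml')   # distillation settings
    distiller.student_model = common.Model('/path/to/student_model')
    distiller.teacher_model = common.Model('/path/to/teacher_model')
    model = distiller()                       # train the student with the teacher,
                                              # return the best student found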

class neural_compressor.experimental.NAS

Bases: object

Create an object of a specified NAS approach.

Parameters:

conf_fname_or_obj (string or obj) – The path to the YAML configuration file or the object of NASConfig.

Returns:

An object of the specified NAS approach.
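
A hedged usage sketch based on the library's NAS examples; the approach and search algorithm values are placeholders, and the search() entry point is an assumption taken from those examples:

    from neural_compressor.conf.config import NASConfig
    from neural_compressor.experimental import NAS
    config = NASConfig(approach='dynas', search_algorithm='nsga2')
    agent = NAS(config)                       # factory returns the specified approach object
    best_models = agent.search()              # run the NAS search loop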