neural_compressor.conf.pythonic_config
Configs for Neural Compressor 1.x.
Module Contents
Classes
Options – Option Class for configs.
AccuracyCriterion – Class of Accuracy Criterion.
BenchmarkConfig – Config Class for Benchmark.
QuantizationConfig – Basic class for quantization config. Inherited by PostTrainingQuantConfig and QuantizationAwareTrainingConfig.
WeightPruningConfig – Config Class for Pruning. Define a single or a sequence of pruning configs.
KnowledgeDistillationLossConfig – Config Class for Knowledge Distillation Loss.
DistillationConfig – Config of distillation.
- class neural_compressor.conf.pythonic_config.Options(random_seed=1978, workspace=default_workspace, resume_from=None, tensorboard=False)[source]
Option Class for configs.
This class is used for configuring global variables. The global variable options is created with this class. If you want to change global variables, you should use functions from utils.utility.py:
set_random_seed(seed: int)
set_workspace(workspace: str)
set_resume_from(resume_from: str)
set_tensorboard(tensorboard: bool)
- Parameters:
random_seed (int) – Random seed used in neural compressor. Default value is 1978.
workspace (str) – The directory where intermediate files and the tuning history file are stored. Default value is './nc_workspace/{}/'.format(datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')).
resume_from (str) – The directory to resume the tuning history file from. The tuning history is saved automatically in the workspace directory during the last tuning process. Default value is None.
tensorboard (bool) – Whether to save the weights of the model and the inputs of each layer for visual display. Default value is False.
Example:
    from neural_compressor import set_random_seed, set_workspace, set_resume_from, set_tensorboard

    set_random_seed(2022)
    set_workspace("workspace_path")
    set_resume_from("workspace_path")
    set_tensorboard(True)
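The default workspace value documented above is an ordinary string-format expression over the current time. A minimal sketch, using only the standard library, of what that expression evaluates to (the variable names `timestamp` and `default_workspace` are illustrative, not library attributes):

```python
import datetime

# Same expression as the documented default for `workspace`.
timestamp = datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
default_workspace = './nc_workspace/{}/'.format(timestamp)

# Prints a path of the form ./nc_workspace/<YYYY-MM-DD_HH-MM-SS>/
print(default_workspace)
```

Because the timestamp is part of the path, each run gets its own directory unless a fixed workspace is set with set_workspace.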
- class neural_compressor.conf.pythonic_config.AccuracyCriterion(higher_is_better=True, criterion='relative', tolerable_loss=0.01)[source]
Class of Accuracy Criterion.
- Parameters:
higher_is_better (bool, optional) – Whether a higher metric value is better. Default value is True.
criterion (str, optional) – Whether the tolerable metric loss is 'relative' or 'absolute'. Default value is 'relative'.
tolerable_loss (float, optional) – The maximum metric loss that is acceptable. Default value is 0.01.
Example:
    from neural_compressor.config import AccuracyCriterion

    accuracy_criterion = AccuracyCriterion(
        higher_is_better=True,  # optional.
        criterion='relative',   # optional. Available values are 'relative' and 'absolute'.
        tolerable_loss=0.01,    # optional.
    )
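To make the difference between the 'relative' and 'absolute' criteria concrete, here is a minimal sketch of how the three parameters could be combined into an accept/reject decision. The helper `within_tolerance` is hypothetical and is not part of Neural Compressor; the library's own check may differ in details.

```python
def within_tolerance(baseline, tuned, higher_is_better=True,
                     criterion='relative', tolerable_loss=0.01):
    """Hypothetical helper: True if `tuned` stays within the tolerated loss."""
    # Loss is how far the tuned metric moved in the "worse" direction.
    loss = baseline - tuned if higher_is_better else tuned - baseline
    if criterion == 'relative':
        # Relative loss is measured as a fraction of the baseline metric.
        return loss <= tolerable_loss * abs(baseline)
    if criterion == 'absolute':
        # Absolute loss is compared directly, in the metric's own units.
        return loss <= tolerable_loss
    raise ValueError("criterion must be 'relative' or 'absolute'")


# Accuracy drops from 0.80 to 0.795: a 0.625% relative loss, under the 1% budget.
print(within_tolerance(0.80, 0.795))
# Accuracy drops from 0.80 to 0.785: an absolute loss of 0.015, over the 0.01 budget.
print(within_tolerance(0.80, 0.785, criterion='absolute'))
```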
- class neural_compressor.conf.pythonic_config.BenchmarkConfig(inputs=[], outputs=[], backend='default', device='cpu', warmup=5, iteration=-1, model=None, model_name='', cores_per_instance=None, num_of_instance=None, inter_num_of_threads=None, intra_num_of_threads=None, diagnosis=False)[source]
Config Class for Benchmark.
- Parameters:
inputs (list, optional) – A list of strings containing the inputs of model. Default is an empty list.
outputs (list, optional) – A list of strings containing the outputs of model. Default is an empty list.
backend (str, optional) – Backend name for model execution. Supported values include: ‘default’, ‘itex’, ‘ipex’, ‘onnxrt_trt_ep’, ‘onnxrt_cuda_ep’. Default value is ‘default’.
warmup (int, optional) – The number of iterations to perform warmup before running performance tests. Default value is 5.
iteration (int, optional) – The number of iterations to run performance tests. Default is -1.
cores_per_instance (int, optional) – The number of CPU cores to use per instance. Default value is None.
num_of_instance (int, optional) – The number of instances to use for performance testing. Default value is None.
inter_num_of_threads (int, optional) – The number of threads to use for inter-op parallelism (running independent operations concurrently). Default value is None.
intra_num_of_threads (int, optional) – The number of threads to use for intra-op parallelism (parallelism within a single operation). Default value is None.
Example:
    # Run benchmark according to config
    from neural_compressor.config import BenchmarkConfig
    from neural_compressor.benchmark import fit

    conf = BenchmarkConfig(iteration=100, cores_per_instance=4, num_of_instance=7)
    # eval_dataloader is a user-defined dataloader.
    fit(model='./int8.pb', config=conf, b_dataloader=eval_dataloader)
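The example above asks for 7 instances with 4 cores each, which implies 28 cores in total. A minimal sketch of that arithmetic, assuming instances are given contiguous, non-overlapping core ranges; the helper `instance_core_ranges` is hypothetical, and the way the library actually pins cores may differ.

```python
def instance_core_ranges(cores_per_instance, num_of_instance):
    """Hypothetical helper: one (first_core, last_core) range per instance."""
    return [
        (i * cores_per_instance, (i + 1) * cores_per_instance - 1)
        for i in range(num_of_instance)
    ]


ranges = instance_core_ranges(cores_per_instance=4, num_of_instance=7)
total_cores = 4 * 7  # cores needed to run all instances at once

for index, (first, last) in enumerate(ranges):
    print(f"instance {index}: cores {first}-{last}")
print("total cores:", total_cores)
```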
- class neural_compressor.conf.pythonic_config.QuantizationConfig(inputs=[], outputs=[], backend='default', device='cpu', approach='post_training_static_quant', calibration_sampling_size=[100], op_type_dict=None, op_name_dict=None, strategy='basic', strategy_kwargs=None, objective='performance', timeout=0, max_trials=100, performance_only=False, reduce_range=None, use_bf16=True, quant_level='auto', accuracy_criterion=accuracy_criterion, diagnosis=False)[source]
Basic class for quantization config. Inherited by PostTrainingQuantConfig and QuantizationAwareTrainingConfig.
- Parameters:
inputs – Inputs of the model, only required in TensorFlow.
outputs – Outputs of the model, only required in TensorFlow.
backend – Backend for model execution. Supports 'default', 'itex', 'ipex', 'onnxrt_trt_ep' and 'onnxrt_cuda_ep'.
domain – Model domain. Supports 'auto', 'cv', 'object_detection', 'nlp' and 'recommendation_system'. The adaptor automatically uses domain-specific quantization settings, and explicitly specified quantization settings override the automatic ones. If domain is set to 'auto', the domain is detected automatically.
recipes – Recipes for quantization. The supported recipes are:
'smooth_quant': whether to do smooth quant
'smooth_quant_args': parameters for smooth_quant
'fast_bias_correction': whether to do fast bias correction
'weight_correction': whether to do weight correction
'gemm_to_matmul': whether to convert gemm to matmul and add, only valid for onnx models
'graph_optimization_level': supports 'DISABLE_ALL', 'ENABLE_BASIC', 'ENABLE_EXTENDED', 'ENABLE_ALL', only valid for onnx models
'first_conv_or_matmul_quantization': whether to quantize the first conv or matmul
'last_conv_or_matmul_quantization': whether to quantize the last conv or matmul
'pre_post_process_quantization': whether to quantize the ops in preprocess and postprocess
'add_qdq_pair_to_weight': whether to add a QDQ pair for weights, only valid for onnxrt_trt_ep
'optypes_to_exclude_output_quant': don't quantize the output of the specified optypes
'dedicated_qdq_pair': whether to dedicate the QDQ pair, only valid for onnxrt_trt_ep
quant_format – Supports 'default', 'QDQ' and 'QOperator', only required in ONNXRuntime.
device – Supports 'cpu' and 'gpu'.
calibration_sampling_size – Number of calibration samples.
op_type_dict – Op-type-wise tuning constraints for advanced users to reduce the tuning space. Users can specify the quantization config by op type, for example:
    {
        'Conv': {
            'weight': {
                'dtype': ['fp32']
            },
            'activation': {
                'dtype': ['fp32']
            }
        }
    }
op_name_dict – Op-wise tuning constraints for advanced users to reduce the tuning space. Users can specify the quantization config by op name, for example:
    {
        "layer1.0.conv1": {
            "activation": {
                "dtype": ["fp32"]
            },
            "weight": {
                "dtype": ["fp32"]
            }
        },
    }
strategy – Strategy name used in tuning. Please refer to docs/source/tuning_strategies.md.
strategy_kwargs – Parameters for the strategy. Please refer to docs/source/tuning_strategies.md.
objective – Objective to optimize while the accuracy constraint is guaranteed. Supports 'performance', 'modelsize' and 'footprint'. Please refer to docs/source/objective.md. Default value is 'performance'.
timeout – Tuning timeout in seconds. Default value is 0, which means early stop.
max_trials – Maximum number of tuning trials. Default value is 100. Combined with the timeout field to decide when to exit.
performance_only – Whether to do evaluation.
reduce_range – Whether to use 7-bit quantization.
example_inputs – Used to trace a PyTorch model with torch.jit/torch.fx.
excluded_precisions – Precisions to be excluded. Default value is an empty list. Neural Compressor enables mixed precision with fp32 + bf16 + int8 by default. To disable the bf16 data type, specify excluded_precisions = ['bf16'].
quant_level – Supports 'auto', 0 and 1. 0 is the conservative strategy, 1 is the basic or user-specified strategy, and 'auto' (default) is the combination of 0 and 1.
accuracy_criterion – Accuracy constraint settings.
use_distributed_tuning – Whether to use distributed tuning.
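The timeout and max_trials descriptions above jointly decide when tuning exits. The following is a minimal sketch of one plausible reading of those descriptions, not the library's own stopping logic; the helper `should_stop` and its arguments `trials_done`, `elapsed_seconds` and `criterion_met` are hypothetical names introduced for illustration.

```python
def should_stop(trials_done, elapsed_seconds, criterion_met,
                timeout=0, max_trials=100):
    """Hypothetical helper sketching the documented timeout/max_trials semantics."""
    if trials_done >= max_trials:
        # The trial budget is exhausted regardless of the timeout setting.
        return True
    if timeout == 0:
        # Early stop: exit at the first config that meets the accuracy criterion.
        return criterion_met
    # Otherwise keep tuning until the time budget is used up.
    return elapsed_seconds >= timeout


# Default timeout=0: stop as soon as a trial satisfies the accuracy criterion.
print(should_stop(trials_done=3, elapsed_seconds=10.0, criterion_met=True))
# Non-zero timeout: keep searching for a better config until time runs out.
print(should_stop(trials_done=3, elapsed_seconds=50.0, criterion_met=True, timeout=60))
```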
- class neural_compressor.conf.pythonic_config.WeightPruningConfig(pruning_configs=[{}], target_sparsity=0.9, pruning_type='snip_momentum', pattern='4x1', op_names=[], excluded_op_names=[], start_step=0, end_step=0, pruning_scope='global', pruning_frequency=1, min_sparsity_ratio_per_op=0.0, max_sparsity_ratio_per_op=0.98, sparsity_decay_type='exp', pruning_op_types=['Conv', 'Linear'], **kwargs)[source]
Config Class for Pruning. Define a single or a sequence of pruning configs.
- Parameters:
pruning_configs (list of dicts, optional) – Local pruning configs that apply only to the layers they are linked to. Parameters defined outside pruning_configs apply to all layers. By defining dicts in pruning_configs, users can set different pruning strategies for the corresponding layers. Defaults to [{}].
target_sparsity (float, optional) – Sparsity ratio the model can reach after pruning. Supports a float between 0 and 1. Defaults to 0.9.
pruning_type (str, optional) – The pruning criterion. Supports "magnitude", "snip", "snip_momentum", "magnitude_progressive", "snip_progressive", "snip_momentum_progressive" and "pattern_lock". Defaults to "snip_momentum", which is the most feasible pruning criterion in most situations.
pattern (str, optional) – The sparsity structure (structured or unstructured). Supports "NxM" (e.g. "4x1", "8x1"), "channelx1" and "1xchannel" (channel-wise), and "N:M" (e.g. "2:4"). Defaults to "4x1", which can be directly processed by our kernels in ITREX.
op_names (list of str, optional) – Layers whose names contain any of these strings are included in pruning. Defaults to [].
excluded_op_names – Layers whose names contain any of these strings are excluded from pruning. Defaults to [].
start_step (int, optional) – The step to start pruning. Supports an integer. Defaults to 0.
end_step (int, optional) – The step to end pruning. Supports an integer. Defaults to 0.
pruning_scope (str, optional) – Whether the layers' scores are gathered together for sorting. Supports "global" and "local". Defaults to "global", since this leads to less accuracy loss.
pruning_frequency – The frequency of the pruning operation. Supports an integer. Defaults to 1.
min_sparsity_ratio_per_op (float, optional) – Minimum restriction for every layer's sparsity. Supports a float between 0 and 1. Defaults to 0.0.
max_sparsity_ratio_per_op (float, optional) – Maximum restriction for every layer's sparsity. Supports a float between 0 and 1. Defaults to 0.98.
sparsity_decay_type (str, optional) – How the sparsity increase is scheduled. Supports "exp", "cube" and "linear". Defaults to "exp".
pruning_op_types (list of str) – Operator types currently supported for pruning. Supports ['Conv', 'Linear']. Defaults to ['Conv', 'Linear'].
Example:
    from neural_compressor.config import WeightPruningConfig

    local_configs = [
        {
            "pruning_scope": "local",
            "target_sparsity": 0.6,
            "op_names": ["query", "key", "value"],
            "pattern": "channelx1",
        },
        {
            "pruning_type": "snip_momentum_progressive",
            "target_sparsity": 0.5,
            "op_names": ["self.attention.dense"],
        },
    ]
    config = WeightPruningConfig(pruning_configs=local_configs, target_sparsity=0.8)
    prune = Pruning(config)
    prune.update_config(start_step=1, end_step=10)
    prune.model = self.model
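The pruning_configs description says that parameters given outside the list apply to all layers, while each local dict customizes its own layers. A minimal sketch of that layering with plain dicts, assuming a local value simply overrides the global one; the helper `resolve_pruning_configs` is hypothetical and the library's internal merging may handle more cases.

```python
def resolve_pruning_configs(global_params, pruning_configs):
    """Hypothetical helper: overlay each local config on the global parameters."""
    return [{**global_params, **local} for local in pruning_configs]


# Global parameters: documented defaults, with target_sparsity set as in the example above.
global_params = {
    "target_sparsity": 0.8,
    "pruning_type": "snip_momentum",
    "pattern": "4x1",
    "pruning_scope": "global",
}
local_configs = [
    {
        "pruning_scope": "local",
        "target_sparsity": 0.6,
        "op_names": ["query", "key", "value"],
        "pattern": "channelx1",
    },
    {
        "pruning_type": "snip_momentum_progressive",
        "target_sparsity": 0.5,
        "op_names": ["self.attention.dense"],
    },
]

resolved = resolve_pruning_configs(global_params, local_configs)
for cfg in resolved:
    print(cfg["op_names"], cfg["target_sparsity"], cfg["pattern"], cfg["pruning_type"])
```

Under this reading, the second local config keeps the global "4x1" pattern and "global" scope because it does not set them itself.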
- class neural_compressor.conf.pythonic_config.KnowledgeDistillationLossConfig(temperature=1.0, loss_types=['CE', 'CE'], loss_weights=[0.5, 0.5])[source]
Config Class for Knowledge Distillation Loss.
- Parameters:
temperature (float, optional) – Hyperparameter that controls the entropy of the probability distributions. Defaults to 1.0.
loss_types (list[str], optional) – Loss types, should be a list of length 2. The first item is the loss type between the student model output and the ground-truth label; the second item is the loss type between the student model output and the teacher model output. Supported types for the first item are "CE" and "MSE". Supported types for the second item are "CE", "MSE" and "KL". Defaults to ['CE', 'CE'].
loss_weights (list[float], optional) – Loss weights, should be a list of length 2 that sums to 1.0. The first item is the weight applied to the loss between the student model output and the ground-truth label; the second item is the weight applied to the loss between the student model output and the teacher model output. Defaults to [0.5, 0.5].
Example:
    from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig
    from neural_compressor.training import prepare_compression

    criterion_conf = KnowledgeDistillationLossConfig()
    d_conf = DistillationConfig(teacher_model=teacher_model, criterion=criterion_conf)
    compression_manager = prepare_compression(model, d_conf)
    model = compression_manager.model
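The three parameters above describe a weighted sum of two losses computed on temperature-softened outputs. Below is a minimal pure-Python sketch for the default ['CE', 'CE'] setting. It assumes the temperature divides the logits before the softmax and is applied only to the student/teacher term; those details, and the helper names `softmax`, `cross_entropy` and `distillation_loss`, are assumptions for illustration rather than the library's implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by the temperature (higher = flatter)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs):
    """Cross entropy between a target and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=1.0, loss_weights=(0.5, 0.5)):
    """Weighted sum of the student/label loss and the student/teacher loss."""
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    hard_loss = cross_entropy(one_hot, softmax(student_logits))
    soft_loss = cross_entropy(softmax(teacher_logits, temperature),
                              softmax(student_logits, temperature))
    return loss_weights[0] * hard_loss + loss_weights[1] * soft_loss


student = [2.0, 0.5, -1.0]
teacher = [3.0, 1.0, -2.0]
print(distillation_loss(student, teacher, label=0))
print(distillation_loss(student, teacher, label=0, temperature=4.0))
```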
- class neural_compressor.conf.pythonic_config.DistillationConfig(teacher_model=None, criterion=criterion, optimizer={'SGD': {'learning_rate': 0.0001}})[source]
Config of distillation.
- Parameters:
teacher_model (Callable) – Teacher model for distillation. Defaults to None.
features (optional) – Teacher features for distillation. features and teacher_model are alternatives to each other. Defaults to None.
criterion (Callable, optional) – Distillation loss configuration.
optimizer (dictionary, optional) – Optimizer configuration.
Example:
    import torch
    import torch.nn as nn
    from neural_compressor.training import prepare_compression
    from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig

    distil_loss = KnowledgeDistillationLossConfig()
    conf = DistillationConfig(teacher_model=model, criterion=distil_loss)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)
    compression_manager = prepare_compression(model, conf)
    model = compression_manager.model