neural_compressor.config

Configs for Neural Compressor 2.x.

Classes

DotDict

Access yaml using attributes instead of using the dictionary notation.

Options

Option Class for configs.

BenchmarkConfig

Config Class for Benchmark.

AccuracyCriterion

Class of Accuracy Criterion.

TuningCriterion

Class for Tuning Criterion.

PostTrainingQuantConfig

Config Class for Post Training Quantization.

QuantizationAwareTrainingConfig

Config Class for Quantization Aware Training.

WeightPruningConfig

Config Class for Pruning. Define a single or a sequence of pruning configs.

HPOConfig

Config class for hyperparameter optimization.

KnowledgeDistillationLossConfig

Config Class for Knowledge Distillation Loss.

IntermediateLayersKnowledgeDistillationLossConfig

Config Class for Intermediate Layers Knowledge Distillation Loss.

SelfKnowledgeDistillationLossConfig

Config Class for Self Knowledge Distillation Loss.

DistillationConfig

Config of distillation.

MixedPrecisionConfig

Config Class for MixedPrecision.

ExportConfig

Common Base Config for Export.

ONNXQlinear2QDQConfig

Config Class for ONNXQlinear2QDQ.

Torch2ONNXConfig

Config Class for Torch2ONNX.

TF2ONNXConfig

Config Class for TF2ONNX.

NASConfig

Config class for NAS approaches.

MXNet

Base config class for MXNet.

ONNX

Config class for ONNX.

TensorFlow

Config class for TensorFlow.

Keras

Config class for Keras.

PyTorch

Config class for PyTorch.

Module Contents

class neural_compressor.config.DotDict(value=None)[source]

Access yaml using attributes instead of using the dictionary notation.

Parameters:

value (dict) – The dict object to access.
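
Example (a minimal sketch; the dictionary keys below are illustrative only, not part of the API):

from neural_compressor.config import DotDict

conf = DotDict({"model": {"name": "resnet50", "framework": "tensorflow"}})
print(conf.model.name)  # nested keys can be read as attributes: "resnet50"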

class neural_compressor.config.Options(random_seed=1978, workspace=default_workspace, resume_from=None, tensorboard=False)[source]

Option Class for configs.

This class is used for configuring global variables. The global variable options is created with this class. If you want to change global variables, you should use functions from utils.utility.py:

set_random_seed(seed: int)
set_workspace(workspace: str)
set_resume_from(resume_from: str)
set_tensorboard(tensorboard: bool)

Parameters:
  • random_seed (int) – Random seed used in neural compressor. Default value is 1978.

  • workspace (str) – The directory where intermediate files and the tuning history file are stored. Default value is "./nc_workspace/{}/".format(datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")).

  • resume_from (str) – The directory of the tuning history file to resume from. The tuning history was automatically saved in the workspace directory during the last tuning process. Default value is None.

  • tensorboard (bool) – This flag indicates whether to save the weights of the model and the inputs of each layer for visual display. Default value is False.

Example:

from neural_compressor import set_random_seed, set_workspace, set_resume_from, set_tensorboard
set_random_seed(2022)
set_workspace("workspace_path")
set_resume_from("workspace_path")
set_tensorboard(True)
class neural_compressor.config.BenchmarkConfig(inputs=[], outputs=[], backend='default', device='cpu', warmup=5, iteration=-1, model_name='', cores_per_instance=None, num_of_instance=1, inter_num_of_threads=None, intra_num_of_threads=None, ni_workload_name='profiling')[source]

Config Class for Benchmark.

Parameters:
  • inputs (list, optional) – A list of strings containing the inputs of model. Default is an empty list.

  • outputs (list, optional) – A list of strings containing the outputs of model. Default is an empty list.

  • backend (str, optional) – Backend name for model execution. Supported values include: “default”, “itex”, “ipex”, “onnxrt_trt_ep”, “onnxrt_cuda_ep”, “onnxrt_dnnl_ep”, “onnxrt_dml_ep”. Default value is “default”.

  • warmup (int, optional) – The number of iterations to perform warmup before running performance tests. Default value is 5.

  • iteration (int, optional) – The number of iterations to run performance tests. Default is -1.

  • model_name (str, optional) – The name of the model. Default value is empty.

  • cores_per_instance (int, optional) – The number of CPU cores to use per instance. Default value is None.

  • num_of_instance (int, optional) – The number of instances to use for performance testing. Default value is 1.

  • inter_num_of_threads (int, optional) – The number of threads to use for inter-thread operations. Default value is None.

  • intra_num_of_threads (int, optional) – The number of threads to use for intra-thread operations. Default value is None.

Example:

# Run benchmark according to config
from neural_compressor.benchmark import fit

conf = BenchmarkConfig(iteration=100, cores_per_instance=4, num_of_instance=7)
fit(model="./int8.pb", conf=conf, b_dataloader=eval_dataloader)
class neural_compressor.config.AccuracyCriterion(higher_is_better=True, criterion='relative', tolerable_loss=0.01)[source]

Class of Accuracy Criterion.

Parameters:
  • higher_is_better (bool, optional) – This flag indicates whether a higher metric value is better. Default value is True.

  • criterion (str, optional) – This flag indicates whether the metric loss is “relative” or “absolute”. Default value is “relative”.

  • tolerable_loss (float, optional) – This float indicates how much metric loss we can accept. Default value is 0.01.

Example:

from neural_compressor.config import AccuracyCriterion

accuracy_criterion = AccuracyCriterion(
    higher_is_better=True,  # optional.
    criterion="relative",  # optional. Available values are "relative" and "absolute".
    tolerable_loss=0.01,  # optional.
)
class neural_compressor.config.TuningCriterion(strategy='basic', strategy_kwargs=None, timeout=0, max_trials=100, objective='performance')[source]

Class for Tuning Criterion.

Parameters:
  • strategy – Strategy name used in tuning. Please refer to docs/source/tuning_strategies.md.

  • strategy_kwargs – Parameters for strategy. Please refer to docs/source/tuning_strategies.md.

  • objective

    String or dict. Objective with accuracy constraint guaranteed. String value supports “performance”, “modelsize”, “footprint”. Default value is “performance”.

    Please refer to docs/source/objective.md.

  • timeout – Tuning timeout (seconds). Default value is 0 which means early stop.

  • max_trials – Max tune times. Default value is 100. Combine with timeout field to decide when to exit.

Example:

from neural_compressor.config import TuningCriterion

tuning_criterion = TuningCriterion(
    timeout=0,
    max_trials=100,
    strategy="basic",
    strategy_kwargs=None,
)

class neural_compressor.config.PostTrainingQuantConfig(device='cpu', backend='default', domain='auto', recipes={}, quant_format='default', inputs=[], outputs=[], approach='static', calibration_sampling_size=[100], op_type_dict=None, op_name_dict=None, reduce_range=None, example_inputs=None, excluded_precisions=[], quant_level='auto', accuracy_criterion=accuracy_criterion, tuning_criterion=tuning_criterion, ni_workload_name='quantization')[source]

Config Class for Post Training Quantization.

Parameters:
  • device – Support “cpu”, “gpu”, “npu” and “xpu”.

  • backend – Backend for model execution. Support “default”, “itex”, “ipex”, “onnxrt_trt_ep”, “onnxrt_cuda_ep”, “onnxrt_dnnl_ep”, “onnxrt_dml_ep”

  • domain – Model domain. Support “auto”, “cv”, “object_detection”, “nlp” and “recommendation_system”. Adaptor will use specific quantization settings for different domains automatically, and explicitly specified quantization settings will override the automatic setting. If users set domain as auto, automatic detection for domain will be executed.

  • recipes

    Recipes for quantization; the supported keys are listed below (see the usage sketch after the example at the end of this parameter list):

    “smooth_quant”: whether to do smooth quant.
    “smooth_quant_args”: parameters for smooth_quant.
    “layer_wise_quant”: whether to use layer-wise quant.
    “fast_bias_correction”: whether to do fast bias correction.
    “weight_correction”: whether to do weight correction.
    “gemm_to_matmul”: whether to convert gemm to matmul and add; only valid for onnx models.
    “graph_optimization_level”: supports “DISABLE_ALL”, “ENABLE_BASIC”, “ENABLE_EXTENDED” and “ENABLE_ALL”; only valid for onnx models.
    “first_conv_or_matmul_quantization”: whether to quantize the first conv or matmul.
    “last_conv_or_matmul_quantization”: whether to quantize the last conv or matmul.
    “pre_post_process_quantization”: whether to quantize the ops in preprocess and postprocess.
    “add_qdq_pair_to_weight”: whether to add a QDQ pair for weights; only valid for onnxrt_trt_ep.
    “optypes_to_exclude_output_quant”: don’t quantize the output of the specified op types.
    “dedicated_qdq_pair”: whether to use dedicated QDQ pairs; only valid for onnxrt_trt_ep.

  • quant_format – Support “default”, “QDQ” and “QOperator”, only required in ONNXRuntime.

  • inputs – Inputs of model, only required in tensorflow.

  • outputs – Outputs of model, only required in tensorflow.

  • approach – Post-Training Quantization method. Neural Compressor supports the “static”, “dynamic”, “weight_only” and “auto” methods. Default value is “static”. For the “basic” strategy, “auto” means Neural Compressor will quantize all OPs that support PTQ static or PTQ dynamic; for OPs supporting both PTQ static and PTQ dynamic, PTQ static is tried first, and PTQ dynamic is tried when none of the OP-type-wise tuning configs meets the accuracy loss criterion. For the “bayesian”, “mse”, “mse_v2”, “HAWQ_V2”, “exhaustive” and “random” strategies, “auto” means Neural Compressor will quantize all OPs that support PTQ static or PTQ dynamic; if an OP supports both PTQ static and PTQ dynamic, PTQ static is tried, otherwise PTQ dynamic is tried.

  • calibration_sampling_size – Number of calibration samples.

  • op_type_dict

    Tuning constraints on optype-wise for advanced users to reduce the tuning space. Users can specify the quantization config by op type, for example:

    {
        "Conv": {
            "weight": {"dtype": ["fp32"]},
            "activation": {"dtype": ["fp32"]},
        }
    }

  • op_name_dict

    Tuning constraints on op-wise for advanced users to reduce the tuning space. Users can specify the quantization config by op name, for example:

    {
        "layer1.0.conv1": {
            "activation": {"dtype": ["fp32"]},
            "weight": {"dtype": ["fp32"]},
        },
    }

  • reduce_range – Whether to use 7 bits for quantization.

  • excluded_precisions – Precisions to be excluded. Default value is an empty list. Neural Compressor enables mixed precision with fp32 + bf16 + int8 by default. If you want to disable the bf16 data type, you can specify excluded_precisions = [“bf16”].

  • quant_level – Supports auto, 0 and 1; 0 is a conservative strategy, 1 is a basic or user-specified strategy, and auto (default) is the combination of 0 and 1.

  • tuning_criterion – Instance of TuningCriterion class. In this class you can set strategy, strategy_kwargs, timeout, max_trials and objective. Please refer to the docstring of the TuningCriterion class.

  • accuracy_criterion – Instance of AccuracyCriterion class. In this class you can set higher_is_better, criterion and tolerable_loss. Please refer to the docstring of the AccuracyCriterion class.

Example:

from neural_compressor.config import PostTrainingQuantConfig, TuningCriterion

conf = PostTrainingQuantConfig(
    quant_level="auto",
    tuning_criterion=TuningCriterion(
        timeout=0,
        max_trials=100,
    ),
)
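
As referenced in the recipes parameter above, a minimal sketch of passing recipes (the alpha value is an illustrative assumption, not a documented default):

from neural_compressor.config import PostTrainingQuantConfig

conf = PostTrainingQuantConfig(
    approach="static",
    recipes={
        "smooth_quant": True,
        "smooth_quant_args": {"alpha": 0.5},  # assumed smooth_quant parameter
    },
)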
class neural_compressor.config.QuantizationAwareTrainingConfig(device='cpu', backend='default', inputs=[], outputs=[], op_type_dict=None, op_name_dict=None, reduce_range=None, model_name='', quant_format='default', excluded_precisions=[], quant_level='auto', accuracy_criterion=accuracy_criterion, tuning_criterion=tuning_criterion)[source]

Config Class for Quantization Aware Training.

Parameters:
  • device – Support “cpu”, “gpu”, “npu” and “xpu”.

  • backend – Backend for model execution. Support “default”, “itex”, “ipex”, “onnxrt_trt_ep”, “onnxrt_cuda_ep”, “onnxrt_dnnl_ep”, “onnxrt_dml_ep”

  • inputs – Inputs of model, only required in tensorflow.

  • outputs – Outputs of model, only required in tensorflow.

  • op_type_dict

    Tuning constraints on optype-wise for advanced users to reduce the tuning space. Users can specify the quantization config by op type, for example:

    {
        "Conv": {
            "weight": {"dtype": ["fp32"]},
            "activation": {"dtype": ["fp32"]},
        }
    }

  • op_name_dict

    Tuning constraints on op-wise for advanced users to reduce the tuning space. Users can specify the quantization config by op name, for example:

    {
        "layer1.0.conv1": {
            "activation": {"dtype": ["fp32"]},
            "weight": {"dtype": ["fp32"]},
        },
    }

  • reduce_range – Whether to use 7 bits for quantization.

  • model_name – The name of the model. Default value is empty.

  • excluded_precisions – Precisions to be excluded. Default value is an empty list. Neural Compressor enables mixed precision with fp32 + bf16 + int8 by default. If you want to disable the bf16 data type, you can specify excluded_precisions = [“bf16”].

  • quant_level – Supports auto, 0 and 1; 0 is a conservative strategy, 1 is a basic or user-specified strategy, and auto (default) is the combination of 0 and 1.

  • tuning_criterion – Instance of TuningCriterion class. In this class you can set strategy, strategy_kwargs, timeout, max_trials and objective. Please refer to the docstring of the TuningCriterion class. This parameter is only required for Quantization Aware Training with tuning.

  • accuracy_criterion – Instance of AccuracyCriterion class. In this class you can set higher_is_better, criterion and tolerable_loss. Please refer to the docstring of the AccuracyCriterion class. This parameter is only required for Quantization Aware Training with tuning.

Example:

from neural_compressor.config import QuantizationAwareTrainingConfig

if approach == "qat":
    model = copy.deepcopy(model_origin)
    conf = QuantizationAwareTrainingConfig(
        op_name_dict=qat_op_name_dict
    )
    compression_manager = prepare_compression(model, conf)
class neural_compressor.config.WeightPruningConfig(pruning_configs=[{}], target_sparsity=0.9, pruning_type='snip_momentum', pattern='4x1', op_names=[], excluded_op_names=[], backend=None, start_step=0, end_step=0, pruning_scope='global', pruning_frequency=1, min_sparsity_ratio_per_op=0.0, max_sparsity_ratio_per_op=0.98, sparsity_decay_type='exp', pruning_op_types=['Conv', 'Linear'], low_memory_usage=False, **kwargs)[source]

Config Class for Pruning. Define a single or a sequence of pruning configs.

Parameters:
  • pruning_configs (list of dicts, optional) – Local pruning configs only valid to linked layers. Parameters defined out of pruning_configs are valid for all layers. By defining dicts in pruning_config, users can set different pruning strategies for corresponding layers. Defaults to [{}].

  • target_sparsity (float, optional) – Sparsity ratio the model can reach after pruning. Supports a float between 0 and 1. Default to 0.90.

  • pruning_type (str, optional) – A string defining the criteria for pruning. Supports “magnitude”, “snip”, “snip_momentum”, “magnitude_progressive”, “snip_progressive”, “snip_momentum_progressive” and “pattern_lock”. Default to “snip_momentum”, which is the most feasible pruning criterion under most situations.

  • pattern (str, optional) – Sparsity's structured (or unstructured) pattern type. Supports “NxM” (e.g. “4x1”, “8x1”), “channelx1” & “1xchannel” (channel-wise), and “N:M” (e.g. “2:4”). Default to “4x1”, which can be directly processed by our kernels in ITREX.

  • op_names (list of str, optional) – Layers containing specific names to be included for pruning. Defaults to [].

  • excluded_op_names – Layers containing specific names to be excluded from pruning. Defaults to [].

  • start_step (int, optional) – The step to start pruning. Supports an integer. Default to 0.

  • end_step (int, optional) – The step to end pruning. Supports an integer. Default to 0.

  • pruning_scope (str, optional) – Determines whether layers' scores are gathered together for sorting. Supports “global” and “local”. Default: “global”, since this leads to less accuracy loss.

  • pruning_frequency – The frequency of pruning operations. Supports an integer. Default to 1.

  • min_sparsity_ratio_per_op (float, optional) – Minimum restriction for every layer's sparsity. Supports a float between 0 and 1. Default to 0.0.

  • max_sparsity_ratio_per_op (float, optional) – Maximum restriction for every layer's sparsity. Supports a float between 0 and 1. Default to 0.98.

  • sparsity_decay_type (str, optional) – How to schedule the sparsity increase. Supports “exp”, “cube”, “square” and “linear”. Default to “exp”.

  • pruning_op_types (list of str) – Operator types currently supported for pruning. Supports [“Conv”, “Linear”]. Default to [“Conv”, “Linear”].

Example:

from neural_compressor.config import WeightPruningConfig

local_configs = [
    {
        "pruning_scope": "local",
        "target_sparsity": 0.6,
        "op_names": ["query", "key", "value"],
        "pattern": "channelx1",
    },
    {
        "pruning_type": "snip_momentum_progressive",
        "target_sparsity": 0.5,
        "op_names": ["self.attention.dense"],
    },
]
config = WeightPruningConfig(
    pruning_configs=local_configs,
    target_sparsity=0.8,
)
prune = Pruning(config)
prune.update_config(start_step=1, end_step=10)
prune.model = self.model

class neural_compressor.config.HPOConfig(search_space, searcher='xgb', higher_is_better=True, loss_type='reg', min_train_samples=10, seed=42)[source]

Config class for hyperparameter optimization.

Parameters:
  • search_space (dict) – A dictionary for defining the search space.

  • searcher (str) – The name of the search algorithm; currently supported: grid, random, bo and xgb.

  • higher_is_better (bool, optional) – This flag indicates whether a higher metric value is better.

  • min_train_samples (int, optional) – The minimum number of samples required to start training the search model.

  • seed (int, optional) – Random seed.
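
Example (a minimal sketch; the search-space keys and candidate value lists are illustrative assumptions, not a prescribed schema):

from neural_compressor.config import HPOConfig

hpo_config = HPOConfig(
    search_space={
        "learning_rate": [1e-5, 5e-5, 1e-4],  # hypothetical hyperparameter values
        "num_train_epochs": [3, 6, 9],        # hypothetical hyperparameter values
    },
    searcher="xgb",
    higher_is_better=True,
    loss_type="reg",
    min_train_samples=10,
    seed=42,
)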

class neural_compressor.config.KnowledgeDistillationLossConfig(temperature=1.0, loss_types=['CE', 'CE'], loss_weights=[0.5, 0.5])[source]

Config Class for Knowledge Distillation Loss.

Parameters:
  • temperature (float, optional) – Hyperparameter that controls the entropy of probability distributions. Defaults to 1.0.

  • loss_types (list[str], optional) – loss types, should be a list of length 2. First item is the loss type for student model output and groundtruth label, second item is the loss type for student model output and teacher model output. Supported types for first item are “CE”, “MSE”. Supported types for second item are “CE”, “MSE”, “KL”. Defaults to [“CE”, “CE”].

  • loss_weights (list[float], optional) – loss weights, should be a list of length 2 and sum to 1.0. First item is the weight multiplied to the loss of student model output and groundtruth label, second item is the weight multiplied to the loss of student model output and teacher model output. Defaults to [0.5, 0.5].

Example:

from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig
from neural_compressor.training import prepare_compression

criterion_conf = KnowledgeDistillationLossConfig()
d_conf = DistillationConfig(teacher_model=teacher_model, criterion=criterion_conf)
compression_manager = prepare_compression(model, d_conf)
model = compression_manager.model
class neural_compressor.config.IntermediateLayersKnowledgeDistillationLossConfig(layer_mappings=[], loss_types=[], loss_weights=[], add_origin_loss=False)[source]

Config Class for Intermediate Layers Knowledge Distillation Loss.

Parameters:
  • layer_mappings (list) –

    A list specifying the layer mapping relationship between the student model and the teacher model. Each item in layer_mappings should be a list with the format [(student_layer_name, student_layer_output_process), (teacher_layer_name, teacher_layer_output_process)], where student_layer_name and teacher_layer_name are the layer names of the student and the teacher models, e.g. “bert.layer1.attention”. The student_layer_output_process and teacher_layer_output_process are output process methods used to get the desired output from the layer specified by the layer name; each can be either a function or a string. In the function case, the function takes the output of the specified layer as input. In the string case, when the output of the specified layer is a dict, the string serves as the key to get the corresponding value; when the output of the specified layer is a list or tuple, the string should be numeric and serves as the index to get the corresponding value. When output processing is not needed, the item in layer_mappings can be abbreviated to [(student_layer_name, ), (teacher_layer_name, )]; if student_layer_name and teacher_layer_name are the same, it can be abbreviated further to [(layer_name, )]. Some examples of items in layer_mappings are listed below:

    [(“student_model.layer1.attention”, “1”), (“teacher_model.layer1.attention”, “1”)] [(“student_model.layer1.output”, ), (“teacher_model.layer1.output”, )]. [(“model.layer1.output”, )].

  • loss_types (list[str], optional) – loss types, should be a list with the same length of layer_mappings. Each item is the loss type for each layer mapping specified in the layer_mappings. Supported types for each item are “MSE”, “KL”, “L1”. Defaults to [“MSE”, ]*len(layer_mappings).

  • loss_weights (list[float], optional) – loss weights, should be a list with the same length of layer_mappings. Each item is the weight multiplied to the loss of each layer mapping specified in the layer_mappings. Defaults to [1.0 / len(layer_mappings)] * len(layer_mappings).

  • add_origin_loss (bool, optional) – Whether to add origin loss of the student model. Defaults to False.

Example:

from neural_compressor.config import DistillationConfig, IntermediateLayersKnowledgeDistillationLossConfig
from neural_compressor.training import prepare_compression

layer_mappings = [
    ["layer1.0"],
    [["layer1.1.conv1"], ["layer1.1.conv1", "0"]],
]
criterion_conf = IntermediateLayersKnowledgeDistillationLossConfig(
    layer_mappings=layer_mappings,
    loss_types=["MSE"] * len(layer_mappings),
    loss_weights=[1.0 / len(layer_mappings)] * len(layer_mappings),
    add_origin_loss=True,
)
d_conf = DistillationConfig(teacher_model=teacher_model, criterion=criterion_conf)
compression_manager = prepare_compression(model, d_conf)
model = compression_manager.model
class neural_compressor.config.SelfKnowledgeDistillationLossConfig(layer_mappings=[], temperature=1.0, loss_types=[], loss_weights=[], add_origin_loss=False)[source]

Config Class for Self Knowledge Distillation Loss.

Parameters:
  • layer_mappings (list) – layers of distillation. Format like [[[student1_layer_name1, teacher_layer_name1],[student2_layer_name1, teacher_layer_name1]], [[student1_layer_name2, teacher_layer_name2],[student2_layer_name2, teacher_layer_name2]]]

  • temperature (float, optional) – Used to calculate the soft label CE loss.

  • loss_types (list, optional) – loss types, should be a list with the same length of layer_mappings. Each item is the loss type for each layer mapping specified in the layer_mappings. Supported types for each item are “CE”, “KL”, “L2”. Defaults to [“CE”, ]*len(layer_mappings).

  • loss_weights (list, optional) – loss weights. Defaults to [1.0 / len(layer_mappings)] * len(layer_mappings).

  • add_origin_loss (bool, optional) – whether to add origin loss for hard label loss.

Example:

from neural_compressor.training import prepare_compression
from neural_compressor.config import DistillationConfig, SelfKnowledgeDistillationLossConfig

criterion_conf = SelfKnowledgeDistillationLossConfig(
    layer_mappings=[
        [["resblock.1.feature.output", "resblock.deepst.feature.output"],
        ["resblock.2.feature.output","resblock.deepst.feature.output"]],
        [["resblock.2.fc","resblock.deepst.fc"],
        ["resblock.3.fc","resblock.deepst.fc"]],
        [["resblock.1.fc","resblock.deepst.fc"],
        ["resblock.2.fc","resblock.deepst.fc"],
        ["resblock.3.fc","resblock.deepst.fc"]]
    ],
    temperature=3.0,
    loss_types=["L2", "KL", "CE"],
    loss_weights=[0.5, 0.05, 0.02],
    add_origin_loss=True,)
conf = DistillationConfig(teacher_model=model, criterion=criterion_conf)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)
compression_manager = prepare_compression(model, conf)
model = compression_manager.model
class neural_compressor.config.DistillationConfig(teacher_model=None, criterion=criterion, optimizer={'SGD': {'learning_rate': 0.0001}})[source]

Config of distillation.

Parameters:
  • teacher_model (Callable) – Teacher model for distillation. Defaults to None.

  • features (optional) – Teacher features for distillation; features and teacher_model are alternatives. Defaults to None.

  • criterion (Callable, optional) – Distillation loss configuration.

  • optimizer (dictionary, optional) – Optimizer configuration.

Example:

from neural_compressor.training import prepare_compression
from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig

distil_loss = KnowledgeDistillationLossConfig()
conf = DistillationConfig(teacher_model=model, criterion=distil_loss)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001)
compression_manager = prepare_compression(model, conf)
model = compression_manager.model
class neural_compressor.config.MixedPrecisionConfig(device='cpu', backend='default', precisions='bf16', model_name='', inputs=[], outputs=[], quant_level='auto', tuning_criterion=tuning_criterion, accuracy_criterion=accuracy_criterion, excluded_precisions=[], op_name_dict={}, op_type_dict={}, example_inputs=None)[source]

Config Class for MixedPrecision.

Parameters:
  • device (str, optional) – Device for execution. Support “cpu”, “gpu”, “npu” and “xpu”, default is “cpu”.

  • backend (str, optional) – Backend for model execution. Support “default”, “itex”, “ipex”, “onnxrt_trt_ep”, “onnxrt_cuda_ep”, “onnxrt_dnnl_ep”, “onnxrt_dml_ep”. Default is “default”.

  • precisions ([str, list], optional) – Target precision for mix precision conversion. Support “bf16” and “fp16”, default is “bf16”.

  • model_name (str, optional) – The name of the model. Default value is empty.

  • inputs (list, optional) – Inputs of model, default is [].

  • outputs (list, optional) – Outputs of model, default is [].

  • quant_level – Supports auto, 0 and 1; 0 is conservative (fallback in op-type-wise), 1 is fallback in op-wise, and auto (default) is the combination of 0 and 1.

  • tuning_criterion (TuningCriterion object, optional) – Accuracy tuning settings; it won't work if there is no accuracy tuning process.

  • accuracy_criterion (AccuracyCriterion object, optional) – Accuracy constraint settings; it won't work if there is no accuracy tuning process.

  • excluded_precisions (list, optional) – Precisions to be excluded during mix precision conversion, default is [].

  • op_type_dict (dict, optional) –

    Tuning constraints on optype-wise for advanced users to reduce the tuning space. Users can specify the quantization config by op type, for example:

    {
        "Conv": {
            "weight": {"dtype": ["fp32"]},
            "activation": {"dtype": ["fp32"]},
        }
    }

  • op_name_dict (dict, optional) –

    Tuning constraints on op-wise for advanced users to reduce the tuning space. Users can specify the quantization config by op name, for example:

    {
        "layer1.0.conv1": {
            "activation": {"dtype": ["fp32"]},
            "weight": {"dtype": ["fp32"]},
        },
    }

  • example_inputs (tensor|list|tuple|dict, optional) – Example inputs used for tracing model. Defaults to None.

Example:

from neural_compressor import mix_precision
from neural_compressor.config import MixedPrecisionConfig

conf = MixedPrecisionConfig()
converted_model = mix_precision.fit(model, conf=conf)

class neural_compressor.config.ExportConfig(dtype='int8', opset_version=14, quant_format='QDQ', example_inputs=None, input_names=None, output_names=None, dynamic_axes=None)[source]

Common Base Config for Export.

Parameters:
  • dtype (str, optional) – The data type of the exported model, select from [“fp32”, “int8”]. Defaults to “int8”.

  • opset_version (int, optional) – The ONNX opset version used for export. Defaults to 14.

  • quant_format (str, optional) – The quantization format of the exported int8 onnx model, select from [“QDQ”, “QLinear”]. Defaults to “QDQ”.

  • example_inputs (tensor|list|tuple|dict, optional) – Example inputs used for tracing model. Defaults to None.

  • input_names (list, optional) – A list of model input names. Defaults to None.

  • output_names (list, optional) – A list of model output names. Defaults to None.

  • dynamic_axes (dict, optional) – A dictionary of dynamic axes information. Defaults to None.

class neural_compressor.config.ONNXQlinear2QDQConfig[source]

Config Class for ONNXQlinear2QDQ.
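
Example (a minimal sketch; it assumes a quantized ONNX model object q_model whose export method accepts this config, following the same pattern as the Torch2ONNX and TF2ONNX examples below):

from neural_compressor.config import ONNXQlinear2QDQConfig

config = ONNXQlinear2QDQConfig()
q_model.export("int8-qdq-model.onnx", config)  # assumed export call on a quantized model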

class neural_compressor.config.Torch2ONNXConfig(dtype='int8', opset_version=14, quant_format='QDQ', example_inputs=None, input_names=None, output_names=None, dynamic_axes=None, **kwargs)[source]

Config Class for Torch2ONNX.

Parameters:
  • dtype (str, optional) – The data type of the exported model, select from [“fp32”, “int8”]. Defaults to “int8”.

  • opset_version (int, optional) – The ONNX opset version used for export. Defaults to 14.

  • quant_format (str, optional) – The quantization format of the exported int8 onnx model, select from [“QDQ”, “QLinear”]. Defaults to “QDQ”.

  • example_inputs (tensor|list|tuple|dict, required) – Example inputs used for tracing model. Defaults to None.

  • input_names (list, optional) – A list of model input names. Defaults to None.

  • output_names (list, optional) – A list of model output names. Defaults to None.

  • dynamic_axes (dict, optional) – A dictionary of dynamic axes information. Defaults to None.

  • recipe (str, optional) – A string to select recipes used for Linear -> Matmul + Add, select from [“QDQ_OP_FP32_BIAS”, “QDQ_OP_INT32_BIAS”, “QDQ_OP_FP32_BIAS_QDQ”]. Defaults to “QDQ_OP_FP32_BIAS”.

Example:

# resnet50
from neural_compressor.config import Torch2ONNXConfig

int8_onnx_config = Torch2ONNXConfig(
    dtype="int8",
    opset_version=14,
    quant_format="QDQ",  # or QLinear
    example_inputs=torch.randn(1, 3, 224, 224),
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
)
q_model.export("int8-model.onnx", int8_onnx_config)

class neural_compressor.config.TF2ONNXConfig(dtype='int8', opset_version=14, quant_format='QDQ', example_inputs=None, input_names=None, output_names=None, dynamic_axes=None, **kwargs)[source]

Config Class for TF2ONNX.

Parameters:
  • dtype (str, optional) – The data type of export target model. Supports “fp32” and “int8”. Defaults to “int8”.

  • opset_version (int, optional) – The version of the ONNX operator set to use. Defaults to 14.

  • quant_format (str, optional) – The quantization format for the export target model. Supports “default”, “QDQ” and “QOperator”. Defaults to “QDQ”.

  • example_inputs (list, optional) – A list of example inputs used for tracing the model. Defaults to None.

  • input_names (list, optional) – A list of model input names. Defaults to None.

  • output_names (list, optional) – A list of model output names. Defaults to None.

  • dynamic_axes (dict, optional) – A dictionary of dynamic axis information. Defaults to None.

  • **kwargs – Additional keyword arguments.

Examples:

# tensorflow QDQ int8 model "q_model" export to ONNX int8 model
from neural_compressor.config import TF2ONNXConfig
config = TF2ONNXConfig()
q_model.export(output_graph, config)
class neural_compressor.config.NASConfig(approach=None, search_space=None, search_algorithm=None, metrics=[], higher_is_better=[], max_trials=3, seed=42, dynas=None)[source]

Config class for NAS approaches.
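
Example (a minimal sketch using only the documented constructor arguments; the approach name, search space and search algorithm values are illustrative assumptions):

from neural_compressor.config import NASConfig

nas_config = NASConfig(
    approach="basic",                         # assumed approach name
    search_space={"channels": [16, 32, 64]},  # assumed search space
    search_algorithm="random",                # assumed search algorithm
    max_trials=3,
    seed=42,
)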

class neural_compressor.config.MXNet(precisions=None)[source]

Base config class for MXNet.

class neural_compressor.config.ONNX(graph_optimization_level=None, precisions=None)[source]

Config class for ONNX.

class neural_compressor.config.TensorFlow(precisions=None)[source]

Config class for TensorFlow.

class neural_compressor.config.Keras(precisions=None)[source]

Config class for Keras.

class neural_compressor.config.PyTorch(precisions=None)[source]

Config class for PyTorch.
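
These framework config classes (MXNet, ONNX, TensorFlow, Keras, PyTorch) are normally created internally. A minimal sketch of direct instantiation using only the documented parameters (the precision lists and the graph optimization level are illustrative assumptions):

from neural_compressor.config import ONNX, TensorFlow

onnx_conf = ONNX(graph_optimization_level="ENABLE_EXTENDED", precisions=["int8", "fp32"])
tf_conf = TensorFlow(precisions=["int8", "bf16", "fp32"])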