neural_compressor.adaptor.onnxrt

Module Contents

Classes

ONNXRUNTIMEAdaptor

The ONNXRT adaptor layer, which performs onnx-rt quantization, calibration, and layer tensor inspection.

ONNXRT_QLinearOpsAdaptor

The ONNXRT adaptor layer, which performs onnx-rt quantization, calibration, and layer tensor inspection.

ONNXRT_IntegerOpsAdaptor

The ONNXRT adaptor layer, which performs onnx-rt quantization, calibration, and layer tensor inspection.

ONNXRT_QDQAdaptor

The ONNXRT adaptor layer, which performs onnx-rt quantization, calibration, and layer tensor inspection.

ONNXRTQuery

Base class that defines the query interface.

class neural_compressor.adaptor.onnxrt.ONNXRUNTIMEAdaptor(framework_specific_info)

Bases: neural_compressor.adaptor.adaptor.Adaptor

The ONNXRT adaptor layer, which performs onnx-rt quantization, calibration, and layer tensor inspection.

Parameters:

framework_specific_info (dict) – framework specific configuration for quantization.
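
A minimal construction sketch. The keys of framework_specific_info below are assumptions for illustration; the real schema is defined by the configuration that neural_compressor parses internally:

    from neural_compressor.adaptor.onnxrt import ONNXRUNTIMEAdaptor

    # Hypothetical configuration keys -- treat these as placeholders, not
    # the library's documented schema.
    framework_specific_info = {
        "device": "cpu",
        "backend": "qlinearops",
        "approach": "post_training_static_quant",
        "workspace_path": "./nc_workspace",
    }

    adaptor = ONNXRUNTIMEAdaptor(framework_specific_info)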

quantize(tune_cfg, model, data_loader, q_func=None)
The function is used to do calibration and quantization in post-training quantization.

Parameters:
  • tune_cfg (dict) – quantization config.

  • model (object) – model to be quantized.

  • data_loader (object) – calibration dataset.

  • q_func (optional) – training function for quantization-aware training mode; not yet implemented for onnx.

Returns:

quantized model

Return type:

(dict)
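
A sketch of a post-training quantization call, assuming the adaptor above plus an fp32 ONNX model and a calibration dataloader; the contents of tune_cfg are normally produced by the tuning strategy, so the keys shown are placeholders:

    # `fp32_model` and `calib_dataloader` are assumed to be prepared elsewhere.
    # tune_cfg is normally generated by the tuning strategy; these keys are
    # illustrative placeholders, not the real schema.
    tune_cfg = {"calib_iteration": 100, "op": {}}
    q_model = adaptor.quantize(tune_cfg, fp32_model, calib_dataloader)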

recover(model, q_config)

Execute the recover process on the specified model.

Parameters:
  • model (object) – model to be quantized.

  • q_config (dict) – recover configuration.

Returns:

quantized model

Return type:

(dict)
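
A sketch of the recover flow, assuming q_config was recorded during an earlier quantization run:

    # Replay a recorded quantization configuration on the fp32 model to
    # reconstruct the quantized model.
    q_model = adaptor.recover(fp32_model, q_config)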

inspect_tensor(model, dataloader, op_list=[], iteration_list=[], inspect_type='activation', save_to_disk=False, save_path=None, quantization_cfg=None)

The function is used by the tuning strategy class to dump tensor info.

set_tensor(model, tensor_dict)

The function is used by the tuning strategy class to set tensors back into the model.

Parameters:
  • model (object) – The model to set tensors on. Usually it is a quantized model.

  • tensor_dict (dict) – The tensor dict to set. Note that the numpy arrays contain float values; the adaptor layer is responsible for quantizing them to int8 or int32 before setting them into the quantized model, if needed. The dict format is:

        {'weight0_name': numpy.array, 'bias0_name': numpy.array, ...}
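
A sketch of the tensor_dict layout described above; the tensor names are hypothetical and must match tensors that actually exist in the model:

    import numpy as np

    # Keys are tensor names in the model (hypothetical here); values are
    # float numpy arrays that the adaptor quantizes to int8/int32 as needed.
    tensor_dict = {
        "conv1_weight": np.random.randn(64, 3, 3, 3).astype(np.float32),
        "conv1_bias": np.zeros(64, dtype=np.float32),
    }
    adaptor.set_tensor(q_model, tensor_dict)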

query_fw_capability(model)

The function is used to query framework capability. TODO: will be replaced by the framework query API.

Parameters:

model – onnx model

Returns:

quantization capability

Return type:

(dict)
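
A sketch of the capability query; the structure of the returned dict is consumed by the tuning strategy, so the print below is purely illustrative:

    # `onnx_model` is assumed to be a loaded ONNX model.
    capability = adaptor.query_fw_capability(onnx_model)
    print(capability.keys())  # e.g. per-op and per-op-type capability entries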

evaluate(input_graph, dataloader, postprocess=None, metrics=None, measurer=None, iteration=-1, tensorboard=False, fp32_baseline=False)

The function is used for evaluation when no eval function is given.

Parameters:
  • input_graph – onnx model for evaluation

  • dataloader – dataloader for evaluation (neural_compressor.data.dataloader.ONNXDataLoader).

  • postprocess – post-processing for evaluation (neural_compressor.data.transform.ONNXTransforms).

  • metrics – metrics for evaluation (neural_compressor.metric.ONNXMetrics).

  • measurer – neural_compressor.objective.Measurer

  • iteration (int) – max iterations of evaluation.

  • tensorboard (bool) – whether to use tensorboard for visualization.

  • fp32_baseline (bool, optional) – only for the compare_label=False pipeline.

Returns:

(float) evaluation result, e.g. accuracy or F1 score.
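
A sketch of a direct evaluation call, assuming an evaluation dataloader and a metric object are built elsewhere:

    # `eval_dataloader` and `metric` are assumed to be prepared elsewhere.
    acc = adaptor.evaluate(
        onnx_model,
        eval_dataloader,
        metrics=[metric],  # e.g. an accuracy metric object
        iteration=-1,      # default; assumed to mean no iteration cap
    )
    print(f"accuracy: {acc}")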

save(model, path)

Save the model.

Parameters:
  • model (ModelProto) – model to save.

  • path (str) – save path.
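
A sketch of persisting the quantized model; the destination path is a placeholder:

    # Save the quantized ModelProto to disk.
    adaptor.save(q_model, "./nc_workspace/quantized_model.onnx")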

class neural_compressor.adaptor.onnxrt.ONNXRT_QLinearOpsAdaptor(framework_specific_info)

Bases: ONNXRUNTIMEAdaptor

The ONNXRT adaptor layer, which performs onnx-rt quantization, calibration, and layer tensor inspection.

Parameters:

framework_specific_info (dict) – framework specific configuration for quantization.

class neural_compressor.adaptor.onnxrt.ONNXRT_IntegerOpsAdaptor(framework_specific_info)

Bases: ONNXRUNTIMEAdaptor

The ONNXRT adaptor layer, which performs onnx-rt quantization, calibration, and layer tensor inspection.

Parameters:

framework_specific_info (dict) – framework specific configuration for quantization.

class neural_compressor.adaptor.onnxrt.ONNXRT_QDQAdaptor(framework_specific_info)

Bases: ONNXRUNTIMEAdaptor

The ONNXRT adaptor layer, which performs onnx-rt quantization, calibration, and layer tensor inspection.

Parameters:

framework_specific_info (dict) – framework specific configuration for quantization.

class neural_compressor.adaptor.onnxrt.ONNXRTQuery(local_config_file=None)

Bases: neural_compressor.adaptor.query.QueryBackendCapability

Base class that defines the query interface. Each adaptor layer should implement an inherited class for its specific backend.

get_version()

Get the current backend version information.

Returns:

version string.

Return type:

[string]

get_precisions()

Get supported precisions for current backend.

Returns:

the names of the supported precisions.

Return type:

[string list]

get_op_types()

Get the supported op types for all precisions.

Returns:

A list of dictionaries whose keys are precisions and whose values are the supported op types.

Return type:

[dictionary list]

get_quantization_capability()

Get the supported op types’ quantization capability.

Returns:

A list of dictionaries whose keys are precisions and whose values are dicts describing each op type’s quantization capability.

Return type:

[dictionary list]

get_op_types_by_precision(precision)

Get op types for a given precision.

Parameters:

precision (string) – precision name

Returns:

A list of op types.

Return type:

[string list]

get_graph_optimization()

Get the onnxruntime graph optimization level.
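
A sketch of exercising the query interface; the YAML file name passed as local_config_file is a placeholder for a backend capability configuration, and "int8" is an assumed precision name:

    from neural_compressor.adaptor.onnxrt import ONNXRTQuery

    # "onnxrt_qlinear.yaml" is a placeholder capability-config file name.
    query = ONNXRTQuery(local_config_file="onnxrt_qlinear.yaml")

    print(query.get_version())                      # backend version string
    print(query.get_precisions())                   # supported precision names
    print(query.get_op_types_by_precision("int8"))  # op types for one precision
    print(query.get_graph_optimization())           # graph optimization level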