neural_compressor.adaptor.onnxrt
Module Contents
Classes
- ONNXRUNTIMEAdaptor: The ONNXRT adaptor layer; performs ONNX Runtime quantization, calibration, and layer tensor inspection.
- ONNXRT_QLinearOpsAdaptor: The ONNXRT adaptor layer; performs ONNX Runtime quantization, calibration, and layer tensor inspection.
- ONNXRT_IntegerOpsAdaptor: The ONNXRT adaptor layer; performs ONNX Runtime quantization, calibration, and layer tensor inspection.
- ONNXRT_QDQAdaptor: The ONNXRT adaptor layer; performs ONNX Runtime quantization, calibration, and layer tensor inspection.
- ONNXRTQuery: Base class that defines the Query Interface.
- class neural_compressor.adaptor.onnxrt.ONNXRUNTIMEAdaptor(framework_specific_info)
Bases: neural_compressor.adaptor.adaptor.Adaptor
The ONNXRT adaptor layer; performs ONNX Runtime quantization, calibration, and layer tensor inspection.
- Parameters:
framework_specific_info (dict) – framework-specific configuration for quantization.
- quantize(tune_cfg, model, data_loader, q_func=None)
This function performs calibration and quantization in post-training quantization.
- Parameters:
tune_cfg (dict) – quantization configuration.
model (object) – the model to quantize.
data_loader (object) – calibration dataset.
q_func (optional) – training function for quantization-aware training mode; not yet implemented for ONNX.
- Returns:
quantized model
- Return type:
(dict)
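A minimal usage sketch of the post-training flow. The framework_specific_info keys shown are illustrative only, and tune_cfg, fp32_model, and calib_dataloader are assumed to be prepared elsewhere by the surrounding tuning strategy:

```python
from neural_compressor.adaptor.onnxrt import ONNXRUNTIMEAdaptor

# Illustrative keys only: the exact contents expected in
# framework_specific_info vary across Neural Compressor versions.
framework_specific_info = {
    "device": "cpu",
    "backend": "qlinearops",
    "workspace_path": "./nc_workspace",
}
adaptor = ONNXRUNTIMEAdaptor(framework_specific_info)

# tune_cfg comes from the tuning strategy; fp32_model is the model to
# quantize and calib_dataloader yields calibration batches (all assumed
# to be prepared elsewhere).
q_model = adaptor.quantize(tune_cfg, fp32_model, calib_dataloader)
```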
- recover(model, q_config)
Execute the recover process on the specified model.
- Parameters:
model (object) – the model to recover.
q_config (dict) – recover configuration.
- Returns:
quantized model
- Return type:
(dict)
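A short sketch of the recover flow, assuming q_config was recorded during an earlier quantize() run:

```python
# Replay a previously recorded quantization configuration to rebuild
# the quantized model from the original fp32 model.
q_model = adaptor.recover(fp32_model, q_config)
```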
- inspect_tensor(model, dataloader, op_list=[], iteration_list=[], inspect_type='activation', save_to_disk=False, save_path=None, quantization_cfg=None)
The function is used by the tune strategy class for dumping tensor info.
- set_tensor(model, tensor_dict)
The function is used by the tune strategy class for setting tensors back into the model.
- Parameters:
model (object) – the model to set tensors on; usually a quantized model.
tensor_dict (dict) – the tensor dict to set. Note that the numpy arrays contain float values; the adaptor layer is responsible for quantizing them to int8 or int32 before setting them into the quantized model if needed. The dict format is: {'weight0_name': numpy.array, 'bias0_name': numpy.array, ...}
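For illustration, a tensor_dict in the documented format. The tensor names are hypothetical, and the values stay in float because the adaptor layer performs any needed requantization:

```python
import numpy as np

# Hypothetical tensor names; values are float numpy arrays, which the
# adaptor quantizes to int8/int32 as needed before writing them back.
tensor_dict = {
    "weight0_name": np.random.rand(64, 64).astype(np.float32),
    "bias0_name": np.zeros(64, dtype=np.float32),
}
adaptor.set_tensor(q_model, tensor_dict)
```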
- query_fw_capability(model)
The function is used to query framework capability. TODO: will be replaced by the framework query API.
- Parameters:
model – ONNX model
- Returns:
quantization capability
- Return type:
(dict)
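A sketch of querying capability before tuning. The layout of the returned dict is version dependent, so the inspection below only walks its top-level keys:

```python
# Ask the backend what it can quantize for this model; the tuning
# strategy uses this capability dict to build candidate tune_cfgs.
capability = adaptor.query_fw_capability(fp32_model)
for key, value in capability.items():
    print(key, type(value))
```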
- evaluate(input_graph, dataloader, postprocess=None, metrics=None, measurer=None, iteration=-1, tensorboard=False, fp32_baseline=False)
The function is used for evaluation when no eval function is given.
- Parameters:
input_graph – ONNX model for evaluation.
dataloader – dataloader for evaluation; neural_compressor.data.dataloader.ONNXDataLoader.
postprocess – post-processing for evaluation; neural_compressor.data.transform.ONNXTransforms.
metrics – metrics for evaluation; neural_compressor.metric.ONNXMetrics.
measurer – neural_compressor.objective.Measurer.
iteration (int) – max iterations of evaluation.
tensorboard (bool) – whether to use TensorBoard for visualization.
fp32_baseline (bool, optional) – only for the compare_label=False pipeline.
- Returns:
(float) evaluation result, e.g. accuracy or F1.
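A hedged sketch of evaluation, assuming q_model, eval_dataloader, and a metrics object following the neural_compressor.metric interface have been prepared elsewhere:

```python
# iteration=-1 runs over the whole dataloader; pass a positive value
# to cap the number of evaluation batches.
acc = adaptor.evaluate(
    q_model,
    eval_dataloader,
    metrics=metrics,
    iteration=-1,
)
print(f"accuracy: {acc}")
```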
- save(model, path)
Save the model.
- Parameters:
model (ModelProto) – the model to save.
path (str) – the save path.
- class neural_compressor.adaptor.onnxrt.ONNXRT_QLinearOpsAdaptor(framework_specific_info)
Bases: ONNXRUNTIMEAdaptor
The ONNXRT adaptor layer; performs ONNX Runtime quantization, calibration, and layer tensor inspection.
- Parameters:
framework_specific_info (dict) – framework-specific configuration for quantization.
- class neural_compressor.adaptor.onnxrt.ONNXRT_IntegerOpsAdaptor(framework_specific_info)
Bases: ONNXRUNTIMEAdaptor
The ONNXRT adaptor layer; performs ONNX Runtime quantization, calibration, and layer tensor inspection.
- Parameters:
framework_specific_info (dict) – framework-specific configuration for quantization.
- class neural_compressor.adaptor.onnxrt.ONNXRT_QDQAdaptor(framework_specific_info)
Bases: ONNXRUNTIMEAdaptor
The ONNXRT adaptor layer; performs ONNX Runtime quantization, calibration, and layer tensor inspection.
- Parameters:
framework_specific_info (dict) – framework-specific configuration for quantization.
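The three subclasses share the ONNXRUNTIMEAdaptor interface and differ in the ONNX quantization format they target: QLinear ops (the QOperator format), integer ops (operators such as MatMulInteger, typically used for dynamic quantization), and QDQ (explicit QuantizeLinear/DequantizeLinear node pairs). A sketch, with an illustrative and likely incomplete configuration dict:

```python
from neural_compressor.adaptor.onnxrt import (
    ONNXRT_IntegerOpsAdaptor,
    ONNXRT_QDQAdaptor,
    ONNXRT_QLinearOpsAdaptor,
)

# Illustrative, incomplete configuration; a real framework_specific_info
# dict carries more keys than shown here.
info = {"device": "cpu", "workspace_path": "./nc_workspace"}

qlinear_adaptor = ONNXRT_QLinearOpsAdaptor(info)  # QOperator/QLinear format
integer_adaptor = ONNXRT_IntegerOpsAdaptor(info)  # integer-ops format
qdq_adaptor = ONNXRT_QDQAdaptor(info)             # QDQ format
```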
- class neural_compressor.adaptor.onnxrt.ONNXRTQuery(local_config_file=None)
Bases: neural_compressor.adaptor.query.QueryBackendCapability
Base class that defines the Query Interface. Each adaptation layer should implement the inherited class for its specific backend.
- get_version()
Get the current backend version information.
- Returns:
version string.
- Return type:
[string]
- get_precisions()
Get supported precisions for the current backend.
- Returns:
the names of the supported precisions.
- Return type:
[string list]
- get_op_types()
Get the op types supported by all precisions.
- Returns:
A list of dictionaries whose keys are precisions and whose values are the op types.
- Return type:
[dictionary list]
- get_quantization_capability()
Get the quantization capability of the supported op types.
- Returns:
A list of dictionaries whose keys are precisions and whose values are dicts describing the quantization capability of all op types.
- Return type:
[dictionary list]
- get_op_types_by_precision(precision)
Get op types per precision.
- Parameters:
precision (string) – precision name.
- Returns:
A list of op types.
- Return type:
[string list]
- get_graph_optimization()
Get the ONNX Runtime graph optimization level.
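A sketch of the query interface. The YAML path is hypothetical, and the exact precision names returned are version dependent:

```python
from neural_compressor.adaptor.onnxrt import ONNXRTQuery

# local_config_file points at the backend capability YAML; the path
# below is hypothetical.
query = ONNXRTQuery(local_config_file="onnxrt_qlinear.yaml")

print(query.get_version())                       # backend version string
print(query.get_precisions())                    # supported precision names
print(query.get_op_types_by_precision("int8"))   # op types quantizable to int8
```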