`neural_compressor.onnxrt.algorithms.weight_only.rtn`

Module Contents

Functions

`rtn_quantize`(model[, weight_config, num_bits, ...])	Quantize the model with the round-to-nearest (RTN) method.
`apply_rtn_on_model`(→ onnx.ModelProto)	Apply RTN to an ONNX model.

neural_compressor.onnxrt.algorithms.weight_only.rtn.rtn_quantize(model: onnx.ModelProto | neural_compressor.onnxrt.utils.onnx_model.ONNXModel | pathlib.Path | str, weight_config: dict = {}, num_bits: int = 4, group_size: int = 32, scheme: str = 'asym', ratios: dict = {}, accuracy_level: int = 0, providers: List[str] = ['CPUExecutionProvider'], return_modelproto: bool = True)

Quantize the weights of an ONNX model with the round-to-nearest (RTN) method.
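As a rough illustration of what group-wise round-to-nearest weight quantization does, the NumPy sketch below quantizes a 2-D weight matrix group by group and then dequantizes it. It is a simplified sketch, not this module's implementation: the helper names `rtn_quantize_groups` and `rtn_dequantize_groups` are hypothetical, and grouping consecutive rows of each column, forcing the asymmetric range to include zero, and requiring the row count to be a multiple of `group_size` are assumptions made for brevity. Note that `np.round` rounds ties to even.

```python
import numpy as np


def rtn_quantize_groups(weight, num_bits=4, group_size=32, scheme="asym"):
    """Hypothetical helper: group-wise round-to-nearest quantization.

    weight has shape (K, N). Each column is split into K // group_size
    groups of consecutive rows; every group gets its own scale (and, for
    the asymmetric scheme, its own zero point).
    """
    k, n = weight.shape
    assert k % group_size == 0, "sketch assumes K is a multiple of group_size"
    # (K, N) -> (N, K // group_size, group_size): last axis is one group.
    groups = weight.T.reshape(n, k // group_size, group_size)

    if scheme == "sym":
        maxq = 2 ** (num_bits - 1) - 1   # 7 for 4 bits
        minq = -(2 ** (num_bits - 1))    # -8 for 4 bits
        absmax = np.abs(groups).max(axis=-1, keepdims=True)
        scale = np.where(absmax == 0, 1.0, absmax / maxq)
        zero_point = np.zeros_like(scale)
    else:
        maxq, minq = 2 ** num_bits - 1, 0  # [0, 15] for 4 bits
        # Assumption: the range is widened to include 0 so it is exact.
        rmin = np.minimum(groups.min(axis=-1, keepdims=True), 0.0)
        rmax = np.maximum(groups.max(axis=-1, keepdims=True), 0.0)
        scale = np.where(rmax == rmin, 1.0, (rmax - rmin) / (maxq - minq))
        zero_point = np.round(minq - rmin / scale)

    # Round to the nearest integer level, then clamp to the valid range.
    q = np.clip(np.round(groups / scale) + zero_point, minq, maxq)
    return q.astype(np.int32), scale, zero_point


def rtn_dequantize_groups(q, scale, zero_point, shape):
    """Map integer levels back to floats and restore the (K, N) layout."""
    k, n = shape
    return ((q - zero_point) * scale).reshape(n, k).T


# Example: 4-bit asymmetric RTN on a random 64x8 weight matrix.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 8)).astype(np.float32)
q, scale, zp = rtn_quantize_groups(w, num_bits=4, group_size=32, scheme="asym")
w_hat = rtn_dequantize_groups(q, scale, zp, w.shape)
print("max abs error:", np.abs(w - w_hat).max())
```

Because every element is rounded to the nearest level of its own group, the reconstruction error of an element stays within half of that group's step size; smaller groups give tighter ranges at the cost of storing more scales.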

Parameters:

model (Union[onnx.ModelProto, ONNXModel, Path, str]) – the ONNX model to quantize, given as an onnx.ModelProto, an ONNXModel wrapper, or a path to a model file.
weight_config (dict, optional) – per-node quantization config, keyed by node name and op type. For example:

weight_config = {
    '(fc2, "MatMul")': {
        'weight_dtype': 'int',
        'weight_bits': 4,
        'weight_group_size': 32,
        'weight_sym': True,
        'accuracy_level': 0
    }
}

Defaults to {}.
num_bits (int, optional) – number of bits used to represent weights. Defaults to 4.
group_size (int, optional) – number of consecutive weights that share one set of quantization parameters. Defaults to 32.
scheme (str, optional) – quantization scheme: "sym" for symmetric or "asym" for asymmetric quantization. Defaults to "asym".
ratios (dict, optional) – clip percentile ratios applied to the weight range. Defaults to {}.
accuracy_level (int, optional) – accuracy level. Supported values: 0 (unset), 1 (fp32 compute type of jblas kernel), 2 (fp16 compute type of jblas kernel), 3 (bf16 compute type of jblas kernel), 4 (int8 compute type of jblas kernel). Defaults to 0.
providers (list, optional) – ONNX Runtime execution providers to use. Defaults to ["CPUExecutionProvider"].
return_modelproto (bool, optional) – whether to return an onnx.ModelProto. Set to False for layer-wise quantization. Defaults to True.
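The `ratios` parameter is documented only as a clip percentile. One plausible reading, assumed here rather than taken from the library, is that a ratio below 1.0 shrinks a group's min/max range before the scale is computed, trading clipping error on outliers for a finer step on the remaining weights. The hypothetical helpers `asym_scale_zero_point` and `fake_quant` below show that trade-off for a single asymmetric group.

```python
import numpy as np


def asym_scale_zero_point(group, num_bits=4, ratio=1.0):
    """Hypothetical helper: asymmetric scale and zero point with clipping.

    ratio < 1.0 shrinks the [min, max] range (an assumed interpretation of
    the clip ratio), so values outside the shrunken range get clamped.
    """
    maxq = 2 ** num_bits - 1
    rmin = min(float(group.min()), 0.0) * ratio
    rmax = max(float(group.max()), 0.0) * ratio
    scale = (rmax - rmin) / maxq if rmax > rmin else 1.0
    zero_point = round(-rmin / scale)
    return scale, zero_point


def fake_quant(group, scale, zero_point, num_bits=4):
    """Round to nearest, clamp to [0, 2**num_bits - 1], then dequantize."""
    maxq = 2 ** num_bits - 1
    q = np.clip(np.round(group / scale) + zero_point, 0, maxq)
    return (q - zero_point) * scale


# A group with one large outlier, quantized with and without clipping.
group = np.array([-0.2, 0.1, 0.05, 4.0])
for ratio in (1.0, 0.5):
    s, zp = asym_scale_zero_point(group, ratio=ratio)
    print(ratio, s, fake_quant(group, s, zp))
```

With ratio=0.5 the step size halves, so small weights are represented more finely, while the outlier is clamped to the top of the shrunken range. Whether that trade-off lowers overall error depends on the weight distribution.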

Returns:

The quantized ONNX model (returned as an onnx.ModelProto when return_modelproto is True).

Return type:

onnx.ModelProto

neural_compressor.onnxrt.algorithms.weight_only.rtn.apply_rtn_on_model(model: onnx.ModelProto | neural_compressor.onnxrt.utils.onnx_model.ONNXModel | pathlib.Path | str, quant_config: dict) → onnx.ModelProto

Apply round-to-nearest (RTN) weight-only quantization to an ONNX model according to quant_config.