neural_compressor.torch.algorithms.weight_only.modules

torch.nn.Module class definitions for the weight-only quantization algorithms.

Module Contents

Classes

FakeAffineTensorQuantFunction

Fake version of affine quantization.

TEQLinearFakeQuant

Linear-layer wrapper that applies fake quantization.

MulLinear

Linear wrapper to apply scale to input.

class neural_compressor.torch.algorithms.weight_only.modules.FakeAffineTensorQuantFunction

Fake version of affine quantization.
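The idea of fake affine quantization can be sketched as follows: values are rounded onto an integer grid and immediately mapped back to floating point, so the tensor keeps its dtype but carries the quantization error. The class name FakeAffineQuantSketch, the per-tensor asymmetric scheme, and the straight-through backward pass are assumptions made for illustration; this is not the library's implementation.

```python
import torch


class FakeAffineQuantSketch(torch.autograd.Function):
    """Illustrative quantize-then-dequantize op (hypothetical, not the library's code)."""

    @staticmethod
    def forward(ctx, x, num_bits=4):
        qmax = 2 ** num_bits - 1
        x_min, x_max = x.min(), x.max()
        # Per-tensor asymmetric parameters; the clamp avoids division by zero.
        scale = torch.clamp((x_max - x_min) / qmax, min=1e-5)
        zero_point = torch.round(-x_min / scale)
        # Quantize to the integer grid [0, qmax], then dequantize.
        q = torch.clamp(torch.round(x / scale) + zero_point, 0, qmax)
        return (q - zero_point) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator (assumed): pass the gradient unchanged.
        # One return value per forward input; num_bits gets None.
        return grad_output, None


x = torch.linspace(-1.0, 1.0, steps=9, requires_grad=True)
y = FakeAffineQuantSketch.apply(x, 4)
y.sum().backward()
```

With num_bits=4 the output takes at most 16 distinct values, while x.grad is populated as if the op were the identity.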

class neural_compressor.torch.algorithms.weight_only.modules.TEQLinearFakeQuant(orig_layer, alpha=None, num_bits=4, group_size=-1, scheme='asym')

Linear-layer wrapper that applies fake quantization.
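A wrapper of this kind might look like the sketch below. The names FakeQuantLinearSketch and fake_quant_asym are hypothetical. The sketch assumes that alpha is a per-input-channel scale divided out of the activation and folded into the weight, and that group_size=-1 means one quantization group per weight row; neither assumption is stated on this page. Only the asymmetric scheme is shown, and the forward pass alone is covered (training alpha would additionally need a straight-through estimator around the rounding).

```python
import torch
import torch.nn.functional as F


def fake_quant_asym(w, num_bits=4, group_size=-1):
    """Quantize-dequantize each row (or each group within a row) of a 2-D tensor."""
    orig_shape = w.shape
    if group_size != -1:
        # Assumes in_features is divisible by group_size.
        w = w.reshape(-1, group_size)
    qmax = 2 ** num_bits - 1
    w_min = w.min(dim=1, keepdim=True).values
    w_max = w.max(dim=1, keepdim=True).values
    scale = torch.clamp((w_max - w_min) / qmax, min=1e-5)
    zero_point = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero_point, 0, qmax)
    return ((q - zero_point) * scale).reshape(orig_shape)


class FakeQuantLinearSketch(torch.nn.Module):
    """Hypothetical wrapper: fake-quantizes the wrapped layer's weight on each call."""

    def __init__(self, orig_layer, alpha=None, num_bits=4, group_size=-1):
        super().__init__()
        self.orig_layer = orig_layer
        if alpha is None:
            alpha = torch.ones(orig_layer.in_features)
        self.alpha = torch.nn.Parameter(alpha)
        self.num_bits = num_bits
        self.group_size = group_size

    def forward(self, x):
        # Assumed semantics: fold alpha into the weight, divide it out of the input.
        weight = self.orig_layer.weight * self.alpha
        weight_q = fake_quant_asym(weight, self.num_bits, self.group_size)
        return F.linear(x / self.alpha, weight_q, self.orig_layer.bias)


torch.manual_seed(0)
layer = torch.nn.Linear(8, 4)
wrapped = FakeQuantLinearSketch(layer, num_bits=8, group_size=4)
x = torch.randn(2, 8)
with torch.no_grad():
    ref, out = layer(x), wrapped(x)
```

Comparing out with ref gives a quick view of the error introduced at a given num_bits and group_size, without changing the original layer's parameters.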

class neural_compressor.torch.algorithms.weight_only.modules.MulLinear(module, input_scale=None)

Linear wrapper to apply scale to input.
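The behaviour described here can be illustrated with a minimal sketch: the input is multiplied elementwise by a per-feature scale before being passed to the wrapped linear layer. The class name ScaledInputLinearSketch, the buffer registration, and the default of an all-ones scale are assumptions for illustration, not the library's code.

```python
import torch
import torch.nn.functional as F


class ScaledInputLinearSketch(torch.nn.Module):
    """Hypothetical wrapper: scales the input, then calls the wrapped linear layer."""

    def __init__(self, module, input_scale=None):
        super().__init__()
        if input_scale is None:
            # Assumed default: identity scaling.
            input_scale = torch.ones(module.in_features)
        # A buffer follows the module across devices but is not trained.
        self.register_buffer("input_scale", input_scale)
        self.linear = module

    def forward(self, x):
        return self.linear(x * self.input_scale)


torch.manual_seed(0)
linear = torch.nn.Linear(3, 2)
scale = torch.tensor([0.5, 2.0, 1.0])
wrapped = ScaledInputLinearSketch(linear, input_scale=scale)
x = torch.randn(4, 3)
out = wrapped(x)

# The same result is obtained by folding the scale into the weight columns.
folded = F.linear(x, linear.weight * scale, linear.bias)
```

Because scaling the input is equivalent to scaling the matching weight columns, a wrapper like this can be removed later by folding the scale into the weight.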