neural_compressor.adaptor.torch_utils.gptq

Module Contents

Classes

GPTQuantizer

Main API for the GPTQ algorithm.

GPTQ

Please refer to: GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (https://arxiv.org/abs/2210.17323).

Functions

is_leaf(module)

Check whether a module has no child modules.

trace_gptq_target_blocks(module[, module_types])

Search for stacked transformer structures, which are critical for GPTQ execution on LLMs.

find_layers(module[, layers, name])

Get all layers with target types.

find_layers_name(module[, layers, name])

Get the names of all layers with target types.

log_quantizable_layers_per_transformer(transformer_blocks)

Print all layers that will be quantized by the GPTQ algorithm.

quantize(x, scale, zero, maxq)

Quantize a tensor with the given scale, zero point, and maximum integer level.

neural_compressor.adaptor.torch_utils.gptq.is_leaf(module)[source]

Check whether a module has no child modules.

Parameters:

module – torch.nn.Module

Returns:

True if the module has no child modules, otherwise False.

Return type:

bool
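
For reference, a minimal sketch of what the check amounts to (an illustration, not necessarily the exact implementation):

    import torch

    def is_leaf_sketch(module: torch.nn.Module) -> bool:
        # A module is a leaf when its children() iterator is empty.
        return next(module.children(), None) is None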

neural_compressor.adaptor.torch_utils.gptq.trace_gptq_target_blocks(module, module_types=[torch.nn.ModuleList, torch.nn.Sequential])[source]

Search for stacked transformer structures, which are critical for GPTQ execution on LLMs.

Parameters:
  • module – torch.nn.Module

  • module_types – list of torch.nn.Module container types (e.g. ModuleList, Sequential) that may hold the transformer stack.

Returns:

gptq_related_blocks = {
    "embeddings": {},         # Dict. Embedding layers before the transformer stack.
    "transformers_pre": {},   # TODO
    "transformers_name": "",  # String. Name of the LLM's transformer stack module.
    "transformers": [],       # torch.nn.ModuleList. The LLM's transformer stack itself.
    "transformers_post": {},  # Dict. TODO
}
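
As a usage sketch, tracing a Hugging Face causal LM (the model choice and the printed attribute names are illustrative assumptions):

    from transformers import AutoModelForCausalLM
    from neural_compressor.adaptor.torch_utils.gptq import trace_gptq_target_blocks

    # Any decoder-style LLM whose blocks live in a ModuleList works here;
    # GPT-2 is used purely as an example.
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    blocks = trace_gptq_target_blocks(model)
    print(blocks["transformers_name"])   # e.g. "transformer.h"
    print(len(blocks["transformers"]))   # number of stacked decoder blocks
    print(blocks["embeddings"].keys())   # embedding layers before the stack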

neural_compressor.adaptor.torch_utils.gptq.find_layers(module, layers=[nn.Conv2d, nn.Conv1d, nn.Linear, transformers.Conv1D], name='')[source]

Get all layers with target types.
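
The recursion behind this helper is straightforward; a minimal sketch matching the documented defaults (transformers.Conv1D is omitted to keep the sketch dependency-free):

    import torch.nn as nn

    def find_layers_sketch(module, layers=(nn.Conv2d, nn.Conv1d, nn.Linear), name=""):
        # Return {qualified_name: module} for every submodule of a target type.
        if isinstance(module, tuple(layers)):
            return {name: module}
        found = {}
        for child_name, child in module.named_children():
            prefix = name + "." + child_name if name else child_name
            found.update(find_layers_sketch(child, layers, prefix))
        return found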

neural_compressor.adaptor.torch_utils.gptq.find_layers_name(module, layers=[nn.Conv2d, nn.Conv1d, nn.Linear, transformers.Conv1D], name='')[source]

Get the names of all layers with target types.
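
A usage sketch, assuming the function returns the qualified layer names rather than the modules themselves (the printed names are illustrative):

    from neural_compressor.adaptor.torch_utils.gptq import find_layers_name

    names = find_layers_name(model)  # model as in the tracing example above
    # e.g. ["transformer.h.0.attn.c_attn", "transformer.h.0.mlp.c_fc", ...]
    print(names)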

neural_compressor.adaptor.torch_utils.gptq.log_quantizable_layers_per_transformer(transformer_blocks, layers=[nn.Conv2d, nn.Conv1d, nn.Linear, transformers.Conv1D])[source]

Print all layers that will be quantized by the GPTQ algorithm.
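
A usage sketch, assuming the function takes the block dict produced by trace_gptq_target_blocks (an assumption based on the parameter name):

    from neural_compressor.adaptor.torch_utils.gptq import (
        log_quantizable_layers_per_transformer,
        trace_gptq_target_blocks,
    )

    blocks = trace_gptq_target_blocks(model)  # model as in the example above
    log_quantizable_layers_per_transformer(blocks)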

neural_compressor.adaptor.torch_utils.gptq.quantize(x, scale, zero, maxq)[source]

Quantize a tensor with the given scale, zero point, and maximum integer level.
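
This is standard uniform fake quantization: snap x to the integer grid defined by scale and zero, clamp to [0, maxq], and map back to the original range. A sketch of the equivalent computation (the actual function may handle additional corner cases):

    import torch

    def quantize_sketch(x, scale, zero, maxq):
        # Quantize: round to the integer grid and clamp to the representable range.
        q = torch.clamp(torch.round(x / scale) + zero, 0, maxq)
        # Dequantize: map the integers back to real values.
        return scale * (q - zero)

    # For 4-bit quantization, maxq = 2**4 - 1 = 15.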

class neural_compressor.adaptor.torch_utils.gptq.GPTQuantizer(model, weight_config={}, dataloader=None, nsamples=128, use_max_length=True, pad_max_length=2048, device=None, layer_wise=False)[source]

Main API for the GPTQ algorithm.

Please refer to: GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (https://arxiv.org/abs/2210.17323).
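
A hedged construction sketch; the weight_config schema shown (per-layer bit width, group size, symmetry) is an illustrative assumption, not a verified API contract:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from neural_compressor.adaptor.torch_utils.gptq import GPTQuantizer

    # Toy calibration data: random token ids, purely illustrative.
    calib_data = TensorDataset(torch.randint(0, 50257, (128, 32)))
    calib_dataloader = DataLoader(calib_data, batch_size=1)

    # Hypothetical per-layer settings; key and option names are illustrative.
    weight_config = {
        "transformer.h.0.attn.c_attn": {"wbits": 4, "group_size": 128, "sym": False},
    }

    quantizer = GPTQuantizer(
        model,  # the torch.nn.Module to compress, e.g. the GPT-2 model above
        weight_config=weight_config,
        dataloader=calib_dataloader,
        nsamples=128,
        use_max_length=True,
        pad_max_length=2048,
        device="cpu",
    )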

class neural_compressor.adaptor.torch_utils.gptq.GPTQ(layer, W, device='cpu')[source]

Please refer to: GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (https://arxiv.org/abs/2210.17323).
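
GPTQ operates one layer at a time: it accumulates a Hessian estimate from calibration activations, then quantizes the weight columns with error compensation. A heavily hedged construction sketch; only the signature above is documented, so no further method calls are shown:

    import torch.nn as nn
    from neural_compressor.adaptor.torch_utils.gptq import GPTQ

    layer = nn.Linear(768, 768)  # a single target layer, purely illustrative
    gptq = GPTQ(layer, layer.weight.data, device="cpu")
    # The reference algorithm then feeds calibration activations through the
    # layer to build the Hessian and quantizes the weights column by column
    # with error compensation; the exact method names on this class are not
    # documented here, so they are not assumed.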