:orphan:

:py:mod:`neural_compressor.adaptor.torch_utils.gptq`
====================================================

.. py:module:: neural_compressor.adaptor.torch_utils.gptq


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   neural_compressor.adaptor.torch_utils.gptq.GPTQuantizer
   neural_compressor.adaptor.torch_utils.gptq.GPTQ


Functions
~~~~~~~~~

.. autoapisummary::

   neural_compressor.adaptor.torch_utils.gptq.is_leaf
   neural_compressor.adaptor.torch_utils.gptq.trace_gptq_target_blocks
   neural_compressor.adaptor.torch_utils.gptq.find_layers
   neural_compressor.adaptor.torch_utils.gptq.find_layers_name
   neural_compressor.adaptor.torch_utils.gptq.log_quantizable_layers_per_transformer
   neural_compressor.adaptor.torch_utils.gptq.quantize


.. py:function:: is_leaf(module)

   Judge whether a module has no child modules.

   :param module: torch.nn.Module
   :returns: whether the module has no child modules.
   :rtype: bool


.. py:function:: trace_gptq_target_blocks(module, module_types=[torch.nn.ModuleList, torch.nn.Sequential])

   Search for the stacked transformer structure, which is critical for LLMs and GPTQ execution.

   :param module: torch.nn.Module
   :param module_types: list of torch.nn.Module types to treat as transformer stacks.
   :returns: gptq_related_blocks = {
                 "embeddings": {},        # dict, embedding layers before the transformer stack,
                 "transformers_pre": {},  # TODO
                 "transformers_name": "", # str, name of the LLM's transformer stack module,
                 "transformers": [],      # torch.nn.ModuleList, the LLM's transformer stack,
             }


.. py:function:: find_layers(module, layers=[nn.Conv2d, nn.Conv1d, nn.Linear, transformers.Conv1D], name='')

   Get all layers with the target types.


.. py:function:: find_layers_name(module, layers=[nn.Conv2d, nn.Conv1d, nn.Linear, transformers.Conv1D], name='')

   Get the names of all layers with the target types.


.. py:function:: log_quantizable_layers_per_transformer(transformer_blocks, layers=[nn.Conv2d, nn.Conv1d, nn.Linear, transformers.Conv1D])

   Print all layers that will be quantized by the GPTQ algorithm.


.. py:function:: quantize(x, scale, zero, maxq)

   Do quantization.


.. py:class:: GPTQuantizer(model, weight_config={}, dataloader=None, nsamples=128, use_max_length=True, pad_max_length=2048, device=None, layer_wise=False)

   Main API for the GPTQ algorithm.

   Please refer to:
   GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
   (https://arxiv.org/abs/2210.17323)


.. py:class:: GPTQ(layer, W, device='cpu')

   Please refer to:
   GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
   (https://arxiv.org/abs/2210.17323)
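The ``quantize(x, scale, zero, maxq)`` helper performs the standard affine fake-quantization used by GPTQ-style weight quantization: round onto an integer grid, clamp to ``[0, maxq]``, then dequantize. A minimal pure-Python sketch of that arithmetic (the real function operates element-wise on torch tensors; values below are illustrative):

```python
# Illustrative sketch of affine fake-quantization, as in GPTQ-style
# weight quantization. The module's `quantize` applies the same
# arithmetic to torch tensors.

def quantize(x, scale, zero, maxq):
    """Round x to the integer grid, clamp to [0, maxq], dequantize."""
    q = round(x / scale) + zero   # map to the integer grid (zero-point shift)
    q = max(0, min(q, maxq))      # clamp to the representable range
    return scale * (q - zero)     # dequantize back to float

# Example: 4-bit quantization, so maxq = 2**4 - 1 = 15
scale, zero, maxq = 0.1, 8, 15
print(quantize(0.33, scale, zero, maxq))   # ≈ 0.3 (nearest grid point)
print(quantize(10.0, scale, zero, maxq))   # ≈ 0.7 (saturates at the clamp)
```

The clamp is what makes the representation lossy for outliers: any value above ``scale * (maxq - zero)`` saturates to the largest representable grid point.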
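The traversal helpers ``is_leaf`` and ``find_layers`` follow a common pattern: a module is a leaf if it has no children, and target layers are collected recursively by type under dotted qualified names. A sketch of that pattern on a minimal stand-in for ``torch.nn.Module`` (the ``Module``/``Linear`` classes here are hypothetical mocks, not the real torch API):

```python
# Sketch of the leaf test and type-based layer search. A tiny mock
# module tree stands in for torch.nn.Module so the example is
# self-contained; the real helpers take nn.Module instances.

class Module:
    def __init__(self, **children):
        self._children = children
    def named_children(self):
        return self._children.items()

class Linear(Module):
    pass

def is_leaf(module):
    # A module is a leaf if it has no child modules.
    return len(list(module.named_children())) == 0

def find_layers(module, layers=(Linear,), name=''):
    # Recursively collect {qualified_name: module} for target types.
    if type(module) in layers:
        return {name: module}
    found = {}
    for child_name, child in module.named_children():
        prefix = name + '.' + child_name if name else child_name
        found.update(find_layers(child, layers, prefix))
    return found

block = Module(attn=Module(q_proj=Linear(), k_proj=Linear()), mlp=Linear())
print(sorted(find_layers(block)))  # ['attn.k_proj', 'attn.q_proj', 'mlp']
```

``trace_gptq_target_blocks`` builds on the same idea at a coarser granularity: instead of collecting individual linear layers, it looks for a child of a stack type (e.g. ``torch.nn.ModuleList``) that holds the repeated transformer blocks, so GPTQ can quantize one block at a time.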