neural_compressor.utils.load_huggingface

Huggingface Loader: provides access to Huggingface pretrained models.

Module Contents

Classes

OptimizedModel

Provides a from_pretrained method to load Huggingface pretrained models.

Functions

save_for_huggingface_upstream(model, tokenizer, output_dir)

Save the model and tokenizer in the output directory.

export_compressed_model(model[, saved_dir, ...])

Build a compressed model, loading compression information from saved_dir if provided.

class neural_compressor.utils.load_huggingface.OptimizedModel(*args, **kwargs)[source]

Provides a from_pretrained method to load Huggingface pretrained models.

neural_compressor.utils.load_huggingface.save_for_huggingface_upstream(model, tokenizer, output_dir)[source]

Save the model and tokenizer in the output directory.

neural_compressor.utils.load_huggingface.export_compressed_model(model, saved_dir=None, use_optimum_format=True, enable_full_range=False, compression_dtype=torch.int32, compression_dim=1, scale_dtype=torch.float32, device='cpu')[source]

Build a compressed model, loading compression information from saved_dir if provided.

Parameters:
  • model (torch.nn.Module) – the original FP32 model.

  • saved_dir (str, optional) – path to the directory holding the compression info. Defaults to None.

  • use_optimum_format (bool, optional) – whether to use the HuggingFace format. Defaults to True.

  • enable_full_range (bool, optional) – whether to leverage the full compression range under symmetric quantization. Defaults to False.

  • compression_dtype (torch.dtype, optional) – the target dtype after compression. Defaults to torch.int32.

  • compression_dim (int, optional) – select from [0, 1]; 0 packs along the output channel, 1 along the input channel. Defaults to 1.

  • scale_dtype (torch.dtype, optional) – dtype of the scales, torch.float32 or torch.float16. Defaults to torch.float32.

  • device (str, optional) – device used for compression. Defaults to 'cpu'.