neural_compressor.torch.amp.autocast

Module Contents

Classes

autocast

Instances of autocast serve as context managers or decorators that allow regions of your script to run in mixed precision.

class neural_compressor.torch.amp.autocast.autocast(device_type: str, dtype: torch.types._dtype | None = None, enabled: bool = True, cache_enabled: bool | None = None)[source]

Instances of autocast serve as context managers or decorators that allow regions of your script to run in mixed precision.

In these regions, ops run in an op-specific dtype chosen by autocast to improve performance while maintaining accuracy.

When entering an autocast-enabled region, Tensors may be any type. You should not call half() or bfloat16() on your model(s) or inputs when using autocasting.

autocast should wrap only the forward pass(es) of your network, including the loss computation(s). Backward passes under autocast are not recommended. Backward ops run in the same type that autocast used for corresponding forward ops.

# Enables autocasting for the inference pass
with torch.autocast(device_type="hpu", dtype=torch.float8_e4m3fn):
    output = model(input)
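The forward/backward split described above can be sketched as follows. This is a minimal, hypothetical training step (the model, optimizer, and data are placeholders, and CPU with torch.bfloat16 is used so the sketch runs anywhere): only the forward pass and loss computation sit inside the autocast region, while backward() and the optimizer step run outside it.

```python
import torch
import torch.nn as nn

# Hypothetical toy model and data for illustration.
model = nn.Linear(8, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
inp = torch.randn(4, 8)
target = torch.randn(4, 2)

# Wrap only the forward pass and the loss computation in autocast.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(inp)  # linear runs in bfloat16 under CPU autocast
    loss = nn.functional.mse_loss(out, target.to(out.dtype))

# Backward pass outside the autocast region; backward ops run in the
# same dtype that autocast chose for the corresponding forward ops.
loss.backward()
opt.step()
```

Note that neither the model nor the inputs were converted with half() or bfloat16(); autocast performs the casts itself inside the region.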

autocast can also be used as a decorator, e.g., on the forward method of your model:

class AutocastModel(nn.Module):
    ...
    @torch.autocast(device_type="cuda")
    def forward(self, input):
        ...

The autocast state is thread-local. If you want it enabled in a new thread, the context manager or decorator must be invoked in that thread. This affects torch.nn.DataParallel and torch.nn.parallel.DistributedDataParallel when used with more than one GPU per process (see Working with Multiple GPUs).
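Because the autocast state is thread-local, entering the context manager in the main thread has no effect on work done in a worker thread. A minimal sketch of the correct pattern (hypothetical model and data, CPU device so it is runnable): the context manager is entered inside the thread that executes the forward pass.

```python
import threading

import torch
import torch.nn as nn

model = nn.Linear(8, 2)   # hypothetical model for illustration
inp = torch.randn(4, 8)
results = {}

def worker():
    # Autocast state is thread-local, so the context manager must be
    # entered inside the thread that runs the forward pass.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        results["out"] = model(inp)

t = threading.Thread(target=worker)
t.start()
t.join()
```

An autocast region entered only in the main thread would leave the worker's forward pass running in full precision.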

Parameters:
  • device_type (str, required) – Device type to use. Possible values are: 'cuda', 'cpu', 'xpu' and 'hpu'. The type is the same as the type attribute of a torch.device. Thus, you may obtain the device type of a tensor using Tensor.device.type.

  • enabled (bool, optional) – Whether autocasting should be enabled in the region. Default: True

  • dtype (torch.dtype, optional) – The data type that ops run in within the region, e.g. torch.float16 or torch.bfloat16.

  • cache_enabled (bool, optional) – Whether the weight cache inside autocast should be enabled. Default: True
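The enabled flag makes it easy to toggle mixed precision from a single configuration value without restructuring the code. A minimal sketch (hypothetical model and data; torch.autocast is used here as a runnable stand-in, since it takes the same parameters as the class documented above): with enabled=False the region becomes a no-op and ops run in the input dtype.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)   # hypothetical model for illustration
inp = torch.randn(4, 8)   # float32 input

use_amp = False  # e.g. read from a config flag

# With enabled=False the context manager is a no-op, so the
# forward pass runs in the input dtype (float32 here).
with torch.autocast(device_type="cpu", dtype=torch.bfloat16, enabled=use_amp):
    out = model(inp)
```

Flipping use_amp to True would run the same forward pass in bfloat16, with no other code changes.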