models.detr

DETR model and criterion classes.

Classes

DETR

This is the DETR module that performs object detection.

SetCriterion

This class computes the loss for DETR.

PostProcess

This module converts the model's output into the format expected by the coco api.

MLP

Very simple multi-layer perceptron (also called FFN)

Module Contents

class models.detr.DETR(backbone, transformer, num_classes, num_queries, aux_loss=False)[source]

This is the DETR module that performs object detection.

forward(samples: util.misc.NestedTensor)[source]

The forward expects a NestedTensor, which consists of:

  • samples.tensor: batched images, of shape [batch_size x 3 x H x W]

  • samples.mask: a binary mask of shape [batch_size x H x W], containing 1 on padded pixels

It returns a dict with the following elements:
  • “pred_logits”: the classification logits (including no-object) for all queries.

    Shape= [batch_size x num_queries x (num_classes + 1)]

  • “pred_boxes”: the normalized box coordinates for all queries, represented as

    (center_x, center_y, height, width). These values are normalized in [0, 1], relative to the size of each individual image (disregarding possible padding). See PostProcess for information on how to retrieve the unnormalized bounding box.

  • “aux_outputs”: Optional, only returned when auxiliary losses are activated. It is a list of

    dictionaries containing the two above keys for each decoder layer.
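To make the output contract concrete, here is a minimal pure-Python sketch (no PyTorch, made-up values) of how one query’s slice of “pred_logits” can be interpreted. It assumes, as in DETR, that the last class index is the no-object slot:

```python
import math

# Hypothetical logits for one query over (num_classes + 1) classes;
# the last index is assumed to be the "no-object" class.
logits = [0.2, 2.5, -1.0, 0.1]  # num_classes = 3, plus no-object

# Softmax turns the raw logits into class probabilities.
exps = [math.exp(v) for v in logits]
total = sum(exps)
probs = [e / total for e in exps]

# The query predicts an actual object only if its best class
# is not the no-object slot.
best = max(range(len(probs)), key=probs.__getitem__)
is_object = best != len(probs) - 1

print(best, is_object)
```

The corresponding entry of “pred_boxes” would then hold this query’s (center_x, center_y, height, width) box, still normalized to [0, 1].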

class models.detr.SetCriterion(num_classes, matcher, weight_dict, eos_coef, losses, emphasized_weights={})[source]

This class computes the loss for DETR.

The process happens in two steps:
  1. we compute hungarian assignment between ground truth boxes and the outputs of the model

  2. we supervise each pair of matched ground-truth / prediction (supervise class and box)
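The two steps above can be sketched on a toy example. DETR’s matcher uses the Hungarian algorithm (via `scipy.optimize.linear_sum_assignment`); the brute-force permutation search below is only an illustration for tiny inputs, and the cost values are made up:

```python
from itertools import permutations

# Made-up matching cost between 3 predictions (rows) and 2 ground-truth
# boxes (columns); lower is better. In DETR this cost combines the
# classification probability with the box losses.
cost = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.7, 0.6],
]

num_preds, num_gt = len(cost), len(cost[0])

# Step 1: find the assignment of ground-truth boxes to predictions that
# minimizes the total cost (brute force here; DETR uses Hungarian matching).
best_perm, best_cost = None, float("inf")
for perm in permutations(range(num_preds), num_gt):
    total = sum(cost[p][g] for g, p in enumerate(perm))
    if total < best_cost:
        best_perm, best_cost = perm, total

# best_perm[g] is the prediction matched to ground-truth box g.
# Step 2: only these matched pairs are supervised with class and box
# losses; unmatched predictions are pushed towards the no-object class.
print(best_perm, best_cost)
```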

loss_labels(outputs, targets, indices, num_boxes, log=True)[source]

Classification loss (Negative Log Likelihood). The target dicts must contain the key “labels”, containing a tensor of dim [nb_target_boxes].

loss_cardinality(outputs, targets, indices, num_boxes)[source]

Compute the cardinality error, i.e. the absolute error in the number of predicted non-empty boxes. This is not really a loss; it is intended for logging purposes only and does not propagate gradients.
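The cardinality error reduces to simple counting. A pure-Python sketch with hypothetical predictions (class index 3 standing in for the no-object slot):

```python
# Hypothetical predicted classes for 5 queries (argmax of pred_logits);
# class index 3 is assumed to be the no-object slot.
no_object = 3
pred_classes = [1, 3, 0, 3, 2]

# Number of predictions that are *not* no-object.
num_non_empty = sum(1 for c in pred_classes if c != no_object)

# The image has 2 ground-truth boxes; the cardinality error is the
# absolute difference between the two counts.
num_target_boxes = 2
cardinality_error = abs(num_non_empty - num_target_boxes)

print(cardinality_error)
```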

loss_boxes(outputs, targets, indices, num_boxes)[source]

Compute the losses related to the bounding boxes: the L1 regression loss and the GIoU loss. The target dicts must contain the key “boxes”, containing a tensor of dim [nb_target_boxes, 4]. The target boxes are expected in format (center_x, center_y, w, h), normalized by the image size.
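Both box losses can be sketched in pure Python on a single matched pair. This is an illustration only (DETR computes them batched in PyTorch, with the generalized IoU from its `box_ops` module); identical pred and target boxes serve as a sanity check that both losses vanish:

```python
def cxcywh_to_xyxy(box):
    """Convert (center_x, center_y, w, h) to (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def giou(a, b):
    """Generalized IoU between two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both a and b.
    ex0, ey0 = min(a[0], b[0]), min(a[1], b[1])
    ex1, ey1 = max(a[2], b[2]), max(a[3], b[3])
    enclose = (ex1 - ex0) * (ey1 - ey0)
    return inter / union - (enclose - union) / enclose

# Hypothetical matched pair of normalized (cx, cy, w, h) boxes.
pred = (0.5, 0.5, 0.2, 0.2)
target = (0.5, 0.5, 0.2, 0.2)

l1_loss = sum(abs(p - t) for p, t in zip(pred, target))
giou_loss = 1.0 - giou(cxcywh_to_xyxy(pred), cxcywh_to_xyxy(target))
print(l1_loss, giou_loss)
```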

loss_masks(outputs, targets, indices, num_boxes)[source]

Compute the losses related to the masks: the focal loss and the dice loss. The target dicts must contain the key “masks”, containing a tensor of dim [nb_target_boxes, h, w].
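The dice loss part can be sketched in pure Python on tiny made-up masks. The +1 smoothing terms mirror the kind of smoothing DETR’s segmentation code applies to keep the loss well-defined for empty masks:

```python
# Hypothetical 4x4 binary masks (1 = object pixel), flattened row by row.
pred_mask   = [0, 1, 1, 0,  0, 1, 1, 0,  0, 0, 0, 0,  0, 0, 0, 0]
target_mask = [0, 1, 1, 0,  0, 1, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0]

intersection = sum(p * t for p, t in zip(pred_mask, target_mask))
total = sum(pred_mask) + sum(target_mask)

# Dice loss = 1 - Dice coefficient, with +1 smoothing in numerator
# and denominator.
dice_loss = 1 - (2 * intersection + 1) / (total + 1)

print(dice_loss)
```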

forward(outputs, targets)[source]

This performs the loss computation.

Parameters:
  • outputs – dict of tensors, see the output specification of the model for the format

  • targets – list of dicts, such that len(targets) == batch_size. The expected keys in each dict depend on the losses applied; see each loss’s doc

class models.detr.PostProcess[source]

This module converts the model’s output into the format expected by the coco api.

forward(outputs, target_sizes)[source]

Perform the computation.

Parameters:
  • outputs – raw outputs of the model

  • target_sizes – tensor of dimension [batch_size x 2] containing the size of each image of the batch. For evaluation, this must be the original image size (before any data augmentation); for visualization, this should be the image size after data augmentation, but before padding.
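Per box, the conversion amounts to un-normalizing and switching to corner format. A pure-Python sketch with a hypothetical query output (DETR does this batched, in `torch`):

```python
def postprocess_box(box_cxcywh, img_h, img_w):
    """Scale a normalized (cx, cy, w, h) box to absolute
    (x0, y0, x1, y1) pixel coordinates, mirroring what
    PostProcess does for each image."""
    cx, cy, w, h = box_cxcywh
    x0, y0 = (cx - w / 2) * img_w, (cy - h / 2) * img_h
    x1, y1 = (cx + w / 2) * img_w, (cy + h / 2) * img_h
    return (x0, y0, x1, y1)

# Hypothetical query output on a 480 x 640 (h x w) image.
box = postprocess_box((0.5, 0.5, 0.25, 0.5), img_h=480, img_w=640)
print(box)
```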

class models.detr.MLP(input_dim, hidden_dim, output_dim, num_layers)[source]

Very simple multi-layer perceptron (also called FFN)
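The structure of this module can be sketched in pure Python: num_layers linear layers with ReLU between them and no activation after the last one. The weights below are made up and the real module uses `torch.nn.Linear`; this only illustrates the forward pass:

```python
def linear(x, weight, bias):
    """One dense layer: y = W x + b, with plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def mlp_forward(x, layers):
    """Apply each (weight, bias) layer, with ReLU on all but the last."""
    for i, (w, b) in enumerate(layers):
        x = linear(x, w, b)
        if i < len(layers) - 1:
            x = relu(x)
    return x

# Tiny hypothetical 2 -> 2 -> 1 network with fixed weights.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.1]),
]
out = mlp_forward([2.0, 1.0], layers)
print(out)
```

In DETR an MLP of this shape is used, for example, as the box-prediction head on top of the decoder output.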