Detection

class torchok.tasks.detection.SingleStageDetectionTask(hparams: DictConfig, backbone_name: str, head_name: str, neck_name: Optional[str] = None, num_scales: Optional[int] = None, backbone_params: Optional[dict] = None, neck_params: Optional[dict] = None, head_params: Optional[dict] = None, **kwargs)

Bases: BaseTask

__init__(hparams: DictConfig, backbone_name: str, head_name: str, neck_name: Optional[str] = None, num_scales: Optional[int] = None, backbone_params: Optional[dict] = None, neck_params: Optional[dict] = None, head_params: Optional[dict] = None, **kwargs)

Init SingleStageDetectionTask.

Parameters

hparams – Hyperparameters that are set in the YAML file.
backbone_name – name of the backbone architecture in the BACKBONES registry.
neck_name – name of the neck architecture in the DETECTION_NECKS registry.
head_name – name of the head architecture in the HEADS registry.
num_scales – number of feature maps passed from the backbone to the neck, counted from the last one. Example: for backbone output [layer1, layer2, layer3, layer4] and num_scales=3, the neck will get [layer2, layer3, layer4].
backbone_params – parameters for backbone constructor.
neck_params – parameters for neck constructor. in_channels will be set automatically based on backbone.
head_params – parameters for head constructor. in_channels will be set automatically based on neck.
inputs – information about the shapes and dtypes of the model inputs.
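The num_scales selection described above amounts to keeping the last num_scales entries of the backbone output. The following is a minimal sketch, assuming the feature maps arrive as an ordered list; select_scales is a hypothetical helper written for illustration and is not part of torchok, and the behaviour for num_scales=None (forward everything) is an assumption.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def select_scales(features: List[T], num_scales: Optional[int] = None) -> List[T]:
    """Keep the last `num_scales` feature maps (hypothetical helper, not torchok API)."""
    if num_scales is None:
        # Assumption: with no limit given, every feature map is forwarded.
        return list(features)
    if not 1 <= num_scales <= len(features):
        raise ValueError(
            f"num_scales must be between 1 and {len(features)}, got {num_scales}"
        )
    # Negative slicing counts from the end, so the deepest layers are kept.
    return list(features[-num_scales:])


# Strings stand in for the feature tensors to keep the example dependency-free.
backbone_outputs = ["layer1", "layer2", "layer3", "layer4"]
selected = select_scales(backbone_outputs, num_scales=3)
print(selected)
```

With the inputs from the example in the parameter description, the helper keeps layer2, layer3 and layer4 and drops layer1.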

forward(x: Tensor) → List[Dict[str, Tensor]]

Forward method.

Parameters

x – tensor of shape (B, C, H, W). Batch of input images.

Returns

List of length B containing dicts with two items, bboxes and labels:

bboxes (torch.Tensor):
tensor of shape (N, 5), where N is the number of bboxes in the image. N may differ between images and may be 0. Each box has the form [x1, y1, x2, y2, confidence].
labels (torch.Tensor):
tensor of shape (N,), containing the class label of each bbox.
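As an illustration of the return format, the sketch below filters predictions by the confidence stored in the fifth column of bboxes. The predictions are built by hand to match the documented shapes rather than produced by a real model, and filter_by_confidence is a hypothetical helper, not a torchok function.

```python
from typing import Dict, List

import torch


def filter_by_confidence(
    predictions: List[Dict[str, torch.Tensor]], threshold: float = 0.5
) -> List[Dict[str, torch.Tensor]]:
    """Drop boxes whose confidence is below `threshold` (hypothetical helper)."""
    filtered = []
    for pred in predictions:
        # Column 4 of each (N, 5) bboxes tensor holds the confidence score.
        keep = pred["bboxes"][:, 4] >= threshold
        filtered.append({"bboxes": pred["bboxes"][keep], "labels": pred["labels"][keep]})
    return filtered


# Hand-made stand-in for the output of forward() on a batch of two images.
predictions = [
    {
        "bboxes": torch.tensor([[10.0, 20.0, 50.0, 80.0, 0.9],
                                [5.0, 5.0, 15.0, 15.0, 0.3]]),
        "labels": torch.tensor([1, 2]),
    },
    {
        # An image without detections: N is 0 but the shapes stay consistent.
        "bboxes": torch.zeros((0, 5)),
        "labels": torch.zeros((0,), dtype=torch.long),
    },
]

confident = filter_by_confidence(predictions, threshold=0.5)
for i, pred in enumerate(confident):
    print(i, tuple(pred["bboxes"].shape), pred["labels"].tolist())
```

Boolean-mask indexing also works when N is 0, so images without detections need no special handling.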

forward_with_gt(batch: Dict[str, Tensor]) → Dict[str, Any]

Forward with ground truth labels.

Parameters

batch –

Dictionary with the following keys and values:

image (torch.Tensor):
tensor of shape (B, C, H, W), representing input images.
bboxes (List[torch.Tensor]):
list of B tensors of shape (N, 4), where N is the number of bboxes in the image. N may differ between images and may be 0. Each box has the form [x_left, y_top, x_right, y_bottom]. May be absent.
labels (List[torch.Tensor]):
list of B tensors of shape (N,), containing the class label of each bbox. May be absent.
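A batch matching this description could be assembled as in the sketch below. The image size, box coordinates and labels are arbitrary illustrative values, and check_batch is a hypothetical sanity check, not part of torchok.

```python
from typing import Any, Dict

import torch


def check_batch(batch: Dict[str, Any]) -> int:
    """Validate the documented batch layout and return the batch size B.

    Hypothetical helper for illustration only.
    """
    image = batch["image"]
    assert image.ndim == 4, "image must have shape (B, C, H, W)"
    batch_size = image.shape[0]

    # bboxes and labels are optional, but must be consistent when present.
    if "bboxes" in batch and "labels" in batch:
        assert len(batch["bboxes"]) == len(batch["labels"]) == batch_size
        for boxes, labels in zip(batch["bboxes"], batch["labels"]):
            assert boxes.ndim == 2 and boxes.shape[1] == 4
            assert labels.shape == (boxes.shape[0],)
    return batch_size


batch = {
    "image": torch.rand(2, 3, 64, 64),
    # One box in the first image, none in the second.
    "bboxes": [torch.tensor([[4.0, 8.0, 32.0, 40.0]]), torch.zeros((0, 4))],
    "labels": [torch.tensor([1]), torch.zeros((0,), dtype=torch.long)],
}

print(check_batch(batch))
```

Keeping bboxes and labels as lists of per-image tensors, rather than one stacked tensor, is what allows N to differ from image to image.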

Returns

Dictionary with keys specific to the detection head, the input image shape, and the ground truth values if present.

as_module() → Sequential: Represent the model as a sequence of modules (needed for ONNX checkpointing).

training_step(batch: Dict[str, Tensor], batch_idx: int) → Dict[str, Tensor]: Complete training loop.

validation_step(batch: Dict[str, Tensor], batch_idx: int, dataloader_idx: int = 0) → Dict[str, Tensor]: Complete validation loop.

test_step(batch: Dict[str, Tensor], batch_idx: int, dataloader_idx: int = 0) → None: Complete test loop.

predict_step(batch: Dict[str, Tensor], batch_idx: int) → Dict[str, Tensor]: Complete predict loop.