Datasets

Interface

Each dataset in TorchOk must inherit from the single interface ImageDataset, which requires a few methods to be implemented. Follow these general principles when implementing your dataset:

Constructor

def __init__(self,
             transform: Optional[Union[BasicTransform, BaseCompose]],
             augment: Optional[Union[BasicTransform, BaseCompose]] = None,
             input_dtype: str = 'float32',
             image_format: str = 'rgb',
             rgba_layout_color: Union[int, Tuple[int, int, int]] = 0,
             test_mode: bool = False):
    # transform and augment serve two different purposes: augmentations produce a randomly
    # manipulated version of the image, while transformations apply a fixed conversion to every
    # input image so it can be passed to the neural network model (e.g. resizing, normalization
    # and to-tensor conversion).
    ...

Length of the dataset

def __len__(self) -> int:
    # Return the total number of samples in the dataset.
    ...

Getting access to a raw item of the dataset

def get_raw(self, idx: int) -> dict:
    # Read a sample from disk (or whatever storage your dataset uses); self._read_image(image_path) can help here.
    # Apply augmentations to the numpy images here.
    # Return a dictionary with string keys. Images are usually returned as numpy arrays before
    # normalization, so that a user can call this method directly to see what an output image
    # looks like.
    ...

Getting access to a tensor item of the dataset

def __getitem__(self, idx: int) -> dict:
    # Usually, self.get_raw(idx) is called here.
    # Then apply transformations to convert numpy images and other sample fields to PyTorch tensors.
    ...
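The get_raw / __getitem__ split can be illustrated with a small sketch. It does not import TorchOk: ToyDataset, scale and jitter are hypothetical names, images are plain lists, and the callables only mimic the albumentations calling convention (keyword image=..., dict result). A real dataset would inherit from ImageDataset and use albumentations transforms.

```python
import random
from typing import Callable, Dict, List, Optional


class ToyDataset:
    """Hypothetical stand-in for an ImageDataset subclass (no TorchOk import)."""

    def __init__(self,
                 images: List[List[float]],
                 labels: List[int],
                 transform: Callable[..., Dict],
                 augment: Optional[Callable[..., Dict]] = None):
        self.images = images
        self.labels = labels
        self.transform = transform  # fixed conversion, applied to every sample
        self.augment = augment      # random manipulation, applied only if given

    def __len__(self) -> int:
        return len(self.images)

    def get_raw(self, idx: int) -> dict:
        # Raw, un-normalized sample; augmentations are applied here.
        sample = {'image': list(self.images[idx])}
        if self.augment is not None:
            sample = self.augment(**sample)
        sample['target'] = self.labels[idx]
        sample['index'] = idx
        return sample

    def __getitem__(self, idx: int) -> dict:
        # Start from the raw sample, then apply the fixed transformation.
        sample = self.get_raw(idx)
        sample.update(self.transform(image=sample['image']))
        return sample


def scale(image):
    # Deterministic stand-in for normalization: map 0..255 to 0..1.
    return {'image': [px / 255.0 for px in image]}


def jitter(image):
    # Random stand-in for augmentation: add small noise to each pixel.
    return {'image': [px + random.uniform(-1.0, 1.0) for px in image]}


dataset = ToyDataset(images=[[0.0, 255.0], [51.0, 102.0]], labels=[0, 1],
                     transform=scale, augment=jitter)
print(len(dataset), dataset[1]['target'])  # 2 1
```

Keeping augmentation in get_raw means a user can inspect augmented, still human-viewable samples, while __getitem__ only adds the deterministic model-facing conversion.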

Classification

class torchok.data.datasets.classification.classification.ImageClassificationDataset(data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, annotation_path: Optional[str] = None, num_classes: Optional[int] = None, input_column: str = 'image_path', input_dtype: str = 'float32', target_column: str = 'label', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, multilabel: bool = False, lazy_init: bool = False, csv_path: Optional[str] = None)

Bases: ImageDataset

A generic dataset for multilabel and multiclass image classification tasks.

Multiclass task csv example.
image_path	label
cat_1.jpg	1
dog_1.jpg	0

Multilabel task csv example.
image_path	label
cat_dog_1.jpg	0 1
cat_dog_2.jpg	0 1
dog_1.jpg	0
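A minimal sketch of reading such a file with Python's csv module, assuming tab-separated columns named image_path and label as in the examples above; the helper name read_annotations is hypothetical. Space-separated labels are split into a list of ints, so a multiclass row yields a one-element list.

```python
import csv
import io
from typing import List, Tuple


def read_annotations(text: str, input_column: str = 'image_path',
                     target_column: str = 'label') -> List[Tuple[str, List[int]]]:
    """Hypothetical helper: parse a tab-separated classification annotation."""
    reader = csv.DictReader(io.StringIO(text), delimiter='\t')
    rows = []
    for row in reader:
        # "0 1" -> [0, 1]; a single multiclass label "1" -> [1]
        labels = [int(tok) for tok in row[target_column].split()]
        rows.append((row[input_column], labels))
    return rows


multilabel_tsv = "image_path\tlabel\ncat_dog_1.jpg\t0 1\ncat_dog_2.jpg\t0 1\ndog_1.jpg\t0\n"
print(read_annotations(multilabel_tsv))
# [('cat_dog_1.jpg', [0, 1]), ('cat_dog_2.jpg', [0, 1]), ('dog_1.jpg', [0])]
```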

__init__(data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, annotation_path: Optional[str] = None, num_classes: Optional[int] = None, input_column: str = 'image_path', input_dtype: str = 'float32', target_column: str = 'label', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, multilabel: bool = False, lazy_init: bool = False, csv_path: Optional[str] = None)

Init ImageClassificationDataset.

Parameters

data_folder – Directory with all the images.
annotation_path – Path to the .pkl or .csv file with image paths and annotations. Image paths must be in the input_column column and annotations in the target_column column.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
num_classes – Number of classes. With zero-based labels, this is the maximum class index in the dataset plus one.
input_column – column name containing paths to the images.
input_dtype – Data type of the torch tensors related to the image.
target_column – column name containing image label.
target_dtype – Data type of the torch tensors related to the target.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.
test_mode – If True, only images are returned, without labels.
multilabel – If True, targets are converted to multi-hot vectors for the multilabel task. If False, the dataset prepares targets for multiclass classification.
lazy_init – If True, target preparation is deferred until __getitem__ is called: for multilabel, the target is converted to a multi-hot vector; for multiclass, the class index is checked against the valid range.
csv_path – DEPRECATED, use annotation_path instead. Path to the .pkl or .csv file with image paths and annotations. Image paths must be in the input_column column and annotations in the target_column column.

get_raw(idx: int) → dict

Get an item sample without applying transforms.

Returns: sample, a dict where sample[‘image’] - np.array, representing the image after augmentations. sample[‘target’] - Target class or labels. sample[‘index’] - Index of the sample, the same as input idx.
Return type: dict

__getitem__(idx: int) → dict

Get item sample.

Returns: sample, a dict where sample[‘image’] - Tensor, representing the image after augmentations and transformations, dtype=input_dtype. sample[‘target’] - Target class or labels, dtype=target_dtype. sample[‘index’] - Index of the sample, the same as input idx.
Return type: dict

process_function(target: Any) → Any

Prepare the dataset target based on the classification type.

Parameters: target – Classification labels to prepare.
Returns: Prepared classification labels.
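The exact logic of process_function is not shown here; the sketch below is an assumption of what target preparation could look like in each mode: a range check for multiclass and a multi-hot vector for multilabel. The function name prepare_target is hypothetical.

```python
from typing import List, Union


def prepare_target(target: Union[int, List[int]], num_classes: int,
                   multilabel: bool) -> Union[int, List[int]]:
    """Hypothetical sketch of per-mode target preparation."""
    if multilabel:
        # Multi-hot: position i is 1 if class i is present.
        labels = target if isinstance(target, list) else [target]
        multihot = [0] * num_classes
        for label in labels:
            if not 0 <= label < num_classes:
                raise ValueError(f'label {label} is outside [0, {num_classes})')
            multihot[label] = 1
        return multihot
    # Multiclass: a single index that must fit the class range.
    if not 0 <= target < num_classes:
        raise ValueError(f'label {target} is outside [0, {num_classes})')
    return target


print(prepare_target([0, 2], num_classes=3, multilabel=True))  # [1, 0, 1]
print(prepare_target(1, num_classes=3, multilabel=False))      # 1
```

With lazy_init=True, a call like this would run per sample in __getitem__ rather than once over the whole annotation table.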

Segmentation

class torchok.data.datasets.segmentation.image_segmentation.ImageSegmentationDataset(data_folder: Union[Path, str], annotation_path: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_column: str = 'image_path', input_dtype: str = 'float32', target_column: str = 'mask_path', target_dtype: str = 'int64', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Bases: ImageDataset

A dataset for image segmentation task.

Segmentation csv example.
image_path	mask_path
image1.png	mask1.png
image2.png	mask2.png
image3.png	mask3.png

__init__(data_folder: Union[Path, str], annotation_path: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_column: str = 'image_path', input_dtype: str = 'float32', target_column: str = 'mask_path', target_dtype: str = 'int64', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Init ImageSegmentationDataset.

Parameters

data_folder – Directory with all the images.
annotation_path – Path to the .pkl or .csv file with image and mask paths. Image paths must be in the input_column column (default image_path) and mask paths in the target_column column (default mask_path).
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
input_column – column name containing paths to the images.
input_dtype – Data type of the torch tensors related to the image.
target_column – column name containing paths to the masks.
target_dtype – Data type of the torch tensors related to the target.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.
test_mode – If True, only images are returned, without labels.

get_raw(idx: int) → dict

__getitem__(idx: int) → Dict[str, Any]

Representation

class torchok.data.datasets.representation.unsupervised_contrastive_dataset.UnsupervisedContrastiveDataset(data_folder: str, transform: Union[BasicTransform, BaseCompose], augment: Optional[Union[BasicTransform, BaseCompose]] = None, annotation_path: Optional[str] = None, input_column: str = 'image_path', input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, csv_path: Optional[str] = None)

Bases: ImageDataset

A dataset for the unsupervised contrastive learning task.

Each image is augmented twice, and the two resulting views form a positive pair.

UnsupervisedContrastive csv example
image_path
cat_1.jpg
dog_1.jpg
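As a rough illustration of the two-view idea, the sketch applies the same random augmentation twice to one image and returns the image_0, image_1 and index keys described below. The flip augmentation, the nested-list image and the names random_hflip and two_views are stand-ins chosen for a dependency-free example; TorchOk would use albumentations transforms.

```python
import random
from typing import Callable, Dict, List

Image = List[List[int]]


def random_hflip(image: Image, p: float = 0.5) -> Image:
    # Stand-in augmentation: flip each row left-right with probability p.
    return [row[::-1] for row in image] if random.random() < p else image


def two_views(image: Image, index: int,
              augment: Callable[[Image], Image]) -> Dict:
    # Each call draws fresh randomness, so the two views may differ,
    # yet both come from the same source image (a positive pair).
    return {'image_0': augment(image), 'image_1': augment(image), 'index': index}


random.seed(0)
sample = two_views([[1, 2, 3], [4, 5, 6]], index=7, augment=random_hflip)
print(sample['index'])  # 7
```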

__init__(data_folder: str, transform: Union[BasicTransform, BaseCompose], augment: Optional[Union[BasicTransform, BaseCompose]] = None, annotation_path: Optional[str] = None, input_column: str = 'image_path', input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, csv_path: Optional[str] = None)

Init UnsupervisedContrastiveDataset.

Parameters

data_folder – Directory with all the images.
annotation_path – Path to the .pkl or .csv file with path to images and annotations. Path to images must be under column input_column.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
input_column – column name containing paths to the images.
input_dtype – data type of the torch tensors related to the image.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.
csv_path – DEPRECATED, use annotation_path instead. Path to the .pkl or .csv file with image paths. Image paths must be in the input_column column.

get_raw(idx: int) → dict

Get item sample.

Returns: sample, a dict where sample[‘image_0’] - np.array, representing the first view of the image after augmentations. sample[‘image_1’] - np.array, representing the second view of the image after augmentations. sample[‘index’] - Index of the sample, the same as input idx.
Return type: dict

__getitem__(idx: int) → dict

Get item sample.

Returns: sample, a dict where sample[‘image_0’] - Tensor, representing the first view of the image after augmentations and transformations, dtype=input_dtype. sample[‘image_1’] - Tensor, representing the second view of the image after augmentations and transformations, dtype=input_dtype. sample[‘index’] - Index of the sample, the same as input idx.
Return type: dict

class torchok.data.datasets.representation.validation.RetrievalDataset(data_folder: str, matches_csv_path: str, img_list_csv_path: str, transform: Union[BasicTransform, BaseCompose], augment: Optional[Union[BasicTransform, BaseCompose]] = None, gallery_folder: Optional[str] = '', gallery_list_csv_path: Optional[str] = None, use_query_without_relevants: bool = False, input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0)

Bases: ImageDataset

Dataset for image retrieval validation.

Each query is searched against the whole set of items to find its relevant items; gallery items are treated as non-relevant.

Example matches csv: Query ids should be unique int values; otherwise, rows with the same query id are treated as different matches.

Relevant ids can be repeated in different queries.

Scores reflect how similar an image is to the query: a higher score means greater similarity. Scores must be float values > 0.

Match csv example
query	relevant	scores
1194917	601566 554492 224125 2001716519	4 3 2 2
1257924	456490	4
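A sketch of parsing the matches file, assuming tab-separated columns query, relevant and scores with space-separated lists as in the example above; parse_matches is a hypothetical name, not part of TorchOk. It checks the constraints stated above: every relevant id has a score and every score is a float greater than 0.

```python
import csv
import io
from typing import Dict, List, Tuple


def parse_matches(text: str) -> List[Tuple[int, Dict[int, float]]]:
    """Hypothetical parser: one (query_id, {relevant_id: score}) entry per row."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter='\t'):
        relevant = [int(tok) for tok in row['relevant'].split()]
        scores = [float(tok) for tok in row['scores'].split()]
        if len(relevant) != len(scores):
            raise ValueError(f"query {row['query']}: {len(relevant)} ids but {len(scores)} scores")
        if any(score <= 0 for score in scores):
            raise ValueError(f"query {row['query']}: scores must be > 0")
        # Rows are kept separately, so a repeated query id becomes a separate match.
        rows.append((int(row['query']), dict(zip(relevant, scores))))
    return rows


matches_tsv = ("query\trelevant\tscores\n"
               "1194917\t601566 554492 224125 2001716519\t4 3 2 2\n"
               "1257924\t456490\t4\n")
for query, relevant in parse_matches(matches_tsv):
    print(query, relevant)
# 1194917 {601566: 4.0, 554492: 3.0, 224125: 2.0, 2001716519: 2.0}
# 1257924 {456490: 4.0}
```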

Example img_list csv: img_list.csv maps the ids of query and relevant elements to image paths.

Image csv example
id	image_path	label
1194917	data/img_1.jpg	0
601566	data/img_2.jpg	0
554492	data/img_3.jpg	0
224125	data/img_4.jpg	1
2001716519	data/img_5.jpg	1
1257924	data/img_6.jpg	1
456490	data/img_7.jpg	2
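To make the query_idxs and scores fields returned by get_raw easier to picture, here is one hypothetical way to combine the matches and img_list tables into a dense query-by-item relevance matrix. The function name relevance_matrix and the layout (rows in matches order, columns in img_list order) are assumptions for illustration, not necessarily TorchOk's internal representation.

```python
from typing import Dict, List, Tuple


def relevance_matrix(matches: List[Tuple[int, Dict[int, float]]],
                     item_ids: List[int]) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical: scores[q][i] is the relevance of item i to query q (0 = not relevant)."""
    column = {item_id: i for i, item_id in enumerate(item_ids)}
    scores = [[0.0] * len(item_ids) for _ in matches]
    for q, (_, relevant) in enumerate(matches):
        for rel_id, score in relevant.items():
            scores[q][column[rel_id]] = score
    # query_idxs: row of the target matrix for each item, or -1 if the item is not a query
    query_row = {query_id: q for q, (query_id, _) in enumerate(matches)}
    query_idxs = [query_row.get(item_id, -1) for item_id in item_ids]
    return scores, query_idxs


matches = [(1194917, {601566: 4.0, 554492: 3.0, 224125: 2.0, 2001716519: 2.0}),
           (1257924, {456490: 4.0})]
item_ids = [1194917, 601566, 554492, 224125, 2001716519, 1257924, 456490]
scores, query_idxs = relevance_matrix(matches, item_ids)
print(query_idxs)  # [0, -1, -1, -1, -1, 1, -1]
print(scores[1])   # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0]
```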

Gallery Image csv example
id	image_paths
8	data/db/img_1.jpg
10	data/db/img_2.jpg
12	data/db/img_3.jpg

__init__(data_folder: str, matches_csv_path: str, img_list_csv_path: str, transform: Union[BasicTransform, BaseCompose], augment: Optional[Union[BasicTransform, BaseCompose]] = None, gallery_folder: Optional[str] = '', gallery_list_csv_path: Optional[str] = None, use_query_without_relevants: bool = False, input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0)

Init RetrievalDataset class.

Parameters

data_folder – Directory with all the images.
matches_csv_path – Path to the csv file that specifies the queries with their relevant items and scores.
img_list_csv_path – Path to the csv file mapping image identifiers to image paths. Format: id | path. Ids in the matches csv refer to ids in the img_list csv.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
gallery_folder – Path to a folder with all gallery images (traversed recursively). When no gallery is specified, all remaining queries and relevant items are treated as negative samples for a given query-relevant set.
gallery_list_csv_path – Path to the csv file mapping gallery image identifiers to image paths. Format: id | path.
use_query_without_relevants – If True, queries that have no relevant items are also used.
input_dtype – Data type of the torch tensors related to the image.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.

Raises

ValueError – if gallery_folder is given but gallery_list_csv_path is None.

get_raw(idx: int) → dict

Get item sample.

Returns: image - np.array, representing image after augmentations, dtype=input_dtype. index - Index from DataFrame. query_idxs - Int tensor, if item is query: return index of this query in target matrix, else -1. scores - Float tensor shape (1, len(n_query)), relevant scores of current item. group_labels - Int tensor with image classification label.
Return type: dict with fields

__getitem__(index: int) → dict

Get item sample.

Returns: image - Tensor, representing image after augmentations and transformations, dtype=input_dtype. index - Index from DataFrame. query_idxs - Int tensor, if item is query: return index of this query in target matrix, else -1. scores - Float tensor shape (1, len(n_query)), relevant scores of current item. group_labels - Int tensor with image classification label.
Return type: dict with fields

Detection

class torchok.data.datasets.detection.detection.DetectionDataset(data_folder: Union[Path, str], annotation_path: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_column: str = 'image_path', input_dtype: str = 'float32', bbox_column: str = 'bbox', bbox_dtype: str = 'float32', target_column: str = 'label', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, bbox_format: str = 'coco', min_area: float = 0.0, min_visibility: float = 0.0, filter_bboxes_on_start: bool = False)

Bases: ImageDataset

A dataset for the object detection task.

Detection csv example.
image_path	bbox	label
image1.png	[[217.62 240.54 38.99 57.75] [1.0 240.24 346.63 186.76]]	[0 1]
image2.png	[[102.49 118.47 7.9 17.31]]	[2 1]
image3.png	[[253.21 271.07 59.59 60.97] [257.85 224.48 44.13 97.0]]	[2 0]

__init__(data_folder: Union[Path, str], annotation_path: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_column: str = 'image_path', input_dtype: str = 'float32', bbox_column: str = 'bbox', bbox_dtype: str = 'float32', target_column: str = 'label', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, bbox_format: str = 'coco', min_area: float = 0.0, min_visibility: float = 0.0, filter_bboxes_on_start: bool = False)

Init DetectionDataset.

Parameters

data_folder – Directory with all the images.
annotation_path – Path to the .pkl or .csv file with image paths, bboxes and labels. By default, image paths must be in the image_path column, bboxes in the bbox column and bbox labels in the label column. Column names can be changed via input_column, bbox_column and target_column.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
input_column – Column name containing paths to the images.
input_dtype – Data type of the torch tensors related to the image.
bbox_column – Column name containing list of bboxes for every image.
bbox_dtype – Data type of the torch tensors related to the bboxes.
target_column – Column name containing bboxes labels.
target_dtype – Data type of the torch tensors related to the bboxes labels.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.
test_mode – If True, only images are returned, without labels.
bbox_format – Bbox format used by the albumentations transforms. The following formats are supported (the example values describe the same box in a 640x480 image; albumentations and yolo values are normalized by the image width and height):
    pascal_voc - [x_min, y_min, x_max, y_max] = [98, 345, 420, 462]
    albumentations - [x_min, y_min, x_max, y_max] = [0.1531, 0.71875, 0.65625, 0.9625]
    coco - [x_min, y_min, width, height] = [98, 345, 322, 117]
    yolo - [x_center, y_center, width, height] = [0.4046875, 0.840625, 0.503125, 0.24375]
min_area – Area in pixels. If the area of a bounding box after augmentation becomes smaller than min_area, Albumentations drops that box, so the returned list of augmented bounding boxes won’t contain it.
min_visibility – Value between 0 and 1. If the ratio of the bounding box area after augmentation to the area of the bounding box before augmentation becomes smaller than min_visibility, Albumentations drops that box. So if the augmentation process cuts off most of the bounding box, that box won’t be present in the returned list of augmented bounding boxes.
filter_bboxes_on_start – If True, filter_bboxes is applied to the whole dataset at initialization; otherwise, it is applied in get_raw.
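The supported formats describe the same box in different coordinate conventions. A small sketch, assuming a 640x480 image (the size for which the pascal_voc and albumentations example values agree), converts the pascal_voc box into the other three; from_pascal_voc is a hypothetical helper, not an albumentations or TorchOk function.

```python
from typing import Dict, List


def from_pascal_voc(box: List[float], width: int, height: int) -> Dict[str, List[float]]:
    """Hypothetical: convert a [x_min, y_min, x_max, y_max] pixel box to the other formats."""
    x_min, y_min, x_max, y_max = box
    w, h = x_max - x_min, y_max - y_min
    return {
        'pascal_voc': [x_min, y_min, x_max, y_max],
        # normalized corners
        'albumentations': [x_min / width, y_min / height, x_max / width, y_max / height],
        # pixel top-left corner plus size
        'coco': [x_min, y_min, w, h],
        # normalized center plus normalized size
        'yolo': [(x_min + w / 2) / width, (y_min + h / 2) / height, w / width, h / height],
    }


boxes = from_pascal_voc([98, 345, 420, 462], width=640, height=480)
print(boxes['coco'])            # [98, 345, 322, 117]
print(boxes['albumentations'])  # [0.153125, 0.71875, 0.65625, 0.9625]
```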

Raises

RuntimeError – if annotation_path is not in pkl or csv format.

filter_bboxes(bboxes: Tensor, labels: Tensor, rows: int, cols: int) → [Tensor, Tensor]

Filter empty bounding boxes.

Parameters

bboxes – List of bounding boxes.
labels – Array of bbox labels.
rows – Image height.
cols – Image width.

Returns

numpy array of bounding boxes and numpy array of labels of these boxes.
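The filtering criterion is not spelled out above. As an assumption, the sketch clips coco-format boxes to the image (rows x cols) and drops boxes whose clipped width or height is not positive, keeping labels aligned. It uses plain lists instead of tensors, and filter_empty_bboxes is a hypothetical name.

```python
from typing import List, Tuple


def filter_empty_bboxes(bboxes: List[List[float]], labels: List[int],
                        rows: int, cols: int) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical: clip coco boxes [x, y, w, h] to the image and drop empty ones."""
    kept_boxes, kept_labels = [], []
    for box, label in zip(bboxes, labels):
        x, y, w, h = (float(v) for v in box)
        x0, y0 = max(x, 0.0), max(y, 0.0)
        x1, y1 = min(x + w, float(cols)), min(y + h, float(rows))
        if x1 > x0 and y1 > y0:  # non-empty after clipping
            kept_boxes.append([x0, y0, x1 - x0, y1 - y0])
            kept_labels.append(label)
    return kept_boxes, kept_labels


kept, kept_labels = filter_empty_bboxes([[10, 10, 20, 20], [700, 10, 5, 5], [-5, 0, 10, 0]],
                                        [1, 2, 3], rows=480, cols=640)
print(kept, kept_labels)  # [[10.0, 10.0, 20.0, 20.0]] [1]
```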

get_raw(idx: int) → dict

__getitem__(idx: int) → dict

Get item sample.

Returns: sample, a dict where sample[‘image’] - Tensor, representing the image after augmentations and transformations, dtype=input_dtype. sample[‘target’] - Target class or labels, dtype=target_dtype. sample[‘bboxes’] - Target bboxes, dtype=bbox_dtype. sample[‘index’] - Index of the sample, the same as input idx.
Return type: dict

collate_fn(batch): Puts each data field into a tensor with outer dimension batch size

Ready-to-go

class torchok.data.datasets.examples.coco_detection.COCODetection(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'long', bbox_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, min_area: float = 0, min_visibility: float = 0.0)

Bases: DetectionDataset

A class representing the COCO detection dataset (https://cocodataset.org/#home).

The COCO Object Detection Task is designed to push the state of the art in object detection forward. COCO features two object detection tasks: using either bounding box output or object segmentation output (the latter is also known as instance segmentation).

The COCO dataset has 81 categories: the 80 object classes listed in CLASSES plus the background label 0. The train set contains 118,287 images and the validation set 5,000.

This dataset occupies about 20 GB of disk space.

CLASSES = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

label_mapping = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 13: 12, 14: 13, 15: 14, 16: 15, 17: 16, 18: 17, 19: 18, 20: 19, 21: 20, 22: 21, 23: 22, 24: 23, 25: 24, 27: 25, 28: 26, 31: 27, 32: 28, 33: 29, 34: 30, 35: 31, 36: 32, 37: 33, 38: 34, 39: 35, 40: 36, 41: 37, 42: 38, 43: 39, 44: 40, 46: 41, 47: 42, 48: 43, 49: 44, 50: 45, 51: 46, 52: 47, 53: 48, 54: 49, 55: 50, 56: 51, 57: 52, 58: 53, 59: 54, 60: 55, 61: 56, 62: 57, 63: 58, 64: 59, 65: 60, 67: 61, 70: 62, 72: 63, 73: 64, 74: 65, 75: 66, 76: 67, 77: 68, 78: 69, 79: 70, 80: 71, 81: 72, 82: 73, 84: 74, 85: 75, 86: 76, 87: 77, 88: 78, 89: 79, 90: 80}
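The mapping above follows a simple rule: take the COCO category ids between 1 and 90 that appear as keys (ten ids are absent), sort them, and number them from 1, leaving 0 for the background. The sketch rebuilds it that way; UNUSED_IDS and build_label_mapping are hypothetical names, and this reconstruction is an illustration rather than TorchOk's own code. A resulting label l corresponds to CLASSES[l - 1].

```python
from typing import Dict

# Category ids in 1..90 that do not appear as keys of label_mapping above.
UNUSED_IDS = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}


def build_label_mapping() -> Dict[int, int]:
    """Map sparse COCO category ids to contiguous labels 1..80 (0 = background)."""
    used_ids = [cat_id for cat_id in range(1, 91) if cat_id not in UNUSED_IDS]
    return {cat_id: label for label, cat_id in enumerate(used_ids, start=1)}


label_mapping = build_label_mapping()
print(len(label_mapping), label_mapping[13], label_mapping[90])  # 80 12 80
```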

base_folder = 'COCO'

train_data_filename = 'train2017.zip'

train_data_url = 'http://images.cocodataset.org/zips/train2017.zip'

train_data_hash = 'cced6f7f71b7629ddf16f17bbcfab6b2'

valid_data_filename = 'valid2017.zip'

valid_data_url = 'http://images.cocodataset.org/zips/val2017.zip'

valid_data_hash = '442b8da7639aecaf257c1dceb8ba8c80'

annotations_filename = 'annotations.zip'

annotations_url = 'http://images.cocodataset.org/annotations/annotations_trainval2017.zip'

annotations_hash = 'f4bbac642086de4f52a3fdda2de5fa2c'

train_pkl = 'train_detection.pkl'

valid_pkl = 'valid_detection.pkl'

__init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'long', bbox_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, min_area: float = 0, min_visibility: float = 0.0)

Init COCODetection.

Parameters

train – If True, the train set is used; otherwise, the validation set.
download – If True, the data is downloaded and saved to data_folder.
data_folder – Directory with all the images.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
input_dtype – Data type of the torch tensors related to the image.
target_dtype – Data type of the torch tensors related to the bboxes labels.
bbox_dtype – Data type of the torch tensors related to the bboxes.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.
test_mode – If True, only images are returned, without labels.
min_area – Area in pixels. If the area of a bounding box after augmentation becomes smaller than min_area, Albumentations drops that box, so the returned list of augmented bounding boxes won’t contain it.
min_visibility – Value between 0 and 1. If the ratio of the bounding box area after augmentation to the area of the bounding box before augmentation becomes smaller than min_visibility, Albumentations drops that box. So if the augmentation process cuts off most of the bounding box, that box won’t be present in the returned list of augmented bounding boxes.
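The two drop rules can be sketched for a single box given before and after augmentation in coco format. keep_box is a hypothetical name; the code only mirrors the textual description of min_area and min_visibility and is not Albumentations' implementation.

```python
from typing import Sequence


def keep_box(before: Sequence[float], after: Sequence[float],
             min_area: float = 0.0, min_visibility: float = 0.0) -> bool:
    """Hypothetical check on coco boxes [x, y, w, h] before/after augmentation."""
    area_before = before[2] * before[3]
    area_after = after[2] * after[3]
    if area_after < min_area:  # too small in pixels
        return False
    visibility = area_after / area_before if area_before > 0 else 0.0
    return visibility >= min_visibility  # dropped if most of the box was cut off


# A 100x100 box cropped to 100x30 keeps 30% of its area.
print(keep_box([0, 0, 100, 100], [0, 0, 100, 30], min_area=1000, min_visibility=0.25))  # True
print(keep_box([0, 0, 100, 100], [0, 0, 100, 30], min_area=1000, min_visibility=0.5))   # False
```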

create_annotation(json_path: Union[Path, str], image_folder: Union[Path, str], save_df_path: Union[Path, str])

Create the train/valid annotation file for the loaded COCO dataset.

Parameters

json_path – COCO json annotation file path.
image_folder – COCO images folder.
save_df_path – Pickle save name.

__getitem__(idx: int) → dict

Get item sample.

Returns: sample, a dict where sample[‘image’] - Tensor, representing the image after augmentations and transformations, dtype=input_dtype. sample[‘target’] - Target class or labels, dtype=target_dtype. sample[‘bboxes’] - Target bboxes, dtype=bbox_dtype. sample[‘index’] - Index of the sample, the same as input idx.
Return type: dict

class torchok.data.datasets.examples.coco_segmentation.COCOSegmentation(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Bases: ImageSegmentationDataset

A class representing the COCO segmentation dataset (https://cocodataset.org/#home).

The COCO Object Detection Task is designed to push the state of the art in object detection forward. COCO features two object detection tasks: using either bounding box output or object segmentation output (the latter is also known as instance segmentation).

The COCO dataset has 81 categories: the 80 object classes listed in CLASSES plus the background label 0. The train set contains 118,287 images and the validation set 5,000.

This dataset occupies about 21 GB of disk space.

CLASSES = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']

label_mapping = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 13: 12, 14: 13, 15: 14, 16: 15, 17: 16, 18: 17, 19: 18, 20: 19, 21: 20, 22: 21, 23: 22, 24: 23, 25: 24, 27: 25, 28: 26, 31: 27, 32: 28, 33: 29, 34: 30, 35: 31, 36: 32, 37: 33, 38: 34, 39: 35, 40: 36, 41: 37, 42: 38, 43: 39, 44: 40, 46: 41, 47: 42, 48: 43, 49: 44, 50: 45, 51: 46, 52: 47, 53: 48, 54: 49, 55: 50, 56: 51, 57: 52, 58: 53, 59: 54, 60: 55, 61: 56, 62: 57, 63: 58, 64: 59, 65: 60, 67: 61, 70: 62, 72: 63, 73: 64, 74: 65, 75: 66, 76: 67, 77: 68, 78: 69, 79: 70, 80: 71, 81: 72, 82: 73, 84: 74, 85: 75, 86: 76, 87: 77, 88: 78, 89: 79, 90: 80}

base_folder = 'COCO'

train_data_filename = 'train2017.zip'

train_data_url = 'http://images.cocodataset.org/zips/train2017.zip'

train_data_hash = 'cced6f7f71b7629ddf16f17bbcfab6b2'

valid_data_filename = 'valid2017.zip'

valid_data_url = 'http://images.cocodataset.org/zips/val2017.zip'

valid_data_hash = '442b8da7639aecaf257c1dceb8ba8c80'

annotations_filename = 'annotations.zip'

annotations_url = 'http://images.cocodataset.org/annotations/annotations_trainval2017.zip'

annotations_hash = 'f4bbac642086de4f52a3fdda2de5fa2c'

train_csv = 'train_segmentation.csv'

valid_csv = 'valid_segmentation.csv'

__init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Init COCOSegmentation.

Parameters

train – If True, the train set is used; otherwise, the validation set.
download – If True, the data is downloaded and saved to data_folder.
data_folder – Directory with all the images.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
input_dtype – Data type of the torch tensors related to the image.
target_dtype – Data type of the torch tensors related to the target mask.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.
test_mode – If True, only images are returned, without labels.

create_annotation(json_path: Union[str, Path], mask_folder: Union[str, Path], save_df_path: Union[str, Path])

Create the train/valid csv annotation for the loaded COCO dataset.

Parameters

json_path – COCO json annotation file path.
mask_folder – COCO mask save folder.
save_df_path – Pickle save name.

class torchok.data.datasets.examples.sop.SOP(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Bases: ImageDataset

A class representing the Stanford Online Products (SOP) dataset.

The Stanford Online Products dataset contains 120k images of 23k classes of online products for metric learning. The homepage of SOP is https://cvgl.stanford.edu/projects/lifted_struct/.

base_folder = 'Stanford_Online_Products'

filename = 'Stanford_Online_Products.tar.gz'

url = 'https://torchok-hub.s3.eu-west-1.amazonaws.com/Stanford_Online_Products.tar.gz'

tgz_md5 = 'b96128cf2b75493708511ff5c400eefe'

train_txt = 'Ebay_train.txt'

test_txt = 'Ebay_test.txt'

__init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Init SOP.

The dataset contains 120,053 images of 22,634 classes in total. The train split has 59,551 images of 11,318 classes; the test split has 60,502 images of 11,316 classes.

Parameters

train – If True, the train split is used; otherwise, the test split.
download – If True, the data is downloaded and saved to data_folder.
data_folder – Directory with all the images.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
input_dtype – Data type of the torch tensors related to the image.
reader_library – Image reading library. Can be ‘opencv’or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.
test_mode – If True, only image without labels will be returned.
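When download is True, the url and tgz_md5 attributes suggest the usual pattern of checking the archive's MD5 before unpacking it. The sketch below shows that pattern with hashlib and tarfile; how TorchOk actually downloads and verifies the archive is not documented here, so treat the helpers as hypothetical.

```python
import hashlib
import tarfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute an MD5 hex digest in chunks so large archives are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive: Path, expected_md5: str, data_folder: Path) -> None:
    """Refuse to extract an archive whose checksum does not match the expected value."""
    actual = md5_of_file(archive)
    if actual != expected_md5:
        raise RuntimeError(f"MD5 mismatch for {archive}: {actual} != {expected_md5}")
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):  # safer extraction filter where available
            tar.extractall(data_folder, filter="data")
        else:
            tar.extractall(data_folder)
```

Comparing checksums before extraction means a truncated or corrupted download fails loudly instead of producing a half-populated data_folder.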

get_raw(idx: int) → dict

Get item sample.

Returns: sample, a dict where sample[‘image’] - Tensor representing the image after augmentations; sample[‘target’] - target class or labels; sample[‘index’] - index of the sample, the same as the input idx.
Return type: dict

__getitem__(idx: int) → dict

Get item sample.

Returns: sample, a dict where sample[‘image’] - Tensor representing the image after augmentations and transformations, dtype=input_dtype; sample[‘target’] - target class or labels; sample[‘index’] - index of the sample, the same as the input idx.
Return type: dict
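The split between get_raw (read plus random augmentation, raw pixel values) and __getitem__ (fixed transformation for the model) can be illustrated with a toy class. `ToyDataset`, `random_hflip` and `to_unit_range` are hypothetical stand-ins that use nested lists instead of numpy arrays and tensors so the sketch runs with the standard library only; they only mirror the 'image', 'target' and 'index' keys documented above.

```python
import random
from typing import Callable, Dict, List, Optional

Image = List[List[int]]  # toy stand-in for an HxW uint8 image


class ToyDataset:
    """Hypothetical class mimicking the get_raw/__getitem__ contract."""

    def __init__(self, images: List[Image], labels: List[int],
                 transform: Callable[[Image], List[List[float]]],
                 augment: Optional[Callable[[Image], Image]] = None,
                 test_mode: bool = False):
        self.images, self.labels = images, labels
        self.transform, self.augment, self.test_mode = transform, augment, test_mode

    def __len__(self) -> int:
        return len(self.images)

    def get_raw(self, idx: int) -> Dict:
        image = self.images[idx]
        if self.augment is not None:
            image = self.augment(image)  # random manipulation, still raw pixel values
        sample = {"image": image, "index": idx}
        if not self.test_mode:
            sample["target"] = self.labels[idx]  # labels are omitted in test mode
        return sample

    def __getitem__(self, idx: int) -> Dict:
        sample = self.get_raw(idx)
        sample["image"] = self.transform(sample["image"])  # fixed, deterministic step
        return sample


def random_hflip(image: Image, p: float = 0.5) -> Image:
    """Toy augmentation: flip each row with probability p."""
    return [row[::-1] for row in image] if random.random() < p else image


def to_unit_range(image: Image) -> List[List[float]]:
    """Toy transformation: scale uint8 values to [0, 1]."""
    return [[px / 255.0 for px in row] for row in image]
```

Calling `get_raw` directly is handy for inspecting what an augmented sample looks like before normalization.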

class torchok.data.datasets.examples.sweet_pepper.SweetPepper(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'int64', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Bases: ImageSegmentationDataset

A class representing the Sweet Pepper segmentation dataset from Kaggle: https://www.kaggle.com/datasets/lemontyc/sweet-pepper.

The main task for this dataset is to segment peppers (fruit) and peduncles in images obtained from different farm locations. The dataset has 3 labels: 0 - background, 1 - fruit and 2 - peduncle. It contains 620 HD-resolution images: 500 for training and 120 for validation.
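Given the three label ids above, a quick per-class pixel count on a mask helps spot class imbalance (peduncles are typically much smaller than fruit), for example when choosing loss weights. This helper is hypothetical and takes the mask as nested lists for a dependency-free sketch; the dataset itself returns arrays or tensors.

```python
from collections import Counter
from typing import Dict, List

# Label ids as documented for SweetPepper.
SWEET_PEPPER_CLASSES = {0: "background", 1: "fruit", 2: "peduncle"}


def class_pixel_fractions(mask: List[List[int]]) -> Dict[str, float]:
    """Return the fraction of pixels belonging to each class in a label mask."""
    counts = Counter(px for row in mask for px in row)
    unknown = set(counts) - set(SWEET_PEPPER_CLASSES)
    if unknown:
        raise ValueError(f"unexpected label ids: {sorted(unknown)}")
    total = sum(counts.values())
    return {name: counts.get(label, 0) / total for label, name in SWEET_PEPPER_CLASSES.items()}
```

Rejecting unknown ids is a cheap sanity check that a mask was not resized with interpolation that blends label values.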

base_folder = 'sweet_pepper'

filename = 'sweet_pepper.tar.gz'

url = 'https://torchok-hub.s3.eu-west-1.amazonaws.com/sweet_pepper.tar.gz'

tgz_md5 = '65021e5fad5fe286b3c2bac7753d6e9d'

train_csv = 'train.csv'

valid_csv = 'valid.csv'

__init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'int64', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Init SweetPepper.

Parameters

train – If True, the train split is used; otherwise, the validation split.
download – If True, the data is downloaded and saved to data_folder.
data_folder – Directory with all the images.
transform – Transform to be applied to a sample. It should follow the transform interface of the albumentations library.
augment – Optional augmentation to be applied to a sample. It should follow the transform interface of the albumentations library.
input_dtype – Data type of the torch tensors related to the image.
target_dtype – Data type of the torch tensors related to the target.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – Format of the images returned by the dataset. Can be rgb, bgr, rgba or gray.
rgba_layout_color – Background color used when converting from rgba.
test_mode – If True, only images are returned, without labels.

class torchok.data.datasets.examples.triplet_sop.TRIPLET_SOP(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, anchor_column: str = 'anchor', positive_column: str = 'positive', negative_column: str = 'negative', input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Bases: ImageDataset

A class representing the Stanford Online Products (SOP) dataset.

The Stanford Online Products dataset contains 120k images of 23k classes of online products for metric learning. The homepage of SOP is https://cvgl.stanford.edu/projects/lifted_struct/.

base_folder = 'Stanford_Online_Products'

filename = 'Stanford_Online_Products.tar.gz'

url = 'https://torchok-hub.s3.eu-west-1.amazonaws.com/Stanford_Online_Products.tar.gz'

tgz_md5 = 'b96128cf2b75493708511ff5c400eefe'

train_csv = 'sop_triplet_train.csv'

test_csv = 'sop_triplet_test.csv'

__init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, anchor_column: str = 'anchor', positive_column: str = 'positive', negative_column: str = 'negative', input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Init TRIPLET SOP.

The dataset has 11,319 image triplets (anchor, positive, negative).
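How the bundled triplet CSVs were generated is not documented here. One common way to build such triplets from class-labeled images is shown below as a hypothetical sketch: for each class with at least two images, draw an anchor and a positive from that class and a negative from a different class. The output keys match the default anchor_column, positive_column and negative_column names.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def make_triplets(items: Sequence[Tuple[str, int]], seed: int = 0) -> List[Dict[str, str]]:
    """Build one (anchor, positive, negative) triplet per class with at least two images."""
    rng = random.Random(seed)
    by_class: Dict[int, List[str]] = defaultdict(list)
    for path, class_id in items:
        by_class[class_id].append(path)
    classes = sorted(by_class)
    if len(classes) < 2:
        return []  # a negative requires at least two classes
    triplets = []
    for class_id in classes:
        paths = by_class[class_id]
        if len(paths) < 2:
            continue  # no positive available for this class
        anchor, positive = rng.sample(paths, 2)  # two distinct images of the same class
        negative_class = class_id
        while negative_class == class_id:  # rejection-sample a different class
            negative_class = rng.choice(classes)
        negative = rng.choice(by_class[negative_class])
        triplets.append({"anchor": anchor, "positive": positive, "negative": negative})
    return triplets
```

Rejection sampling the negative class avoids building a filtered class list on every iteration, which matters with thousands of classes.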

Parameters

train – If True, the train split is used; otherwise, the test split.
download – If True, the data is downloaded and saved to data_folder.
data_folder – Directory with all the images.
transform – Transform to be applied to a sample. It should follow the transform interface of the albumentations library.
augment – Optional augmentation to be applied to a sample. It should follow the transform interface of the albumentations library.
input_dtype – Data type of the torch tensors related to the image.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – Format of the images returned by the dataset. Can be rgb, bgr, rgba or gray.
rgba_layout_color – Background color used when converting from rgba.
test_mode – If True, only images are returned, without labels.

__getitem__(idx: int) → dict

Get item sample.

Returns: sample, a dict where sample[‘anchor’] - anchor; sample[‘positive’] - positive; sample[‘negative’] - negative; sample[‘index’] - index of the sample, the same as the input idx.
Return type: dict
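Samples with anchor, positive and negative keys are typically fed to a triplet loss, which pushes the anchor closer to the positive than to the negative by at least a margin: max(d(a, p) - d(a, n) + margin, 0). Which loss TorchOk pairs with this dataset is not stated here; the plain-Python version below only illustrates the formula on embedding vectors (PyTorch provides it as torch.nn.TripletMarginLoss).

```python
import math
from typing import Sequence


def triplet_margin_loss(anchor: Sequence[float], positive: Sequence[float],
                        negative: Sequence[float], margin: float = 1.0) -> float:
    """max(d(a, p) - d(a, n) + margin, 0) with Euclidean distance d."""
    d_ap = math.dist(anchor, positive)  # anchor-positive distance
    d_an = math.dist(anchor, negative)  # anchor-negative distance
    return max(d_ap - d_an + margin, 0.0)
```

The loss is zero once the negative is already farther from the anchor than the positive by more than the margin, so only "hard" triplets contribute gradients.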

get_raw(idx: int) → dict

Get item sample.

Returns: sample, a dict where sample[‘image’] - Tensor representing the image after augmentations; sample[‘target’] - target class or labels; sample[‘index’] - index of the sample, the same as the input idx.
Return type: dict