Datasets

Interface

Every dataset in TorchOk must inherit from a single interface, ImageDataset, and implement a few methods. Follow these general principles when implementing your dataset:

  • Constructor

def __init__(self,
            transform: Optional[Union[BasicTransform, BaseCompose]],
            augment: Optional[Union[BasicTransform, BaseCompose]] = None,
            input_dtype: str = 'float32',
            image_format: str = 'rgb',
            rgba_layout_color: Union[int, Tuple[int, int, int]] = 0,
            test_mode: bool = False):
    # Use transforms and augments for two different purposes: augmentations should be applied to get a randomly
    # manipulated image version while transformations are used to get a fixed transformation of each input image
    # to be able to pass it to the neural network model (like resizing, normalization and to-tensor conversion)
  • Length of the dataset

def __len__(self) -> int:
    # Return total expected length of the dataset
  • Getting access to a raw item of the dataset

def get_raw(self, idx: int) -> dict:
    # Read a sample from disk or from whatever storage your dataset uses. You can call self._read_image(image_path).
    # Apply augmentations to numpy images here.
    # Return a dictionary with string keys and tensor values. Images, however, are usually returned here as numpy
    # arrays before normalization, so that a user can call this method directly to see what an output image
    # looks like.
  • Getting access to a tensor item of the dataset

def __getitem__(self, idx: int) -> dict:
    # Usually, self.get_raw(idx) is called here.
    # Then apply transformations to convert numpy images and other sample fields to PyTorch tensors.
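
As a rough illustration of the split between get_raw and __getitem__, the sketch below uses a toy in-memory dataset. It does not import TorchOk: ToyDataset, flip and normalize are hypothetical stand-ins, and plain Python lists take the place of numpy arrays and tensors.

```python
from typing import Callable, List, Optional

Image = List[List[float]]  # toy stand-in for a numpy image


class ToyDataset:
    """Hypothetical stand-in that mirrors the method layout described above."""

    def __init__(self,
                 images: List[Image],
                 transform: Optional[Callable[[Image], Image]],
                 augment: Optional[Callable[[Image], Image]] = None):
        self.images = images
        self.transform = transform
        self.augment = augment

    def __len__(self) -> int:
        return len(self.images)

    def get_raw(self, idx: int) -> dict:
        # Load the sample and apply only the augmentations.
        image = [row[:] for row in self.images[idx]]
        if self.augment is not None:
            image = self.augment(image)
        return {'image': image, 'index': idx}

    def __getitem__(self, idx: int) -> dict:
        # Reuse get_raw, then apply the fixed model-input transform.
        sample = self.get_raw(idx)
        if self.transform is not None:
            sample['image'] = self.transform(sample['image'])
        return sample


def flip(image: Image) -> Image:
    # Deterministic stand-in for an augmentation: horizontal flip.
    return [row[::-1] for row in image]


def normalize(image: Image) -> Image:
    # Stand-in for a transform: scale 0..255 values to 0..1.
    return [[value / 255.0 for value in row] for row in image]


dataset = ToyDataset([[[0.0, 255.0]]], transform=normalize, augment=flip)
raw = dataset.get_raw(0)   # augmented but not normalized
item = dataset[0]          # augmented and normalized
```

Keeping the augmentation in get_raw makes it easy to inspect augmented samples visually before any normalization is applied.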

Classification

class torchok.data.datasets.classification.classification.ImageClassificationDataset(data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, annotation_path: Optional[str] = None, num_classes: Optional[int] = None, input_column: str = 'image_path', input_dtype: str = 'float32', target_column: str = 'label', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, multilabel: bool = False, lazy_init: bool = False, csv_path: Optional[str] = None)

Bases: ImageDataset

A generic dataset for multilabel/multiclass image classification task.

Multiclass task csv example.

image_path    label
cat_1.jpg     1
dog_1.jpg     0

Multilabel task csv example.

image_path       label
cat_dog_1.jpg    0 1
cat_dog_2.jpg    0 1
dog_1.jpg        0
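
To illustrate the multilabel format, the sketch below turns a space-separated label cell such as "0 1" into a multi-hot vector. The helper name to_multihot is hypothetical, and the parsing is an assumption based on the example table, not TorchOk's actual implementation.

```python
from typing import List


def to_multihot(label_cell: str, num_classes: int) -> List[int]:
    """Convert a space-separated string of class indices to a multi-hot list."""
    vector = [0] * num_classes
    for token in label_cell.split():
        index = int(token)
        if not 0 <= index < num_classes:
            raise ValueError(f'class index {index} is outside [0, {num_classes})')
        vector[index] = 1
    return vector


print(to_multihot('0 1', num_classes=3))  # [1, 1, 0]
print(to_multihot('0', num_classes=3))    # [1, 0, 0]
```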

__init__(data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, annotation_path: Optional[str] = None, num_classes: Optional[int] = None, input_column: str = 'image_path', input_dtype: str = 'float32', target_column: str = 'label', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, multilabel: bool = False, lazy_init: bool = False, csv_path: Optional[str] = None)

Init ImageClassificationDataset.

Parameters
  • data_folder – Directory with all the images.

  • annotation_path – Path to the .pkl or .csv file with path to images and annotations. Path to images must be under column input_column and annotations must be under target_column column.

  • transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.

  • augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.

  • num_classes – Number of classes (i.e. maximum class index in the dataset).

  • input_column – column name containing paths to the images.

  • input_dtype – Data type of the torch tensors related to the image.

  • target_column – column name containing image label.

  • target_dtype – Data type of the torch tensors related to the target.

  • reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.

  • image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.

  • rgba_layout_color – color of the background during conversion from rgba.

  • test_mode – If True, only image without labels will be returned.

  • multilabel – If True, targets are converted to multi-hot vectors for the multilabel task. If False, the dataset prepares targets for multiclass classification.

  • lazy_init – If True, the target preparation is deferred until __getitem__ is called: for multilabel, the target is converted to multi-hot at that point; for multiclass, the class index is checked against the valid range at that point.

  • csv_path – DEPRECATED, Path to the .pkl or .csv file with path to images and annotations. Path to images must be under column input_column and annotations must be under target_column column.
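
The rgba_layout_color parameter sets the background used when an RGBA image is converted to a format without an alpha channel. One common way to perform such a conversion is alpha compositing, and the per-pixel sketch below shows that arithmetic. Whether TorchOk uses exactly this formula is an assumption, and composite_pixel is a hypothetical helper.

```python
from typing import Tuple, Union

Color = Union[int, Tuple[int, int, int]]


def composite_pixel(rgba: Tuple[int, int, int, int], background: Color = 0) -> Tuple[int, ...]:
    """Blend one 8-bit RGBA pixel over a solid background colour."""
    if isinstance(background, int):
        # A single int is used for all three channels.
        background = (background, background, background)
    alpha = rgba[3] / 255.0
    return tuple(
        round(alpha * channel + (1.0 - alpha) * bg)
        for channel, bg in zip(rgba[:3], background)
    )


opaque = composite_pixel((200, 100, 50, 255))                       # background is ignored
transparent = composite_pixel((200, 100, 50, 0), (255, 255, 255))   # background only
```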

get_raw(idx: int) -> dict

Get item sample without transform application.

Returns

sample – dict, where
  • sample['image'] – np.array representing the image after augmentations.
  • sample['target'] – Target class or labels.
  • sample['index'] – Index of the sample, the same as the input idx.

Return type

dict

__getitem__(idx: int) -> dict

Get item sample.

Returns

sample – dict, where
  • sample['image'] – Tensor representing the image after augmentations and transformations, dtype=input_dtype.
  • sample['target'] – Target class or labels, dtype=target_dtype.
  • sample['index'] – Index of the sample, the same as the input idx.

Return type

dict

process_function(target: Any) -> Any

Prepare the dataset target based on the classification type.

Parameters

target – Classification labels to prepare.

Returns

Prepared classification labels.

Segmentation

class torchok.data.datasets.segmentation.image_segmentation.ImageSegmentationDataset(data_folder: Union[Path, str], annotation_path: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_column: str = 'image_path', input_dtype: str = 'float32', target_column: str = 'mask_path', target_dtype: str = 'int64', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Bases: ImageDataset

A dataset for image segmentation task.

Segmentation csv example.

image_path    mask
image1.png    mask1.png
image2.png    mask2.png
image3.png    mask3.png

__init__(data_folder: Union[Path, str], annotation_path: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_column: str = 'image_path', input_dtype: str = 'float32', target_column: str = 'mask_path', target_dtype: str = 'int64', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Init ImageSegmentationDataset.

Parameters
  • data_folder – Directory with all the images.

  • annotation_path – Path to the .pkl or .csv file with paths to images and masks. Paths to images must be under the column given by input_column and paths to masks under the column given by target_column.

  • transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.

  • augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.

  • input_column – column name containing paths to the images.

  • input_dtype – Data type of the torch tensors related to the image.

  • target_dtype – Data type of the torch tensors related to the target.

  • reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.

  • image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.

  • rgba_layout_color – color of the background during conversion from rgba.

  • test_mode – If True, only image without labels will be returned.

get_raw(idx: int) -> dict
__getitem__(idx: int) -> Dict[str, Any]

Representation

class torchok.data.datasets.representation.unsupervised_contrastive_dataset.UnsupervisedContrastiveDataset(data_folder: str, transform: Union[BasicTransform, BaseCompose], augment: Optional[Union[BasicTransform, BaseCompose]] = None, annotation_path: Optional[str] = None, input_column: str = 'image_path', input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, csv_path: Optional[str] = None)

Bases: ImageDataset

A dataset for unsupervised contrastive task.

Each image is transformed twice, and the two resulting views are treated as a positive pair.

UnsupervisedContrastive csv example

image_path

cat_1.jpg

dog_1.jpg
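
The idea of producing two views of one image can be sketched as follows. This is a toy illustration only: random_flip and make_positive_pair are hypothetical names, and a nested list stands in for a real image.

```python
import random
from typing import Callable, List

Image = List[List[int]]


def random_flip(image: Image, rng: random.Random) -> Image:
    """Toy augmentation: flip horizontally with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


def make_positive_pair(image: Image,
                       augment: Callable[[Image, random.Random], Image],
                       rng: random.Random,
                       idx: int) -> dict:
    # Two independent augmentation draws of the same source image.
    return {
        'image_0': augment(image, rng),
        'image_1': augment(image, rng),
        'index': idx,
    }


sample = make_positive_pair([[1, 2, 3]], random_flip, random.Random(0), idx=0)
```

Because each view is drawn independently, the two images may or may not differ for a given sample; they always originate from the same source image.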

__init__(data_folder: str, transform: Union[BasicTransform, BaseCompose], augment: Optional[Union[BasicTransform, BaseCompose]] = None, annotation_path: Optional[str] = None, input_column: str = 'image_path', input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, csv_path: Optional[str] = None)

Init UnsupervisedContrastiveDataset.

Parameters
  • data_folder – Directory with all the images.

  • annotation_path – Path to the .pkl or .csv file with path to images and annotations. Path to images must be under column input_column.

  • transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.

  • augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.

  • input_column – column name containing paths to the images.

  • input_dtype – data type of the torch tensors related to the image.

  • reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.

  • image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.

  • rgba_layout_color – color of the background during conversion from rgba.

  • csv_path – DEPRECATED, Path to the .pkl or .csv file with path to images and annotations. Path to images must be under column input_column.

get_raw(idx: int) -> dict

Get item sample.

Returns

sample – dict, where
  • sample['image_0'] – Tensor representing the image after augmentations.
  • sample['image_1'] – Tensor representing the image after augmentations.
  • sample['index'] – Index of the sample, the same as the input idx.

Return type

dict

__getitem__(idx: int) -> dict

Get item sample.

Returns

sample – dict, where
  • sample['image_0'] – Tensor representing the image after augmentations and transformations, dtype=input_dtype.
  • sample['image_1'] – Tensor representing the image after augmentations and transformations, dtype=input_dtype.
  • sample['index'] – Index of the sample, the same as the input idx.

Return type

dict

class torchok.data.datasets.representation.validation.RetrievalDataset(data_folder: str, matches_csv_path: str, img_list_csv_path: str, transform: Union[BasicTransform, BaseCompose], augment: Optional[Union[BasicTransform, BaseCompose]] = None, gallery_folder: Optional[str] = '', gallery_list_csv_path: Optional[str] = None, use_query_without_relevants: bool = False, input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0)

Bases: ImageDataset

Dataset for image retrieval validation.

Each query is searched against the whole set of items to find its relevant items. Gallery items are treated as non-relevant.

Example matches csv: Query ids should be unique int values; otherwise, rows with the same query id will be treated as different matches.

Relevant ids can be repeated in different queries.

Scores reflect the order of similarity of the image to the query: a higher score corresponds to a greater similarity (must be a float value > 0).

Match csv example

query      relevant                           scores
1194917    601566 554492 224125 2001716519    4 3 2 2
1257924    456490                             4

Example img_list csv: img_list.csv maps the ids of query and relevant elements to image paths.

Image csv example

id            image_path        label
1194917       data/img_1.jpg    0
601566        data/img_2.jpg    0
554492        data/img_3.jpg    0
224125        data/img_4.jpg    1
2001716519    data/img_5.jpg    1
1257924       data/img_6.jpg    1
456490        data/img_7.jpg    2
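
The matches and image list examples above can be combined as in the following sketch, which parses the matches rows into per-query relevance scores and resolves ids to image paths. It uses only the standard library. The comma-separated layout with space-separated id and score lists is an assumption based on the example tables, and parse_matches is a hypothetical helper, not part of TorchOk.

```python
import csv
import io
from typing import Dict

MATCHES_CSV = """query,relevant,scores
1194917,601566 554492 224125 2001716519,4 3 2 2
1257924,456490,4
"""

IMG_LIST_CSV = """id,image_path,label
1194917,data/img_1.jpg,0
601566,data/img_2.jpg,0
554492,data/img_3.jpg,0
224125,data/img_4.jpg,1
2001716519,data/img_5.jpg,1
1257924,data/img_6.jpg,1
456490,data/img_7.jpg,2
"""


def parse_matches(text: str) -> Dict[int, Dict[int, float]]:
    """Map each query id to {relevant id: score}."""
    matches = {}
    for row in csv.DictReader(io.StringIO(text)):
        relevant = [int(x) for x in row['relevant'].split()]
        scores = [float(x) for x in row['scores'].split()]
        if len(relevant) != len(scores):
            raise ValueError(f"query {row['query']}: ids and scores differ in length")
        matches[int(row['query'])] = dict(zip(relevant, scores))
    return matches


id_to_path = {int(row['id']): row['image_path']
              for row in csv.DictReader(io.StringIO(IMG_LIST_CSV))}
matches = parse_matches(MATCHES_CSV)

# List the relevant images of every query, most similar first.
for query_id, relevant in matches.items():
    ranked = sorted(relevant, key=relevant.get, reverse=True)
    print(id_to_path[query_id], '->', [id_to_path[i] for i in ranked])
```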

Gallery Image csv example

id    image_paths
8     data/db/img_1.jpg
10    data/db/img_2.jpg
12    data/db/img_3.jpg

__init__(data_folder: str, matches_csv_path: str, img_list_csv_path: str, transform: Union[BasicTransform, BaseCompose], augment: Optional[Union[BasicTransform, BaseCompose]] = None, gallery_folder: Optional[str] = '', gallery_list_csv_path: Optional[str] = None, use_query_without_relevants: bool = False, input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0)

Init RetrievalDataset class.

Parameters
  • data_folder – Directory with all the images.

  • matches_csv_path – Path to the csv file where queries with their relevance scores are specified.

  • img_list_csv_path – Path to the csv file that maps image identifiers to image paths. Format: id | path. Ids from the matches csv are linked to ids from the img_list csv.

  • transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.

  • augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.

  • gallery_folder – Path to a folder with all gallery images (traversed recursively). When the gallery is not specified, all the remaining queries and relevant items are considered negative samples for a given query-relevant set.

  • gallery_list_csv_path – Path to the csv file that maps gallery image identifiers to image paths. Format: id | path.

  • use_query_without_relevants – If True, queries without relevant items are also used.

  • input_dtype – Data type of the torch tensors related to the image.

  • reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.

  • image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.

  • rgba_layout_color – color of the background during conversion from rgba.

Raises

ValueError – if gallery_folder is set, but gallery_list_csv_path is None.

get_raw(idx: int) -> dict

Get item sample.

Returns

dict with fields
  • image – np.array representing the image after augmentations, dtype=input_dtype.
  • index – Index from DataFrame.
  • query_idxs – Int tensor; if the item is a query, the index of this query in the target matrix, else -1.
  • scores – Float tensor of shape (1, len(n_query)), relevance scores of the current item.
  • group_labels – Int tensor with the image classification label.

Return type

dict

__getitem__(index: int) -> dict

Get item sample.

Returns

dict with fields
  • image – Tensor representing the image after augmentations and transformations, dtype=input_dtype.
  • index – Index from DataFrame.
  • query_idxs – Int tensor; if the item is a query, the index of this query in the target matrix, else -1.
  • scores – Float tensor of shape (1, len(n_query)), relevance scores of the current item.
  • group_labels – Int tensor with the image classification label.

Return type

dict

Detection

class torchok.data.datasets.detection.detection.DetectionDataset(data_folder: Union[Path, str], annotation_path: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_column: str = 'image_path', input_dtype: str = 'float32', bbox_column: str = 'bbox', bbox_dtype: str = 'float32', target_column: str = 'label', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, bbox_format: str = 'coco', min_area: float = 0.0, min_visibility: float = 0.0, filter_bboxes_on_start: bool = False)

Bases: ImageDataset

A dataset for image detection task.

Detection csv example.

image_path    bbox                                                               label
image1.png    [[217.62, 240.54, 38.99, 57.75], [1.0, 240.24, 346.63, 186.76]]    [0, 1]
image2.png    [[102.49, 118.47, 7.9, 17.31]]                                     [2, 1]
image3.png    [[253.21, 271.07, 59.59, 60.97], [257.85, 224.48, 44.13, 97.0]]    [2, 0]

__init__(data_folder: Union[Path, str], annotation_path: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_column: str = 'image_path', input_dtype: str = 'float32', bbox_column: str = 'bbox', bbox_dtype: str = 'float32', target_column: str = 'label', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, bbox_format: str = 'coco', min_area: float = 0.0, min_visibility: float = 0.0, filter_bboxes_on_start: bool = False)

Init DetectionDataset.

Parameters
  • data_folder – Directory with all the images.

  • annotation_path – Path to the .pkl or .csv file with image paths, bboxes and labels. Path to images must be under column image_path, bboxes must be under bbox column and bbox labels must be under label column. User can change column names, if the input_column, bbox_column or target_column is given.

  • transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.

  • augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.

  • input_column – Column name containing paths to the images.

  • input_dtype – Data type of the torch tensors related to the image.

  • bbox_column – Column name containing list of bboxes for every image.

  • bbox_dtype – Data type of the torch tensors related to the bboxes.

  • target_column – Column name containing bboxes labels.

  • target_dtype – Data type of the torch tensors related to the bboxes labels.

  • reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.

  • image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.

  • rgba_layout_color – color of the background during conversion from rgba.

  • test_mode – If True, only image without labels will be returned.

  • bbox_format – Bboxes format, for albumentations transform. Supports the following formats:
      pascal_voc – [x_min, y_min, x_max, y_max] = [98, 345, 420, 462]
      albumentations – [x_min, y_min, x_max, y_max] = [0.1531, 0.71875, 0.65625, 0.9625]
      coco – [x_min, y_min, width, height] = [98, 345, 322, 117]
      yolo – [x_center, y_center, width, height] = [0.4046875, 0.840625, 0.503125, 0.24375]

  • min_area – Value in pixels. If the area of a bounding box after augmentation becomes smaller than min_area, Albumentations will drop that box, so the returned list of augmented bounding boxes will not contain it.

  • min_visibility – Value between 0 and 1. If the ratio of the bounding box area after augmentation to the area of the bounding box before augmentation becomes smaller than min_visibility, Albumentations will drop that box. So if the augmentation process cuts off most of the bounding box, that box will not be present in the returned list of augmented bounding boxes.

  • filter_bboxes_on_start – If True, the filter_bboxes function is applied to the whole dataset at init; otherwise, it is applied in get_raw.
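
The relationship between the four bbox_format layouts can be sketched with plain arithmetic. The conversion helpers below are hypothetical and are not Albumentations or TorchOk functions. The image size of 640 x 480 pixels is an assumption; it is consistent with the pascal_voc and albumentations example values listed for bbox_format.

```python
from typing import List

Box = List[float]


def coco_to_pascal_voc(box: Box) -> Box:
    # [x_min, y_min, width, height] -> [x_min, y_min, x_max, y_max]
    x_min, y_min, width, height = box
    return [x_min, y_min, x_min + width, y_min + height]


def pascal_voc_to_albumentations(box: Box, rows: int, cols: int) -> Box:
    # Normalize pixel coordinates by the image width (cols) and height (rows).
    x_min, y_min, x_max, y_max = box
    return [x_min / cols, y_min / rows, x_max / cols, y_max / rows]


def albumentations_to_yolo(box: Box) -> Box:
    # Normalized corners -> normalized [x_center, y_center, width, height]
    x_min, y_min, x_max, y_max = box
    return [(x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min]


# Assumed image size: 640 px wide (cols), 480 px high (rows).
voc = coco_to_pascal_voc([98, 345, 322, 117])
normalized = pascal_voc_to_albumentations(voc, rows=480, cols=640)
yolo = albumentations_to_yolo(normalized)
print(voc)  # [98, 345, 420, 462]
```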

Raises

RuntimeError – if annotation_path is not in pkl or csv format.

filter_bboxes(bboxes: Tensor, labels: Tensor, rows: int, cols: int) -> [Tensor, Tensor]

Filter empty bounding boxes.

Parameters
  • bboxes – List of bounding boxes.

  • labels – Array of bbox labels.

  • rows – Image height.

  • cols – Image width.

Returns

numpy array of bounding boxes and numpy array of labels of these boxes.

get_raw(idx: int) -> dict
__getitem__(idx: int) -> dict

Get item sample.

Returns

sample – dict, where
  • sample['image'] – Tensor representing the image after augmentations and transformations, dtype=input_dtype.
  • sample['target'] – Target class or labels, dtype=target_dtype.
  • sample['bboxes'] – Target bboxes, dtype=bbox_dtype.
  • sample['index'] – Index of the sample, the same as the input idx.

Return type

dict

collate_fn(batch)

Puts each data field into a tensor with outer dimension batch size

Ready-to-go

class torchok.data.datasets.examples.coco_detection.COCODetection(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'long', bbox_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, min_area: float = 0, min_visibility: float = 0.0)

Bases: DetectionDataset

A class representing the COCO detection dataset, https://cocodataset.org/#home.

The COCO Object Detection Task is designed to push the state of the art in object detection forward. COCO features two object detection tasks: using either bounding box output or object segmentation output (the latter is also known as instance segmentation).

The COCO dataset has 81 categories, where 0 is the background label. The train set contains 118287 images and the validation set contains 5000.

This dataset occupies 20 GB of memory.

CLASSES = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
label_mapping = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 13: 12, 14: 13, 15: 14, 16: 15, 17: 16, 18: 17, 19: 18, 20: 19, 21: 20, 22: 21, 23: 22, 24: 23, 25: 24, 27: 25, 28: 26, 31: 27, 32: 28, 33: 29, 34: 30, 35: 31, 36: 32, 37: 33, 38: 34, 39: 35, 40: 36, 41: 37, 42: 38, 43: 39, 44: 40, 46: 41, 47: 42, 48: 43, 49: 44, 50: 45, 51: 46, 52: 47, 53: 48, 54: 49, 55: 50, 56: 51, 57: 52, 58: 53, 59: 54, 60: 55, 61: 56, 62: 57, 63: 58, 64: 59, 65: 60, 67: 61, 70: 62, 72: 63, 73: 64, 74: 65, 75: 66, 76: 67, 77: 68, 78: 69, 79: 70, 80: 71, 81: 72, 82: 73, 84: 74, 85: 75, 86: 76, 87: 77, 88: 78, 89: 79, 90: 80}
base_folder = 'COCO'
train_data_filename = 'train2017.zip'
train_data_url = 'http://images.cocodataset.org/zips/train2017.zip'
train_data_hash = 'cced6f7f71b7629ddf16f17bbcfab6b2'
valid_data_filename = 'valid2017.zip'
valid_data_url = 'http://images.cocodataset.org/zips/val2017.zip'
valid_data_hash = '442b8da7639aecaf257c1dceb8ba8c80'
annotations_filename = 'annotations.zip'
annotations_url = 'http://images.cocodataset.org/annotations/annotations_trainval2017.zip'
annotations_hash = 'f4bbac642086de4f52a3fdda2de5fa2c'
train_pkl = 'train_detection.pkl'
valid_pkl = 'valid_detection.pkl'
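
The label_mapping attribute reflects the fact that the original COCO category ids are not contiguous (for example, id 12 is skipped). The sketch below rebuilds the first entries of such a mapping from an excerpt of the ids and looks up a class name. Treating label 0 as background and indexing CLASSES with label - 1 is an assumption drawn from the description above, and class_name is a hypothetical helper.

```python
# Excerpt of the original COCO category ids; note that 12 is missing.
coco_category_ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14]

# Matching excerpt of the CLASSES list.
classes = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
           'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter']

# Contiguous labels start at 1 because 0 is reserved for the background.
label_mapping = {category_id: label
                 for label, category_id in enumerate(sorted(coco_category_ids), start=1)}


def class_name(category_id: int) -> str:
    """Resolve an original COCO category id to its class name."""
    return classes[label_mapping[category_id] - 1]


print(label_mapping[13])  # 12
print(class_name(13))     # stop sign
```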
__init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'long', bbox_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, min_area: float = 0, min_visibility: float = 0.0)

Init COCODetection.

Parameters
  • train – If True, the train dataset will be used; otherwise, the test dataset.

  • download – If True, data will be downloaded and saved to data_folder.

  • data_folder – Directory with all the images.

  • transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.

  • augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.

  • input_dtype – Data type of the torch tensors related to the image.

  • target_dtype – Data type of the torch tensors related to the bboxes labels.

  • bbox_dtype – Data type of the torch tensors related to the bboxes.

  • reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.

  • image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.

  • rgba_layout_color – color of the background during conversion from rgba.

  • test_mode – If True, only image without labels will be returned.

  • min_area – Value in pixels. If the area of a bounding box after augmentation becomes smaller than min_area, Albumentations will drop that box, so the returned list of augmented bounding boxes will not contain it.

  • min_visibility – Value between 0 and 1. If the ratio of the bounding box area after augmentation to the area of the bounding box before augmentation becomes smaller than min_visibility, Albumentations will drop that box. So if the augmentation process cuts the most of the bounding box, that box won’t be present in the returned list of the augmented bounding boxes.
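
The two thresholds min_area and min_visibility can be read as a simple keep-or-drop rule per box. The sketch below restates that rule in plain Python; it is a simplified illustration of the description above, not Albumentations' implementation, and keep_box is a hypothetical helper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # pascal_voc: x_min, y_min, x_max, y_max


def area(box: Box) -> float:
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def keep_box(before: Box, after: Box,
             min_area: float = 0.0, min_visibility: float = 0.0) -> bool:
    """Return True if the augmented box passes both thresholds."""
    area_before, area_after = area(before), area(after)
    if area_after < min_area:
        return False
    if area_before > 0 and area_after / area_before < min_visibility:
        return False
    return True


original = (0.0, 0.0, 100.0, 100.0)  # area of 10000 px
cropped = (0.0, 0.0, 100.0, 20.0)    # area of 2000 px, 20 % of the original

print(keep_box(original, cropped, min_area=1000))       # True
print(keep_box(original, cropped, min_visibility=0.5))  # False
```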

create_annotation(json_path: Union[Path, str], image_folder: Union[Path, str], save_df_path: Union[Path, str])

Create train-valid csv for loaded COCO dataset.

Parameters
  • json_path – COCO json annotation file path.

  • image_folder – COCO images folder.

  • save_df_path – Pickle save name.

__getitem__(idx: int) -> dict

Get item sample.

Returns

sample – dict, where
  • sample['image'] – Tensor representing the image after augmentations and transformations, dtype=input_dtype.
  • sample['target'] – Target class or labels, dtype=target_dtype.
  • sample['bboxes'] – Target bboxes, dtype=bbox_dtype.
  • sample['index'] – Index of the sample, the same as the input idx.

Return type

dict

class torchok.data.datasets.examples.coco_segmentation.COCOSegmentation(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Bases: ImageSegmentationDataset

A class representing the COCO segmentation dataset, https://cocodataset.org/#home.

The COCO Object Detection Task is designed to push the state of the art in object detection forward. COCO features two object detection tasks: using either bounding box output or object segmentation output (the latter is also known as instance segmentation).

The COCO dataset has 81 categories, where 0 is the background label. The train set contains 118287 images and the validation set contains 5000.

This dataset occupies 21 GB of memory.

CLASSES = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
label_mapping = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 13: 12, 14: 13, 15: 14, 16: 15, 17: 16, 18: 17, 19: 18, 20: 19, 21: 20, 22: 21, 23: 22, 24: 23, 25: 24, 27: 25, 28: 26, 31: 27, 32: 28, 33: 29, 34: 30, 35: 31, 36: 32, 37: 33, 38: 34, 39: 35, 40: 36, 41: 37, 42: 38, 43: 39, 44: 40, 46: 41, 47: 42, 48: 43, 49: 44, 50: 45, 51: 46, 52: 47, 53: 48, 54: 49, 55: 50, 56: 51, 57: 52, 58: 53, 59: 54, 60: 55, 61: 56, 62: 57, 63: 58, 64: 59, 65: 60, 67: 61, 70: 62, 72: 63, 73: 64, 74: 65, 75: 66, 76: 67, 77: 68, 78: 69, 79: 70, 80: 71, 81: 72, 82: 73, 84: 74, 85: 75, 86: 76, 87: 77, 88: 78, 89: 79, 90: 80}
base_folder = 'COCO'
train_data_filename = 'train2017.zip'
train_data_url = 'http://images.cocodataset.org/zips/train2017.zip'
train_data_hash = 'cced6f7f71b7629ddf16f17bbcfab6b2'
valid_data_filename = 'valid2017.zip'
valid_data_url = 'http://images.cocodataset.org/zips/val2017.zip'
valid_data_hash = '442b8da7639aecaf257c1dceb8ba8c80'
annotations_filename = 'annotations.zip'
annotations_url = 'http://images.cocodataset.org/annotations/annotations_trainval2017.zip'
annotations_hash = 'f4bbac642086de4f52a3fdda2de5fa2c'
train_csv = 'train_segmentation.csv'
valid_csv = 'valid_segmentation.csv'
__init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Init COCOSegmentation.

Parameters
  • train – If True, the train dataset will be used; otherwise, the test dataset.

  • download – If True, data will be downloaded and saved to data_folder.

  • data_folder – Directory with all the images.

  • transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.

  • augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.

  • input_dtype – Data type of the torch tensors related to the image.

  • target_dtype – Data type of the torch tensors related to the target mask.

  • reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.

  • image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.

  • rgba_layout_color – color of the background during conversion from rgba.

  • test_mode – If True, only image without labels will be returned.

create_annotation(json_path: Union[str, Path], mask_folder: Union[str, Path], save_df_path: Union[str, Path])

Create train-valid csv for loaded COCO dataset.

Parameters
  • json_path – COCO json annotation file path.

  • mask_folder – COCO mask save folder.

  • save_df_path – Pickle save name.

class torchok.data.datasets.examples.sop.SOP(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Bases: ImageDataset

A class representing the Stanford Online Products (SOP) dataset.

The Stanford Online Products dataset contains 120k images of 23k classes of online products for metric learning. The homepage of SOP is https://cvgl.stanford.edu/projects/lifted_struct/.

base_folder = 'Stanford_Online_Products'
filename = 'Stanford_Online_Products.tar.gz'
url = 'https://torchok-hub.s3.eu-west-1.amazonaws.com/Stanford_Online_Products.tar.gz'
tgz_md5 = 'b96128cf2b75493708511ff5c400eefe'
train_txt = 'Ebay_train.txt'
test_txt = 'Ebay_test.txt'
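
Attributes such as tgz_md5 suggest that a downloaded archive is checked against a known MD5 digest. A minimal sketch of such a check with the standard library is shown below. md5_of_file and is_intact are hypothetical helpers, and how TorchOk actually performs the verification is an assumption.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as file:
        for chunk in iter(lambda: file.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_md5: str) -> bool:
    """Return True if the file exists and its digest matches the expected one."""
    return path.is_file() and md5_of_file(path) == expected_md5.lower()
```

For example, is_intact(Path(data_folder) / 'Stanford_Online_Products.tar.gz', 'b96128cf2b75493708511ff5c400eefe') would tell whether an existing archive can be reused instead of being downloaded again. Reading in chunks keeps memory usage low for multi-gigabyte archives.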
__init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Init SOP.

The dataset contains 120,053 images of 22,634 classes in total. The train split has 59,551 images of 11,318 classes; the test split has 60,502 images of 11,316 classes.
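
Split statistics like these can be recomputed from a list file such as train_txt. The sketch below assumes the list file starts with a header line (image_id class_id super_class_id path) followed by space-separated records; that layout, the helper summarize, and the sample paths are assumptions for illustration, so check them against the downloaded files.

```python
import io

# Made-up sample in the assumed layout; the paths are placeholders.
SAMPLE = """image_id class_id super_class_id path
1 1 1 bicycle_final/example_0.JPG
2 1 1 bicycle_final/example_1.JPG
3 2 1 bicycle_final/example_2.JPG
"""


def summarize(lines):
    """Count images and distinct class ids in an SOP-style list file."""
    header = next(lines).split()
    n_images = 0
    classes = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = dict(zip(header, line.split()))
        n_images += 1
        classes.add(int(record["class_id"]))
    return n_images, len(classes)


n_images, n_classes = summarize(io.StringIO(SAMPLE))
print(n_images, n_classes)
```

For a real file, pass an open file handle instead of the io.StringIO object.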

Parameters
  • train – If True, the train dataset is used; otherwise the test dataset is used.

  • download – If True, the data is downloaded and saved to data_folder.

  • data_folder – Directory with all the images.

  • transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.

  • augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.

  • input_dtype – Data type of the torch tensors related to the image.

  • reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.

  • image_format – Format of the images returned from the dataset. Can be rgb, bgr, rgba, gray.

  • rgba_layout_color – Background color used during conversion from rgba.

  • test_mode – If True, only images are returned, without labels.
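
The url and tgz_md5 class attributes suggest that a downloaded archive can be checked against an MD5 digest. How TorchOk performs this check internally is not shown here; the following is a minimal standard-library sketch of such a check. The helper names file_md5 and is_intact are hypothetical, and the demonstration runs on a temporary file rather than the real archive.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """Return True if the file exists and its digest matches expected_md5."""
    return os.path.isfile(path) and file_md5(path) == expected_md5


# Demonstration on a small temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    tmp_path = tmp.name

expected = hashlib.md5(b"hello").hexdigest()
ok = is_intact(tmp_path, expected)
# The SOP archive digest cannot match the temporary file.
bad = is_intact(tmp_path, "b96128cf2b75493708511ff5c400eefe")
os.remove(tmp_path)
print(ok, bad)
```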

get_raw(idx: int) -> dict

Get item sample.

Returns

A dict where sample[‘image’] is a Tensor representing the image after augmentations, sample[‘target’] is the target class or labels, and sample[‘index’] is the index of the sample, the same as the input idx.

Return type

sample

__getitem__(idx: int) -> dict

Get item sample.

Returns

A dict where sample[‘image’] is a Tensor representing the image after augmentations and transformations, with dtype=input_dtype, sample[‘target’] is the target class or labels, and sample[‘index’] is the index of the sample, the same as the input idx.

Return type

sample
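
The relationship between get_raw and __getitem__ can be pictured with a toy stand-in. The class below is hypothetical and does not use TorchOk, albumentations, or tensors: images are plain lists and the transform is an ordinary function. It only mirrors the documented sample keys (image, target, index) and the documented effect of test_mode.

```python
class ToyDataset:
    """Hypothetical stand-in that mimics the get_raw / __getitem__ split."""

    def __init__(self, images, targets, transform, augment=None, test_mode=False):
        self.images = images
        self.targets = targets
        self.transform = transform
        self.augment = augment
        self.test_mode = test_mode

    def __len__(self):
        return len(self.images)

    def get_raw(self, idx):
        # Augmentations (random changes) are applied to the raw image only.
        image = self.images[idx]
        if self.augment is not None:
            image = self.augment(image)
        sample = {"image": image, "index": idx}
        if not self.test_mode:
            sample["target"] = self.targets[idx]
        return sample

    def __getitem__(self, idx):
        # The fixed transformation is applied on top of the raw sample.
        sample = self.get_raw(idx)
        sample["image"] = self.transform(sample["image"])
        return sample


def normalize(pixels):
    """Fixed transformation: scale 0..255 values to 0..1."""
    return [p / 255 for p in pixels]


dataset = ToyDataset(images=[[0, 255], [51, 102]], targets=[7, 3], transform=normalize)
raw = dataset.get_raw(1)
item = dataset[1]
print(raw)
print(item)
```

Calling get_raw directly is a convenient way to inspect a sample before normalization, while indexing the dataset gives the model-ready version.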

class torchok.data.datasets.examples.sweet_pepper.SweetPepper(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'int64', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Bases: ImageSegmentationDataset

A class representing the Sweet Pepper segmentation dataset from Kaggle: https://www.kaggle.com/datasets/lemontyc/sweet-pepper.

The main task for this dataset is to segment peppers (fruit) and peduncles in images obtained from different farm locations. The dataset has 3 labels: 0 - background, 1 - fruit and 2 - peduncle. It contains 620 images in HD resolution: 500 for training and 120 for validation.

base_folder = 'sweet_pepper'
filename = 'sweet_pepper.tar.gz'
url = 'https://torchok-hub.s3.eu-west-1.amazonaws.com/sweet_pepper.tar.gz'
tgz_md5 = '65021e5fad5fe286b3c2bac7753d6e9d'
train_csv = 'train.csv'
valid_csv = 'valid.csv'
__init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'int64', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Init SweetPepper.

Parameters
  • train – If True, the train dataset is used; otherwise the test dataset is used.

  • download – If True, the data is downloaded and saved to data_folder.

  • data_folder – Directory with all the images.

  • transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.

  • augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.

  • input_dtype – Data type of the torch tensors related to the image.

  • target_dtype – Data type of the torch tensors related to the target.

  • reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.

  • image_format – Format of the images returned from the dataset. Can be rgb, bgr, rgba, gray.

  • rgba_layout_color – Background color used during conversion from rgba.

  • test_mode – If True, only images are returned, without labels.
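
The SweetPepper label convention (0 - background, 1 - fruit, 2 - peduncle) can be sanity-checked on a mask by counting pixels per class. The sketch below is only an illustration: it uses a tiny hand-written mask made of nested lists instead of a real image-sized array, and the helper pixel_counts is hypothetical.

```python
from collections import Counter

# Label convention taken from the dataset description.
LABELS = {0: "background", 1: "fruit", 2: "peduncle"}

# Tiny hand-written mask; real masks are image-sized arrays.
mask = [
    [0, 0, 2, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]


def pixel_counts(mask_rows):
    """Count pixels per label name and reject values outside the convention."""
    counts = Counter(value for row in mask_rows for value in row)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected label values: {sorted(unknown)}")
    return {LABELS[value]: counts.get(value, 0) for value in LABELS}


counts = pixel_counts(mask)
print(counts)
```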

class torchok.data.datasets.examples.triplet_sop.TRIPLET_SOP(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, anchor_column: str = 'anchor', positive_column: str = 'positive', negative_column: str = 'negative', input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Bases: ImageDataset

A class representing the Stanford Online Products (SOP) dataset.

The Stanford Online Products dataset contains 120k images of 23k classes of online products for metric learning. The homepage of SOP is https://cvgl.stanford.edu/projects/lifted_struct/.

base_folder = 'Stanford_Online_Products'
filename = 'Stanford_Online_Products.tar.gz'
url = 'https://torchok-hub.s3.eu-west-1.amazonaws.com/Stanford_Online_Products.tar.gz'
tgz_md5 = 'b96128cf2b75493708511ff5c400eefe'
train_csv = 'sop_triplet_train.csv'
test_csv = 'sop_triplet_test.csv'
__init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, anchor_column: str = 'anchor', positive_column: str = 'positive', negative_column: str = 'negative', input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)

Init TRIPLET SOP.

The dataset has 11319 image triplets (anchor, positive, negative).

Parameters
  • download – If True, the data is downloaded and saved to data_folder.

  • data_folder – Directory with all the images.

  • transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.

  • augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.

  • input_dtype – Data type of the torch tensors related to the image.

  • reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.

  • image_format – Format of the images returned from the dataset. Can be rgb, bgr, rgba, gray.

  • rgba_layout_color – Background color used during conversion from rgba.

  • test_mode – If True, only images are returned, without labels.

__getitem__(idx: int) -> dict

Get item sample.

Returns

A dict where sample[‘anchor’] is the anchor, sample[‘positive’] is the positive, sample[‘negative’] is the negative, and sample[‘index’] is the index of the sample, the same as the input idx.

Return type

sample
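
To make the triplet sample layout concrete, the sketch below reads a small annotation table with anchor, positive, and negative columns and produces dicts with the keys described above. The CSV layout, the sample paths, and the helper read_triplets are assumptions for illustration. The values here are file paths, whereas the real dataset returns images.

```python
import csv
import io

# Made-up annotation table; the column names match the default
# anchor_column, positive_column and negative_column values.
SAMPLE_CSV = """anchor,positive,negative
a/img_0.jpg,a/img_1.jpg,b/img_0.jpg
c/img_0.jpg,c/img_1.jpg,d/img_0.jpg
"""


def read_triplets(handle, anchor_column="anchor", positive_column="positive",
                  negative_column="negative"):
    """Turn each CSV row into a dict with anchor, positive, negative and index."""
    triplets = []
    for index, row in enumerate(csv.DictReader(handle)):
        triplets.append({
            "anchor": row[anchor_column],
            "positive": row[positive_column],
            "negative": row[negative_column],
            "index": index,
        })
    return triplets


triplets = read_triplets(io.StringIO(SAMPLE_CSV))
print(triplets[0])
```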

get_raw(idx: int) -> dict

Get item sample.

Returns

A dict where sample[‘image’] is a Tensor representing the image after augmentations, sample[‘target’] is the target class or labels, and sample[‘index’] is the index of the sample, the same as the input idx.

Return type

sample