Datasets
Interface
Every dataset in TorchOk must inherit from a single interface, ImageDataset, and implement the methods listed below. Follow these general principles when you implement your own dataset:
Constructor
def __init__(self,
             transform: Optional[Union[BasicTransform, BaseCompose]],
             augment: Optional[Union[BasicTransform, BaseCompose]] = None,
             input_dtype: str = 'float32',
             image_format: str = 'rgb',
             rgba_layout_color: Union[int, Tuple[int, int, int]] = 0,
             test_mode: bool = False):
    # Transforms and augments serve two different purposes: augmentations produce a randomly
    # manipulated version of the image, while transformations apply a fixed conversion to each input image
    # so that it can be passed to the neural network model (e.g. resizing, normalization and to-tensor conversion)
    ...
Length of the dataset
def __len__(self) -> int:
    # Return total expected length of the dataset
    ...
Getting access to a raw item of the dataset
def get_raw(self, idx: int) -> dict:
    # Read a sample from disk or whatever storage your dataset uses. You can call self._read_image(image_path).
    # Apply augmentations to numpy images here.
    # Return a dictionary with string keys. Images are usually returned here as numpy arrays before
    # normalization, so that a user can call this method directly to see what an output image
    # looks like.
    ...
Getting access to a tensor item of the dataset
def __getitem__(self, idx: int) -> dict:
    # Usually, self.get_raw(idx) is called here.
    # Then apply transformations to convert numpy images and other sample fields to PyTorch tensors.
    ...
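The split between get_raw and __getitem__ can be illustrated with a dependency-free sketch. Everything below is hypothetical: ToyImageDataset does not inherit from TorchOk's ImageDataset, nested lists stand in for numpy arrays and tensors, and random_hflip and scale_to_unit are toy stand-ins for albumentations augmentations and transforms.

```python
import random
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]  # tiny stand-in for a numpy image


class ToyImageDataset:
    """Hypothetical class mirroring the method layout described above."""

    def __init__(self,
                 samples: List[Tuple[Image, int]],
                 transform: Callable[[Image], Image],
                 augment: Optional[Callable[[Image], Image]] = None,
                 test_mode: bool = False):
        self.samples = samples
        self.transform = transform  # fixed preprocessing, always applied
        self.augment = augment      # random manipulation, optional
        self.test_mode = test_mode

    def __len__(self) -> int:
        return len(self.samples)

    def get_raw(self, idx: int) -> dict:
        image, label = self.samples[idx]
        image = [row[:] for row in image]  # copy so the stored sample stays intact
        if self.augment is not None:
            image = self.augment(image)
        sample = {'image': image, 'index': idx}
        if not self.test_mode:
            sample['target'] = label
        return sample

    def __getitem__(self, idx: int) -> dict:
        sample = self.get_raw(idx)
        sample['image'] = self.transform(sample['image'])
        return sample


def random_hflip(image: Image) -> Image:
    """Toy augmentation: flip horizontally with probability 0.5."""
    return [row[::-1] for row in image] if random.random() < 0.5 else image


def scale_to_unit(image: Image) -> Image:
    """Toy transformation: map pixel values from [0, 255] to [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


dataset = ToyImageDataset(
    samples=[([[0, 255], [255, 0]], 1), ([[255, 255], [0, 0]], 0)],
    transform=scale_to_unit,
    augment=random_hflip,
)
raw = dataset.get_raw(0)  # augmented, still in the 0..255 range
item = dataset[0]         # augmented and scaled to 0..1
```

Keeping the augmentation inside get_raw means the raw sample can be inspected or plotted before the fixed transformation is applied.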
Classification
- class torchok.data.datasets.classification.classification.ImageClassificationDataset(data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, annotation_path: Optional[str] = None, num_classes: Optional[int] = None, input_column: str = 'image_path', input_dtype: str = 'float32', target_column: str = 'label', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, multilabel: bool = False, lazy_init: bool = False, csv_path: Optional[str] = None)
Bases: ImageDataset
A generic dataset for multilabel/multiclass image classification tasks.
Multiclass task csv example:
image_path | label
cat_1.jpg | 1
dog_1.jpg | 0
Multilabel task csv example:
image_path | label
cat_dog_1.jpg | 0 1
cat_dog_2.jpg | 0 1
dog_1.jpg | 0
- __init__(data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, annotation_path: Optional[str] = None, num_classes: Optional[int] = None, input_column: str = 'image_path', input_dtype: str = 'float32', target_column: str = 'label', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, multilabel: bool = False, lazy_init: bool = False, csv_path: Optional[str] = None)
Init ImageClassificationDataset.
- Parameters
data_folder – Directory with all the images.
annotation_path – Path to the .pkl or .csv file with paths to images and annotations. Paths to images must be under the input_column column and annotations must be under the target_column column.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
num_classes – Number of classes (i.e. maximum class index in the dataset).
input_column – column name containing paths to the images.
input_dtype – Data type of the torch tensors related to the image.
target_column – column name containing image label.
target_dtype – Data type of the torch tensors related to the target.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.
test_mode – If True, only image without labels will be returned.
multilabel – If True, targets are being converted to multihot vector for multilabel task. If False, dataset prepares targets for multiclass classification.
lazy_init – If True, for multilabel the target variable is converted to multihot when __getitem__ is called. For multiclass, the class index is checked to fit the range when __getitem__ is called.
csv_path – DEPRECATED, Path to the .pkl or .csv file with paths to images and annotations. Paths to images must be under the input_column column and annotations must be under the target_column column.
- get_raw(idx: int) dict
Get item sample without transform application.
- Returns
dict, where sample[‘image’] - np.array, representing image after augmentations. sample[‘target’] - Target class or labels. sample[‘index’] - Index of the sample, the same as input idx.
- Return type
sample
- __getitem__(idx: int) dict
Get item sample.
- Returns
dict, where sample[‘image’] - Tensor, representing image after augmentations and transformations, dtype=input_dtype. sample[‘target’] - Target class or labels, dtype=target_dtype. sample[‘index’] - Index of the sample, the same as input idx.
- Return type
sample
- process_function(target: Any) Any
Prepare dataset target based on classification type.
- Parameters
target – Classification labels to prepare.
- Returns
Prepared classification labels.
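The target preparation described for multilabel and process_function can be sketched as follows. This is not TorchOk's implementation: prepare_target is a hypothetical helper, the space-separated label format is taken from the multilabel csv example, and plain lists stand in for tensors.

```python
from typing import List, Union


def prepare_target(target: Union[int, str],
                   num_classes: int,
                   multilabel: bool) -> Union[int, List[int]]:
    """Hypothetical helper: multihot vector for multilabel, checked index otherwise."""
    if multilabel:
        # "0 1" -> [0, 1]; a single int such as 0 also works via str()
        indices = [int(token) for token in str(target).split()]
        multihot = [0] * num_classes
        for index in indices:
            if not 0 <= index < num_classes:
                raise ValueError(f'class index {index} is outside [0, {num_classes})')
            multihot[index] = 1
        return multihot

    index = int(target)
    if not 0 <= index < num_classes:
        raise ValueError(f'class index {index} is outside [0, {num_classes})')
    return index


multilabel_target = prepare_target('0 1', num_classes=3, multilabel=True)
multiclass_target = prepare_target(1, num_classes=2, multilabel=False)
```

With lazy_init, a conversion like this would run per sample at access time rather than once over the whole annotation file.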
Segmentation
- class torchok.data.datasets.segmentation.image_segmentation.ImageSegmentationDataset(data_folder: Union[Path, str], annotation_path: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_column: str = 'image_path', input_dtype: str = 'float32', target_column: str = 'mask_path', target_dtype: str = 'int64', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)
Bases: ImageDataset
A dataset for image segmentation task.
Segmentation csv example:
image_path | mask
image1.png | mask1.png
image2.png | mask2.png
image3.png | mask3.png
- __init__(data_folder: Union[Path, str], annotation_path: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_column: str = 'image_path', input_dtype: str = 'float32', target_column: str = 'mask_path', target_dtype: str = 'int64', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)
Init ImageSegmentationDataset.
- Parameters
data_folder – Directory with all the images.
annotation_path – Path to the .pkl or .csv file with paths to images and masks. Paths to images must be under the column given by input_column and paths to masks under the column given by target_column.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
input_column – column name containing paths to the images.
input_dtype – Data type of the torch tensors related to the image.
target_column – column name containing paths to the masks.
target_dtype – Data type of the torch tensors related to the target.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.
test_mode – If True, only image without labels will be returned.
- get_raw(idx: int) dict
- __getitem__(idx: int) Dict[str, Any]
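An annotation file for this dataset can be produced with the standard csv module. The sketch below assumes the default column names from the signature (image_path and mask_path); build_segmentation_annotation is a hypothetical helper, and whether the paths are interpreted relative to data_folder is an assumption.

```python
import csv
import io
from typing import Iterable, Tuple


def build_segmentation_annotation(pairs: Iterable[Tuple[str, str]],
                                  input_column: str = 'image_path',
                                  target_column: str = 'mask_path') -> str:
    """Hypothetical helper: return csv text with one image/mask pair per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow([input_column, target_column])  # header row
    writer.writerows(pairs)
    return buffer.getvalue()


annotation = build_segmentation_annotation([
    ('image1.png', 'mask1.png'),
    ('image2.png', 'mask2.png'),
    ('image3.png', 'mask3.png'),
])
```

The returned text can be written to disk and its path passed as annotation_path.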
Representation
- class torchok.data.datasets.representation.unsupervised_contrastive_dataset.UnsupervisedContrastiveDataset(data_folder: str, transform: Union[BasicTransform, BaseCompose], augment: Optional[Union[BasicTransform, BaseCompose]] = None, annotation_path: Optional[str] = None, input_column: str = 'image_path', input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, csv_path: Optional[str] = None)
Bases: ImageDataset
A dataset for unsupervised contrastive task.
Each image is transformed twice, so that the two resulting views are positives to each other.
UnsupervisedContrastive csv example:
image_path
cat_1.jpg
dog_1.jpg
- __init__(data_folder: str, transform: Union[BasicTransform, BaseCompose], augment: Optional[Union[BasicTransform, BaseCompose]] = None, annotation_path: Optional[str] = None, input_column: str = 'image_path', input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, csv_path: Optional[str] = None)
Init UnsupervisedContrastiveDataset.
- Parameters
data_folder – Directory with all the images.
annotation_path – Path to the .pkl or .csv file with path to images and annotations. Path to images must be under column input_column.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
input_column – column name containing paths to the images.
input_dtype – data type of the torch tensors related to the image.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.
csv_path – DEPRECATED, Path to the .pkl or .csv file with path to images and annotations. Path to images must be under column input_column.
- get_raw(idx: int) dict
Get item sample.
- Returns
dict, where sample[‘image_0’] - Tensor, representing image after augmentations. sample[‘image_1’] - Tensor, representing image after augmentations. sample[‘index’] - Index of the sample, the same as input idx.
- Return type
sample
- __getitem__(idx: int) dict
Get item sample.
- Returns
dict, where sample[‘image_0’] - Tensor, representing image after augmentations and transformations, dtype=input_dtype. sample[‘image_1’] - Tensor, representing image after augmentations and transformations, dtype=input_dtype. sample[‘index’] - Index of the sample, the same as input idx.
- Return type
sample
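The idea of producing a positive pair from one image can be sketched without any dependencies. This is an illustration under assumptions, not the TorchOk code: make_positive_pair and random_shift are hypothetical, and nested lists replace numpy arrays. The point is only that the same random augmentation is applied twice, independently, to the same source image.

```python
import random
from typing import Callable, List

Image = List[List[int]]


def make_positive_pair(image: Image,
                       augment: Callable[[Image], Image],
                       index: int) -> dict:
    """Hypothetical helper: two independently augmented views of one image."""
    return {
        'image_0': augment(image),
        'image_1': augment(image),
        'index': index,
    }


def random_shift(image: Image, rng: random.Random) -> Image:
    """Toy augmentation: cyclically shift every row by a random offset."""
    offset = rng.randrange(len(image[0]))
    return [row[offset:] + row[:offset] for row in image]


rng = random.Random(0)  # seeded only to make the sketch reproducible
pair = make_positive_pair([[1, 2, 3], [4, 5, 6]],
                          lambda img: random_shift(img, rng),
                          index=7)
```

Both views keep the content of the source image, which is what makes them a positive pair for a contrastive loss.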
- class torchok.data.datasets.representation.validation.RetrievalDataset(data_folder: str, matches_csv_path: str, img_list_csv_path: str, transform: Union[BasicTransform, BaseCompose], augment: Optional[Union[BasicTransform, BaseCompose]] = None, gallery_folder: Optional[str] = '', gallery_list_csv_path: Optional[str] = None, use_query_without_relevants: bool = False, input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0)
Bases: ImageDataset
Dataset for image retrieval validation.
Searches are made by queries, looking for relevant items in the whole set of items; gallery items are treated as non-relevant.
Example matches csv: Query ids should be unique int values, otherwise the rows having the same query id will be treated as different matches.
Relevant ids can be repeated in different queries.
Scores reflect the order of similarity of the image to the query: a higher score corresponds to a greater similarity (must be a float value > 0).
Match csv example:
query | relevant | scores
1194917 | 601566 554492 224125 2001716519 | 4 3 2 2
1257924 | 456490 | 4
Example img_list csv: img_list.csv maps the ids of query and relevant elements to image paths.
Image csv example:
id | image_path | label
1194917 | data/img_1.jpg | 0
601566 | data/img_2.jpg | 0
554492 | data/img_3.jpg | 0
224125 | data/img_4.jpg | 1
2001716519 | data/img_5.jpg | 1
1257924 | data/img_6.jpg | 1
456490 | data/img_7.jpg | 2
Gallery Image csv example:
id | image_paths
8 | data/db/img_1.jpg
10 | data/db/img_2.jpg
12 | data/db/img_3.jpg
- __init__(data_folder: str, matches_csv_path: str, img_list_csv_path: str, transform: Union[BasicTransform, BaseCompose], augment: Optional[Union[BasicTransform, BaseCompose]] = None, gallery_folder: Optional[str] = '', gallery_list_csv_path: Optional[str] = None, use_query_without_relevants: bool = False, input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0)
Init RetrievalDataset class.
- Parameters
data_folder – Directory with all the images.
matches_csv_path – path to csv file where queries with their relevance scores are specified
img_list_csv_path – path to mapping image identifiers to image paths. Format: id | path. ID from matches csv are linked to id from img_list csv
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
gallery_folder – Path to a folder with all gallery images (traversed recursively). When the gallery is not specified, all the remaining queries and relevant items are considered negative samples for a given query-relevant set.
gallery_list_csv_path – Path to mapping image identifiers to image paths. Format: id | path.
use_query_without_relevants – If True, use query without relevants.
input_dtype – Data type of the torch tensors related to the image.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.
- Raises
ValueError – if gallery_folder is given, but gallery_list_csv_path is None.
- get_raw(idx: int) dict
Get item sample.
- Returns
image - np.array, representing image after augmentations, dtype=input_dtype. index - Index from DataFrame. query_idxs - Int tensor, if item is query: return index of this query in target matrix, else -1. scores - Float tensor shape (1, len(n_query)), relevant scores of current item. group_labels - Int tensor with image classification label.
- Return type
dict with fields
- __getitem__(index: int) dict
Get item sample.
- Returns
image - Tensor, representing image after augmentations and transformations, dtype=input_dtype. index - Index from DataFrame. query_idxs - Int tensor, if item is query: return index of this query in target matrix, else -1. scores - Float tensor shape (1, len(n_query)), relevant scores of current item. group_labels - Int tensor with image classification label.
- Return type
dict with fields
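How the matches csv relates to the query_idxs and scores fields can be sketched with the example rows shown earlier. This is a hypothetical reading of the format, not TorchOk's parser: build_relevance is an invented helper, the space-separated relevant and scores cells are assumed from the csv example, query ids are assumed unique, and lists replace tensors.

```python
import csv
import io
from typing import Dict, List, Tuple

MATCHES_CSV = """query,relevant,scores
1194917,601566 554492 224125 2001716519,4 3 2 2
1257924,456490,4
"""


def build_relevance(matches_csv: str) -> Tuple[Dict[int, int], Dict[int, List[float]]]:
    """Hypothetical helper: map query ids to row indices and items to score rows."""
    rows = list(csv.DictReader(io.StringIO(matches_csv)))
    # Position of every query in the target matrix (one column per query)
    query_index = {int(row['query']): i for i, row in enumerate(rows)}
    scores: Dict[int, List[float]] = {}
    for i, row in enumerate(rows):
        relevant_ids = [int(token) for token in row['relevant'].split()]
        relevant_scores = [float(token) for token in row['scores'].split()]
        if len(relevant_ids) != len(relevant_scores):
            raise ValueError(f"query {row['query']}: ids and scores differ in length")
        for item_id, score in zip(relevant_ids, relevant_scores):
            # One score slot per query; 0.0 means "not relevant to that query"
            scores.setdefault(item_id, [0.0] * len(rows))[i] = score
    return query_index, scores


query_index, scores = build_relevance(MATCHES_CSV)
# An item that is not a query gets -1, as in the query_idxs description
not_a_query = query_index.get(601566, -1)
```

Items that appear only in the gallery would have no entry in scores, which matches the statement that gallery items are treated as non-relevant.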
Detection
- class torchok.data.datasets.detection.detection.DetectionDataset(data_folder: Union[Path, str], annotation_path: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_column: str = 'image_path', input_dtype: str = 'float32', bbox_column: str = 'bbox', bbox_dtype: str = 'float32', target_column: str = 'label', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, bbox_format: str = 'coco', min_area: float = 0.0, min_visibility: float = 0.0, filter_bboxes_on_start: bool = False)
Bases: ImageDataset
A dataset for image detection task.
Detection csv example:
image_path | bbox | label
image1.png | [[217.62, 240.54, 38.99, 57.75], [1.0, 240.24, 346.63, 186.76]] | [0, 1]
image2.png | [[102.49, 118.47, 7.9, 17.31]] | [2, 1]
image3.png | [[253.21, 271.07, 59.59, 60.97], [257.85, 224.48, 44.13, 97.0]] | [2, 0]
- __init__(data_folder: Union[Path, str], annotation_path: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_column: str = 'image_path', input_dtype: str = 'float32', bbox_column: str = 'bbox', bbox_dtype: str = 'float32', target_column: str = 'label', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, bbox_format: str = 'coco', min_area: float = 0.0, min_visibility: float = 0.0, filter_bboxes_on_start: bool = False)
Init DetectionDataset.
- Parameters
data_folder – Directory with all the images.
annotation_path – Path to the .pkl or .csv file with image paths, bboxes and labels. Path to images must be under column image_path, bboxes must be under bbox column and bbox labels must be under label column. User can change column names, if the input_column, bbox_column or target_column is given.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
input_column – Column name containing paths to the images.
input_dtype – Data type of the torch tensors related to the image.
bbox_column – Column name containing list of bboxes for every image.
bbox_dtype – Data type of the torch tensors related to the bboxes.
target_column – Column name containing bboxes labels.
target_dtype – Data type of the torch tensors related to the bboxes labels.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.
test_mode – If True, only image without labels will be returned.
bbox_format – Bboxes format, for albumentations transform. Supports the following formats:
pascal_voc - [x_min, y_min, x_max, y_max] = [98, 345, 420, 462]
albumentations - [x_min, y_min, x_max, y_max] = [0.1531, 0.71875, 0.65625, 0.9625]
coco - [x_min, y_min, width, height] = [98, 345, 322, 117]
yolo - [x_center, y_center, width, height] = [0.4046875, 0.840625, 0.503125, 0.24375]
min_area – Value in pixels. If the area of a bounding box after augmentation becomes smaller than min_area, Albumentations will drop that box, so the returned list of augmented bounding boxes won’t contain that bounding box.
min_visibility – Value between 0 and 1. If the ratio of the bounding box area after augmentation to the area of the bounding box before augmentation becomes smaller than min_visibility, Albumentations will drop that box. So if the augmentation process cuts the most of the bounding box, that box won’t be present in the returned list of the augmented bounding boxes.
filter_bboxes_on_start – if True apply filter_bboxes function on the whole dataset at the init otherwise apply in get_raw
- Raises
RuntimeError – if annotation_path is not in pkl or csv format.
- filter_bboxes(bboxes: Tensor, labels: Tensor, rows: int, cols: int) [Tensor, Tensor]
Filter empty bounding boxes.
- Parameters
bboxes – List of bounding boxes.
labels – Array of bbox labels.
rows – Image height.
cols – Image width.
- Returns
numpy array of bounding boxes and numpy array of labels of these boxes.
- get_raw(idx: int) dict
- __getitem__(idx: int) dict
Get item sample.
- Returns
dict, where sample[‘image’] - Tensor, representing image after augmentations and transformations, dtype=input_dtype. sample[‘target’] - Target class or labels, dtype=target_dtype. sample[‘bboxes’] - Target bboxes, dtype=bbox_dtype. sample[‘index’] - Index of the sample, the same as input idx.
- Return type
sample
- collate_fn(batch)
Puts each data field into a tensor with outer dimension batch size
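The four bbox_format layouts describe the same box in different coordinate systems. The sketch below converts between them in plain Python; the helper names are hypothetical and the 640x480 image size is an assumption inferred from the pascal_voc and albumentations example values. In practice albumentations performs these conversions internally.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def coco_to_pascal_voc(box: Box) -> Box:
    """[x_min, y_min, width, height] -> [x_min, y_min, x_max, y_max] in pixels."""
    x_min, y_min, width, height = box
    return (x_min, y_min, x_min + width, y_min + height)


def pascal_voc_to_albumentations(box: Box, rows: int, cols: int) -> Box:
    """Pixel corners -> corners normalized by image width (cols) and height (rows)."""
    x_min, y_min, x_max, y_max = box
    return (x_min / cols, y_min / rows, x_max / cols, y_max / rows)


def pascal_voc_to_yolo(box: Box, rows: int, cols: int) -> Box:
    """Pixel corners -> normalized [x_center, y_center, width, height]."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2 / cols,
            (y_min + y_max) / 2 / rows,
            (x_max - x_min) / cols,
            (y_max - y_min) / rows)


ROWS, COLS = 480, 640  # assumed image height and width

coco_box = (98, 345, 322, 117)
voc_box = coco_to_pascal_voc(coco_box)  # (98, 345, 420, 462)
alb_box = pascal_voc_to_albumentations(voc_box, ROWS, COLS)
yolo_box = pascal_voc_to_yolo(voc_box, ROWS, COLS)
```

Only the pascal_voc and coco layouts are in pixels; the other two depend on the image size, which is why rows and cols are passed explicitly.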
Ready-to-go
- class torchok.data.datasets.examples.coco_detection.COCODetection(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'long', bbox_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, min_area: float = 0, min_visibility: float = 0.0)
Bases: DetectionDataset
A class representing the COCO detection dataset https://cocodataset.org/#home.
The COCO Object Detection Task is designed to push the state of the art in object detection forward. COCO features two object detection tasks: using either bounding box output or object segmentation output (the latter is also known as instance segmentation).
COCO dataset has 81 categories where 0 - background label. Train set contains 118287 images, validation set - 5000.
This Dataset occupies 20 Gb of memory.
- CLASSES = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
- label_mapping = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 13: 12, 14: 13, 15: 14, 16: 15, 17: 16, 18: 17, 19: 18, 20: 19, 21: 20, 22: 21, 23: 22, 24: 23, 25: 24, 27: 25, 28: 26, 31: 27, 32: 28, 33: 29, 34: 30, 35: 31, 36: 32, 37: 33, 38: 34, 39: 35, 40: 36, 41: 37, 42: 38, 43: 39, 44: 40, 46: 41, 47: 42, 48: 43, 49: 44, 50: 45, 51: 46, 52: 47, 53: 48, 54: 49, 55: 50, 56: 51, 57: 52, 58: 53, 59: 54, 60: 55, 61: 56, 62: 57, 63: 58, 64: 59, 65: 60, 67: 61, 70: 62, 72: 63, 73: 64, 74: 65, 75: 66, 76: 67, 77: 68, 78: 69, 79: 70, 80: 71, 81: 72, 82: 73, 84: 74, 85: 75, 86: 76, 87: 77, 88: 78, 89: 79, 90: 80}
- base_folder = 'COCO'
- train_data_filename = 'train2017.zip'
- train_data_url = 'http://images.cocodataset.org/zips/train2017.zip'
- train_data_hash = 'cced6f7f71b7629ddf16f17bbcfab6b2'
- valid_data_filename = 'valid2017.zip'
- valid_data_url = 'http://images.cocodataset.org/zips/val2017.zip'
- valid_data_hash = '442b8da7639aecaf257c1dceb8ba8c80'
- annotations_filename = 'annotations.zip'
- annotations_url = 'http://images.cocodataset.org/annotations/annotations_trainval2017.zip'
- annotations_hash = 'f4bbac642086de4f52a3fdda2de5fa2c'
- train_pkl = 'train_detection.pkl'
- valid_pkl = 'valid_detection.pkl'
- __init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'long', bbox_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False, min_area: float = 0, min_visibility: float = 0.0)
Init COCODetection.
- Parameters
train – If True, train dataset will be used, else - test dataset.
download – If True, data will be downloaded and saved to data_folder.
data_folder – Directory with all the images.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
input_dtype – Data type of the torch tensors related to the image.
target_dtype – Data type of the torch tensors related to the bboxes labels.
bbox_dtype – Data type of the torch tensors related to the bboxes.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.
test_mode – If True, only image without labels will be returned.
min_area – Value in pixels. If the area of a bounding box after augmentation becomes smaller than min_area, Albumentations will drop that box, so the returned list of augmented bounding boxes won’t contain that bounding box.
min_visibility – Value between 0 and 1. If the ratio of the bounding box area after augmentation to the area of the bounding box before augmentation becomes smaller than min_visibility, Albumentations will drop that box. So if the augmentation process cuts the most of the bounding box, that box won’t be present in the returned list of the augmented bounding boxes.
- create_annotation(json_path: Union[Path, str], image_folder: Union[Path, str], save_df_path: Union[Path, str])
Create train-valid csv for loaded COCO dataset.
- Parameters
json_path – COCO json annotation file path.
image_folder – COCO images folder.
save_df_path – Pickle save name.
- __getitem__(idx: int) dict
Get item sample.
- Returns
dict, where sample[‘image’] - Tensor, representing image after augmentations and transformations, dtype=input_dtype. sample[‘target’] - Target class or labels, dtype=target_dtype. sample[‘bboxes’] - Target bboxes, dtype=bbox_dtype. sample[‘index’] - Index of the sample, the same as input idx.
- Return type
sample
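The label_mapping attribute above turns the sparse original COCO category ids (some ids, such as 12, are unused) into contiguous labels starting at 1, leaving 0 for background. A mapping of that shape can be derived from the sorted ids. The sketch uses only the first thirteen keys of label_mapping, and build_label_mapping is a hypothetical helper rather than part of TorchOk.

```python
from typing import Dict, Iterable


def build_label_mapping(category_ids: Iterable[int]) -> Dict[int, int]:
    """Hypothetical helper: map sparse category ids to contiguous labels from 1."""
    return {category_id: label
            for label, category_id in enumerate(sorted(category_ids), start=1)}


# First thirteen keys of label_mapping; note that id 12 does not occur
sparse_ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14]
mapping = build_label_mapping(sparse_ids)
```

Contiguous labels keep the classifier head at 81 outputs instead of 91.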
- class torchok.data.datasets.examples.coco_segmentation.COCOSegmentation(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)
Bases: ImageSegmentationDataset
A class representing the COCO segmentation dataset https://cocodataset.org/#home.
The COCO Object Detection Task is designed to push the state of the art in object detection forward. COCO features two object detection tasks: using either bounding box output or object segmentation output (the latter is also known as instance segmentation).
COCO dataset has 81 categories where 0 - background label. Train set contains 118287 images, validation set - 5000.
This Dataset occupies 21 Gb of memory.
- CLASSES = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
- label_mapping = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 13: 12, 14: 13, 15: 14, 16: 15, 17: 16, 18: 17, 19: 18, 20: 19, 21: 20, 22: 21, 23: 22, 24: 23, 25: 24, 27: 25, 28: 26, 31: 27, 32: 28, 33: 29, 34: 30, 35: 31, 36: 32, 37: 33, 38: 34, 39: 35, 40: 36, 41: 37, 42: 38, 43: 39, 44: 40, 46: 41, 47: 42, 48: 43, 49: 44, 50: 45, 51: 46, 52: 47, 53: 48, 54: 49, 55: 50, 56: 51, 57: 52, 58: 53, 59: 54, 60: 55, 61: 56, 62: 57, 63: 58, 64: 59, 65: 60, 67: 61, 70: 62, 72: 63, 73: 64, 74: 65, 75: 66, 76: 67, 77: 68, 78: 69, 79: 70, 80: 71, 81: 72, 82: 73, 84: 74, 85: 75, 86: 76, 87: 77, 88: 78, 89: 79, 90: 80}
- base_folder = 'COCO'
- train_data_filename = 'train2017.zip'
- train_data_url = 'http://images.cocodataset.org/zips/train2017.zip'
- train_data_hash = 'cced6f7f71b7629ddf16f17bbcfab6b2'
- valid_data_filename = 'valid2017.zip'
- valid_data_url = 'http://images.cocodataset.org/zips/val2017.zip'
- valid_data_hash = '442b8da7639aecaf257c1dceb8ba8c80'
- annotations_filename = 'annotations.zip'
- annotations_url = 'http://images.cocodataset.org/annotations/annotations_trainval2017.zip'
- annotations_hash = 'f4bbac642086de4f52a3fdda2de5fa2c'
- train_csv = 'train_segmentation.csv'
- valid_csv = 'valid_segmentation.csv'
- __init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'long', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)
Init COCOSegmentation.
- Parameters
train – If True, train dataset will be used, else - test dataset.
download – If True, data will be downloaded and saved to data_folder.
data_folder – Directory with all the images.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
input_dtype – Data type of the torch tensors related to the image.
target_dtype – Data type of the torch tensors related to the target mask.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – format of images that will be returned from dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – color of the background during conversion from rgba.
test_mode – If True, only image without labels will be returned.
- create_annotation(json_path: Union[str, Path], mask_folder: Union[str, Path], save_df_path: Union[str, Path])
Create train-valid csv for loaded COCO dataset.
- Parameters
json_path – COCO json annotation file path.
mask_folder – COCO mask save folder.
save_df_path – Pickle save name.
- class torchok.data.datasets.examples.sop.SOP(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)
Bases: ImageDataset
A class representing the Stanford Online Products (SOP) dataset.
Additionally, we collected Stanford Online Products dataset: 120k images of 23k classes of online products for metric learning. The homepage of SOP is https://cvgl.stanford.edu/projects/lifted_struct/.
- base_folder = 'Stanford_Online_Products'
- filename = 'Stanford_Online_Products.tar.gz'
- url = 'https://torchok-hub.s3.eu-west-1.amazonaws.com/Stanford_Online_Products.tar.gz'
- tgz_md5 = 'b96128cf2b75493708511ff5c400eefe'
- train_txt = 'Ebay_train.txt'
- test_txt = 'Ebay_test.txt'
- __init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)
Init SOP.
The dataset has 120,053 images with 22,634 classes in total. The train split has 59,551 images with 11,318 classes. The test split has 60,502 images with 11,316 classes.
- Parameters
train – If True, train dataset will be used, else - test dataset.
download – If True, data will be downloaded and saved to data_folder.
data_folder – Directory with all the images.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
input_dtype – Data type of the torch tensors related to the image.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – Format of the images returned from the dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – Color of the background used during conversion from rgba.
test_mode – If True, only images are returned, without labels.
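The rgba_layout_color parameter is described only as the background color used when converting from rgba. One common interpretation is alpha compositing of each pixel over a solid background, which can be sketched as follows. The helper name composite_over_background is hypothetical, and the blending rule is an assumption rather than something taken from the TorchOk source.

```python
from typing import Tuple, Union

Color = Union[int, Tuple[int, int, int]]


def composite_over_background(rgba: Tuple[int, int, int, int],
                              layout_color: Color = 0) -> Tuple[int, int, int]:
    """Blend one 8-bit RGBA pixel over a solid background color.

    Hypothetical helper: standard alpha compositing, which is an assumption
    about what "conversion from rgba" means, not TorchOk's actual code.
    """
    # An int background means the same value for all three channels.
    if isinstance(layout_color, int):
        background = (layout_color, layout_color, layout_color)
    else:
        background = layout_color
    alpha = rgba[3] / 255.0
    return tuple(
        round(channel * alpha + back * (1.0 - alpha))
        for channel, back in zip(rgba[:3], background)
    )


# A fully opaque pixel keeps its color; a fully transparent one becomes the background.
opaque = composite_over_background((200, 100, 50, 255), layout_color=0)
transparent = composite_over_background((200, 100, 50, 0), layout_color=(255, 255, 255))
```

Under this reading, an int value such as the default 0 would give a black background, while a tuple sets each channel separately.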
- get_raw(idx: int) → dict
Get item sample.
- Returns
dict, where sample[‘image’] - Tensor, representing image after augmentations. sample[‘target’] - Target class or labels. sample[‘index’] - Index of the sample, the same as input idx.
- Return type
sample
- __getitem__(idx: int) → dict
Get item sample.
- Returns
dict, where sample[‘image’] - Tensor, representing image after augmentations and transformations, dtype=input_dtype. sample[‘target’] - Target class or labels. sample[‘index’] - Index of the sample, the same as input idx.
- Return type
sample
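The tgz_md5 attribute of SOP suggests that the downloaded archive is checked against a known MD5 digest. A minimal sketch of such a check with the standard library is given below. The names file_md5 and archive_is_intact are hypothetical, and this reference does not show how TorchOk performs the check internally.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path: str, expected_md5: str) -> bool:
    """True if the file exists and its MD5 digest matches the expected one."""
    return os.path.isfile(path) and file_md5(path) == expected_md5


# Demonstration on a throwaway file rather than the real archive.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.tar.gz")
    with open(demo_path, "wb") as stream:
        stream.write(b"hello")
    demo_ok = archive_is_intact(demo_path, hashlib.md5(b"hello").hexdigest())
    # The SOP digest cannot match the demo file, so this check fails.
    demo_bad = archive_is_intact(demo_path, "b96128cf2b75493708511ff5c400eefe")
```

For the real dataset, the expected digest would be the tgz_md5 value listed above, and the path would presumably point to filename inside data_folder.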
- class torchok.data.datasets.examples.sweet_pepper.SweetPepper(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'int64', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)
Bases:
ImageSegmentationDataset
A class representing the Sweet Pepper segmentation dataset from Kaggle: https://www.kaggle.com/datasets/lemontyc/sweet-pepper.
The main task for this dataset is to segment peppers (fruit) and peduncles in images obtained from different farm locations. The dataset has 3 labels: 0 - background, 1 - fruit and 2 - peduncle. It contains 620 images in HD resolution: 500 for training and 120 for validation.
- base_folder = 'sweet_pepper'
- filename = 'sweet_pepper.tar.gz'
- url = 'https://torchok-hub.s3.eu-west-1.amazonaws.com/sweet_pepper.tar.gz'
- tgz_md5 = '65021e5fad5fe286b3c2bac7753d6e9d'
- train_csv = 'train.csv'
- valid_csv = 'valid.csv'
- __init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, input_dtype: str = 'float32', target_dtype: str = 'int64', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)
Init SweetPepper.
- Parameters
train – If True, the train split is used; otherwise the validation split.
download – If True, the data will be downloaded and saved to data_folder.
data_folder – Directory with all the images.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
input_dtype – Data type of the torch tensors related to the image.
target_dtype – Data type of the torch tensors related to the target.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – Format of the images returned from the dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – Color of the background used during conversion from rgba.
test_mode – If True, only images are returned, without labels.
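The three mask labels listed for SweetPepper (0 - background, 1 - fruit, 2 - peduncle) can be used to inspect the class balance of a mask. The sketch below uses plain nested lists as a stand-in for a mask array. LABEL_NAMES and count_labels are hypothetical names and are not part of TorchOk.

```python
from collections import Counter
from typing import Dict, List

# Label ids as listed in the SweetPepper class description.
LABEL_NAMES = {0: "background", 1: "fruit", 2: "peduncle"}


def count_labels(mask: List[List[int]]) -> Dict[str, int]:
    """Count pixels per class name in a 2D mask of integer label ids."""
    counts = Counter(value for row in mask for value in row)
    unknown = set(counts) - set(LABEL_NAMES)
    if unknown:
        raise ValueError(f"Unexpected label ids: {sorted(unknown)}")
    return {name: counts.get(label, 0) for label, name in LABEL_NAMES.items()}


# A tiny 3x4 toy mask: one peduncle pixel above a 2x2 fruit region.
toy_mask = [
    [0, 0, 2, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
pixel_counts = count_labels(toy_mask)
```

A count like this can help when deciding whether the rare peduncle class needs extra weighting in the loss.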
- class torchok.data.datasets.examples.triplet_sop.TRIPLET_SOP(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, anchor_column: str = 'anchor', positive_column: str = 'positive', negative_column: str = 'negative', input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)
Bases:
ImageDataset
A class representing the Stanford Online Products (SOP) dataset.
The Stanford Online Products dataset contains 120k images of 23k classes of online products for metric learning. The homepage of SOP is https://cvgl.stanford.edu/projects/lifted_struct/.
- base_folder = 'Stanford_Online_Products'
- filename = 'Stanford_Online_Products.tar.gz'
- url = 'https://torchok-hub.s3.eu-west-1.amazonaws.com/Stanford_Online_Products.tar.gz'
- tgz_md5 = 'b96128cf2b75493708511ff5c400eefe'
- train_csv = 'sop_triplet_train.csv'
- test_csv = 'sop_triplet_test.csv'
- __init__(train: bool, download: bool, data_folder: str, transform: Optional[Union[BasicTransform, BaseCompose]], augment: Optional[Union[BasicTransform, BaseCompose]] = None, anchor_column: str = 'anchor', positive_column: str = 'positive', negative_column: str = 'negative', input_dtype: str = 'float32', reader_library: str = 'opencv', image_format: str = 'rgb', rgba_layout_color: Union[int, Tuple[int, int, int]] = 0, test_mode: bool = False)
Init TRIPLET SOP.
The dataset has 11,319 image triplets (anchor, positive, negative).
- Parameters
download – If True, the data will be downloaded and saved to data_folder.
data_folder – Directory with all the images.
transform – Transform to be applied on a sample. This should have the interface of transforms in albumentations library.
augment – Optional augment to be applied on a sample. This should have the interface of transforms in albumentations library.
input_dtype – Data type of the torch tensors related to the image.
reader_library – Image reading library. Can be ‘opencv’ or ‘pillow’.
image_format – Format of the images returned from the dataset. Can be rgb, bgr, rgba, gray.
rgba_layout_color – Color of the background used during conversion from rgba.
test_mode – If True, only images are returned, without labels.
- __getitem__(idx: int) → dict
Get item sample.
- Returns
dict, where sample[‘anchor’] - Anchor. sample[‘positive’] - Positive. sample[‘negative’] - Negative. sample[‘index’] - Index of the sample, the same as input idx.
- Return type
sample
- get_raw(idx: int) → dict
Get item sample.
- Returns
dict, where sample[‘image’] - Tensor, representing image after augmentations. sample[‘target’] - Target class or labels. sample[‘index’] - Index of the sample, the same as input idx.
- Return type
sample
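The anchor_column, positive_column and negative_column arguments of TRIPLET_SOP suggest that each row of the triplet CSV names three images. A minimal sketch of reading such rows with the standard library follows, assuming one image path per column. The sample rows, the file layout and the function read_triplets are illustrative assumptions and do not reproduce the contents of sop_triplet_train.csv.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class Triplet(NamedTuple):
    anchor: str
    positive: str
    negative: str


def read_triplets(stream: TextIO,
                  anchor_column: str = "anchor",
                  positive_column: str = "positive",
                  negative_column: str = "negative") -> List[Triplet]:
    """Read (anchor, positive, negative) rows from a CSV with a header line."""
    reader = csv.DictReader(stream)
    required = {anchor_column, positive_column, negative_column}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"Missing columns: {sorted(missing)}")
    return [
        Triplet(row[anchor_column], row[positive_column], row[negative_column])
        for row in reader
    ]


# Made-up rows: the positive shares a folder with the anchor, the negative does not.
demo_csv = io.StringIO(
    "anchor,positive,negative\n"
    "a/1.jpg,a/2.jpg,b/1.jpg\n"
    "b/1.jpg,b/2.jpg,a/1.jpg\n"
)
triplets = read_triplets(demo_csv)
```

Keeping the column names configurable mirrors the constructor arguments, so a CSV with different headers could be read without renaming its columns.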