
Using custom COCO dataset for training #2074

@TychoBomer


💡 Your Question

Hello!

So far I have been building the train and val dataloaders with this setup:

```python
from super_gradients.training.dataloaders.dataloaders import coco_detection_yolo_format_train, coco_detection_yolo_format_val
```

Now, for reasons related to ambiguous data extraction from our labeling software, we want to switch to COCO format.
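For context, the two label formats store boxes differently: COCO keeps `[x_min, y_min, width, height]` in absolute pixels inside one JSON file per split, while YOLO-format labels are one text file per image with lines of `class cx cy w h`, normalized to [0, 1]. A minimal sketch of the box conversion between them (the helper names are hypothetical, not part of super-gradients):

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, w, h] in pixels to YOLO [cx, cy, w, h] in [0, 1]."""
    x_min, y_min, w, h = bbox
    cx = (x_min + w / 2) / img_w  # box center, normalized by image width
    cy = (y_min + h / 2) / img_h  # box center, normalized by image height
    return [cx, cy, w / img_w, h / img_h]


def yolo_bbox_to_coco(bbox, img_w, img_h):
    """Inverse conversion: YOLO [cx, cy, w, h] in [0, 1] back to COCO pixels."""
    cx, cy, w, h = bbox
    w_px, h_px = w * img_w, h * img_h
    return [cx * img_w - w_px / 2, cy * img_h - h_px / 2, w_px, h_px]


# Example: a 200x100 box at (100, 50) on a 640x480 image.
yolo_box = coco_bbox_to_yolo([100, 50, 200, 100], 640, 480)
coco_box = yolo_bbox_to_coco(yolo_box, 640, 480)
```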

I have a coco_train.json with its training images in one folder, and a coco_val.json with its validation images in another folder.
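A COCO JSON file holds three lists that matter here: `images` (id, file_name, size), `annotations` (image_id, category_id, bbox, iscrowd) and `categories`. The standard-library sketch below shows how they link together. The function name is hypothetical, and remapping category ids to contiguous 0..K-1 indices is an assumption about what a detection head expects, since COCO category ids need not start at 0 or be contiguous.

```python
import json
from collections import defaultdict


def index_coco(coco):
    """Group COCO boxes per image file as (class_idx, x1, y1, x2, y2) tuples (sketch)."""
    # Map (possibly non-contiguous) COCO category ids to 0..K-1, sorted by id.
    cat_ids = sorted(c["id"] for c in coco["categories"])
    cat_to_idx = {cid: i for i, cid in enumerate(cat_ids)}

    # Image id -> file name, so every box can be tied back to its image on disk.
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}

    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        if ann.get("iscrowd", 0):
            continue  # skip crowd regions in this sketch
        x, y, w, h = ann["bbox"]  # COCO: top-left corner + size, in pixels
        boxes[ann["image_id"]].append((cat_to_idx[ann["category_id"]], x, y, x + w, y + h))

    # Images without annotations are kept with an empty list (valid negatives).
    return {file_names[i]: boxes.get(i, []) for i in file_names}


def load_coco(json_path):
    """Read a COCO annotation file from disk and index it."""
    with open(json_path, "r", encoding="utf-8") as f:
        return index_coco(json.load(f))
```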

However, I am unsure how to create a custom dataloader for a COCO-format dataset.

I tried the following setup:

```python
from super_gradients.training.datasets.detection_datasets.coco_format_detection import COCOFormatDetectionDataset

dataset_params = {
    'data_dir': TrainModelConfig.dataset_folder_location,
    'train_json': 'annotations/instances_train.json',
    'val_json': 'annotations/instances_val.json',
    'train_images_dir': 'train/images',  # Path to train images folder
    'val_images_dir': 'val/images',  # Path to val images folder
    'classes': CLASSES,
    'input_dim': TrainModelConfig.input_dim
}

# Train dataset
train_dataset = COCOFormatDetectionDataset(
    data_dir=dataset_params['data_dir'],
    json_annotation_file=dataset_params['train_json'],
    images_dir=dataset_params['train_images_dir'],
    with_crowd=False
)

# Validation dataset
val_dataset = COCOFormatDetectionDataset(
    data_dir=dataset_params['data_dir'],
    json_annotation_file=dataset_params['val_json'],
    images_dir=dataset_params['val_images_dir'],
)
```
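Since the file names mentioned earlier (coco_train.json) differ from the ones passed here (annotations/instances_train.json), a quick path check before building the datasets can rule out a layout problem. This is a hypothetical helper, and it assumes both `json_annotation_file` and `images_dir` are resolved relative to `data_dir`, which matches how the arguments are passed above but is worth confirming for your super-gradients version.

```python
import json
import os


def find_missing_images(data_dir, json_annotation_file, images_dir):
    """Return file names listed in the COCO JSON that do not exist on disk.

    Hypothetical helper: assumes both paths are relative to data_dir.
    """
    json_path = os.path.join(data_dir, json_annotation_file)
    img_root = os.path.join(data_dir, images_dir)
    if not os.path.isfile(json_path):
        raise FileNotFoundError(f"Annotation file not found: {json_path}")
    if not os.path.isdir(img_root):
        raise FileNotFoundError(f"Images folder not found: {img_root}")

    with open(json_path, "r", encoding="utf-8") as f:
        images = json.load(f)["images"]
    # COCO file_name entries are usually relative to the images folder.
    return [
        img["file_name"]
        for img in images
        if not os.path.isfile(os.path.join(img_root, img["file_name"]))
    ]


# Example (run once per split):
# missing = find_missing_images(TrainModelConfig.dataset_folder_location,
#                               "annotations/instances_train.json", "train/images")
```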

Wrapping the datasets in PyTorch `DataLoader`s:

```python
from torch.utils.data import DataLoader

train_data = DataLoader(
    train_dataset,
    batch_size=TrainModelConfig.batch_size,
    num_workers=TrainModelConfig.num_workers,
    shuffle=True
)

val_data = DataLoader(
    val_dataset,
    batch_size=TrainModelConfig.batch_size,
    num_workers=TrainModelConfig.num_workers,
    shuffle=False
)

trainer.train(
    model=model,
    training_params=train_params,
    train_loader=train_data,
    valid_loader=val_data,
)
```

But it fails because no `collate_fn` is set.
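PyTorch's default collate tries to stack every field in a batch, which fails for detection targets because each image has a different number of boxes. A common workaround is a custom function passed via `DataLoader(..., collate_fn=...)` that prefixes each target row with its sample index and concatenates all rows into one array. super-gradients also ships a `DetectionCollateFN` that, as far as I can tell, follows the same idea, but its import path has moved between releases, so check your installed version. The sketch below is plain Python (no torch) so it runs anywhere; with tensors you would use `torch.stack` for the images and `torch.cat` for the targets. It assumes each sample is `(image, targets)` with targets as `[class, x1, y1, x2, y2]` rows, and that your transforms already resize every image to the same `input_dim` (otherwise the images cannot be stacked either).

```python
def detection_collate(batch):
    """Collate (image, targets) samples whose number of boxes differs (sketch).

    Returns the images in batch order and one flat list of target rows,
    each row prefixed with the index of the sample it belongs to.
    """
    images, all_targets = [], []
    for sample_idx, (image, targets) in enumerate(batch):
        images.append(image)  # with torch: collect, then torch.stack(images)
        for row in targets:
            # [sample_idx, class, x1, y1, x2, y2]: no padding needed
            all_targets.append([float(sample_idx), *row])
    return images, all_targets
```

With a tensor version of this function, it would be passed as `DataLoader(train_dataset, batch_size=..., collate_fn=detection_collate)`; the index column lets the loss recover which boxes belong to which image.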

Has anyone had success training from a COCO-format dataset?

Versions

No response
