Determining the splitting ratio when augmenting image data

By Ritesh Sahu - November 24, 2022

I have a heavily imbalanced image dataset: one class has 2873 images, another has only 115, and the remaining classes have ~250 images each. To reduce the imbalance, I decided to split the dataset into train, validation, and test sets so that the majority class contributes a smaller proportion of its images to the training set than the minority classes do. I will then augment the data in the training set. I intend to perform an 80-10-10 split on the dataset.
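A minimal sketch of such a per-class split, assuming each class is handled separately and the majority class is limited by a cap on its training images. The function name `split_class`, the `train_cap` parameter, and the cap value of 250 are hypothetical choices for illustration, not part of any library; images beyond the cap are simply shared between validation and test here.

```python
import random

def split_class(items, train_frac=0.8, train_cap=None, seed=0):
    """Split one class's items into (train, valid, test) lists.

    train_cap is a hypothetical parameter: it limits how many items of a
    majority class enter the training set. Everything not used for
    training is divided evenly between validation and test.
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    items = list(items)
    rng.shuffle(items)

    n_train = int(len(items) * train_frac)
    if train_cap is not None:
        n_train = min(n_train, train_cap)

    rest = items[n_train:]
    n_valid = len(rest) // 2
    return items[:n_train], rest[:n_valid], rest[n_valid:]

# Class sizes taken from the description above; the cap is an assumption.
major_train, major_valid, major_test = split_class(range(2873), train_cap=250)
minor_train, minor_valid, minor_test = split_class(range(115))
```

With a cap like this, the majority class ends up over-represented in the validation and test sets instead, so per-class metrics are worth checking alongside overall accuracy.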

Which outcome should be considered an 80-10-10 split?

1. Splitting the dataset in the proportion 80-10-10, and THEN augmenting the training images (which would eventually result in a >80% proportion for the training set after augmentation).
2. Splitting the dataset in a proportion such that it eventually results in an 80-10-10 split AFTER augmentation.
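To make the difference between the two options concrete, here is a small arithmetic sketch. It assumes a hypothetical dataset of 1000 images and a hypothetical augmentation factor of 3 (each training image yields three training images in total after augmentation); both helper functions are illustrative names, not library calls.

```python
def effective_split(n_train, n_valid, n_test, aug_factor):
    """Percentages (train, valid, test) after the training set is
    multiplied by aug_factor; validation and test are left unchanged."""
    n_train_aug = n_train * aug_factor
    total = n_train_aug + n_valid + n_test
    return tuple(round(100 * n / total, 1) for n in (n_train_aug, n_valid, n_test))

def initial_train_fraction(target_train, aug_factor):
    """Fraction of the original data to put in training so that the
    training share equals target_train AFTER augmentation.

    Solves k*f / (k*f + (1 - f)) = t for f, with k = aug_factor and
    t = target_train, which gives f = t / (k*(1 - t) + t).
    """
    k, t = aug_factor, target_train
    return t / (k * (1 - t) + t)

# Option 1: split 80-10-10 first, then augment the training set 3x.
print(effective_split(800, 100, 100, 3))        # (92.3, 3.8, 3.8)

# Option 2: initial training fraction needed to reach 80% after 3x augmentation.
print(round(initial_train_fraction(0.8, 3), 3))  # 0.571
```

Under these assumed numbers, option 1 ends at roughly 92-4-4 after augmentation, while option 2 requires starting with only about 57% of the original images in the training set.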

Also, is it acceptable to have an 85-7.5-7.5 split, provided it reduces imbalance in the dataset?
