2022-11-24

Determining the splitting ratio when augmenting image data

I have a heavily imbalanced image dataset: one class has 2873 images, another has only 115, and the remaining classes have ~250 images each. To reduce the imbalance, I decided to split the dataset into train, validation, and test sets, giving the majority class a smaller proportion of its images in the training set than the minority classes. I will then augment the data in the training set. I intend to perform an 80-10-10 split on the dataset.

Which of the following outcomes should be considered an 80-10-10 split?

  • Splitting the dataset in the proportion 80-10-10 and THEN augmenting the training images (so the training set ends up at more than 80% of the total after augmentation).
  • Splitting the dataset in proportions chosen so that the result is an 80-10-10 split AFTER augmentation.
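
To make the difference between the two options concrete, here is a rough sketch of the arithmetic. It assumes that augmentation multiplies the training set by a single factor `k` (a hypothetical value; in practice the factor may differ per class) and that the validation and test sets are left untouched. The function names are mine and purely illustrative.

```python
def proportions_after_augmentation(train_frac, valid_frac, test_frac, k):
    """Shares of train/valid/test after the training set grows k-fold.

    Assumption: only the training set is augmented, by a uniform factor k.
    """
    total = train_frac * k + valid_frac + test_frac
    return (train_frac * k / total, valid_frac / total, test_frac / total)


def presplit_train_fraction(target_train_frac, k):
    """Training share to use BEFORE augmentation so that the training set
    makes up target_train_frac of the data AFTER a k-fold augmentation.

    Derived by solving k*t / (k*t + (1 - t)) = target_train_frac for t.
    """
    return target_train_frac / (k - target_train_frac * (k - 1))


if __name__ == "__main__":
    k = 3  # hypothetical augmentation factor

    # Option 1: split 80-10-10 first, then augment the training set.
    print(proportions_after_augmentation(0.8, 0.1, 0.1, k))

    # Option 2: choose the pre-augmentation split so the result is 80-10-10.
    t = presplit_train_fraction(0.8, k)
    rest = (1 - t) / 2  # remaining share divided equally between valid and test
    print(t, rest, rest)
    print(proportions_after_augmentation(t, rest, rest, k))
```

With `k = 3`, the first option leaves the training set well above 80% of the final data, while the second requires holding out considerably more than 20% of the original images to reach 80-10-10 after augmentation.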

Also, is it acceptable to have an 85-7.5-7.5 split, provided it reduces imbalance in the dataset?


