Determining the splitting ratio when augmenting image data
I have a heavily imbalanced image dataset: one class has 2873 images, another has only 115, and the remaining classes have ~250 images each. To reduce the imbalance, I decided to split the dataset into train, validation, and test sets so that the majority class contributes a smaller proportion of its images to the training set than the minority classes do. I will then augment the data in the training set. I intend to use an 80-10-10 split.
Which of the following should be considered an 80-10-10 split?
- Splitting the dataset in the proportion 80-10-10, and THEN augmenting the training images (which would leave the training set with more than 80% of all images after augmentation).
- Splitting the dataset in proportions chosen so that the result is an 80-10-10 split AFTER augmentation.
Also, is it acceptable to have an 85-7.5-7.5 split, provided it reduces imbalance in the dataset?
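To make the two options concrete, here is a rough sketch of the arithmetic, under the simplifying assumption that augmentation turns every training image into `k` images in total (the original plus `k - 1` augmented copies) and leaves the validation and test sets untouched. The factor `k` and both function names are hypothetical, chosen only for illustration; real augmentation would likely use a different factor per class.

```python
def effective_train_fraction(raw_train_fraction: float, k: float) -> float:
    """Training share of the whole dataset AFTER augmentation.

    raw_train_fraction: share of images assigned to training before augmentation.
    k: assumed augmentation factor (each training image becomes k images).
    """
    augmented_train = raw_train_fraction * k
    held_out = 1.0 - raw_train_fraction  # validation + test, not augmented
    return augmented_train / (augmented_train + held_out)


def raw_train_fraction_for_target(target: float, k: float) -> float:
    """Raw training share needed so the post-augmentation share equals target."""
    return target / (target + k * (1.0 - target))


if __name__ == "__main__":
    k = 2  # assumed: one augmented copy per training image

    # Option 1: split 80-10-10 first, then augment the training set.
    print(f"{effective_train_fraction(0.8, k):.4f}")  # 0.8889

    # Option 2: choose the raw split so that it is 80-10-10 after augmentation.
    print(f"{raw_train_fraction_for_target(0.8, k):.4f}")  # 0.6667
```

With `k = 2`, the first option ends up at roughly 89-5.5-5.5 after augmentation, while the second option requires a raw split of about 67-16.5-16.5 to land on 80-10-10.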