I/O Issues in Loading Several Large H5PY Files (Pytorch)

I met a problem!

Recently I meet a problem of I/O issue. The target and input data are stored with h5py files. Each target file is 2.6GB while each input file is 10.2GB. I have 5 input datasets and 5 target datasets in total.

I created a custom dataset function for each h5py file and then use data.ConcatDataset class to link all the datasets. The custom dataset function is:

class MydataSet(Dataset):
def __init__(self, indx=1, root_path='./xxx', tar_size=128, data_aug=True, train=True):
    self.train = train
    if self.train:
        self.in_file = pth.join(root_path, 'train', 'train_noisy_%d.h5' % indx)
        self.tar_file = pth.join(root_path, 'train', 'train_clean_%d.h5' % indx)
    else:
        self.in_file = pth.join(root_path, 'test', 'test_noisy.h5')
        self.tar_file = pth.join(root_path, 'test', 'test_clean.h5')
    self.h5f_n = h5py.File(self.in_file, 'r', driver='core')
    self.h5f_c = h5py.File(self.tar_file, 'r')
    self.keys_n = list(self.h5f_n.keys())
    self.keys_c = list(self.h5f_c.keys())
    # h5f_n.close()
    # h5f_c.close()

    self.tar_size = tar_size
    self.data_aug = data_aug

def __len__(self):
    return len(self.keys_n)

def __del__(self):
    self.h5f_n.close()
    self.h5f_c.close()

def __getitem__(self, index):
    keyn = self.keys_n[index]
    keyc = self.keys_c[index]
    datan = np.array(self.h5f_n[keyn])
    datac = np.array(self.h5f_c[keyc])
    datan_tensor = torch.from_numpy(datan).unsqueeze(0)
    datac_tensor = torch.from_numpy(datac)
    if self.data_aug and np.random.randint(2, size=1)[0] == 1: # horizontal flip
        datan_tensor = torch.flip(datan_tensor,dims=[2]) # c h w
        datac_tensor = torch.flip(datac_tensor,dims=[2])

Then I use dataset_train = data.ConcatDataset([MydataSet(indx=index, train=True) for index in range(1, 6)]) for training. When only 2-3 h5py files are used, the I/O speed is normal and everything goes right. However, when 5 files are used, the training speed is gradually decreasing (5 iterations/s to 1 iterations/s). I change the num_worker and the problem still exists.

Could anyone give me a solution? Should I merge several h5py files into a bigger one? Or other methods? Thanks in advance!



from Recent Questions - Stack Overflow https://ift.tt/3l9c2Yf
https://ift.tt/eA8V8J

Comments

Popular posts from this blog

Spring Elasticsearch Operations

Network Error and Timeout on Authorize.net JS

Object oriented programming concepts (OOPs)