2022-10-19

How to combine h5 data numpy arrays based on date in filename?

I have hundreds of .h5 files with a date in each filename (e.g. ...20221017...). For each file, I have extracted some parameters into a numpy array of the form

[[param_1a, param_2a, ..., param_5a],
  ...
 [param_1x, param_2x, ..., param_5x]]

which represents the data of interest. I want to group the data by month, so that instead of having (e.g.) 30 arrays for one month, I have one array that is the average of those 30 arrays. How can I do this?

This is the code I have so far; filename is the path to a text file listing the file names.

def combine_months(filename):
    # filename is a text file with one .h5 file name per line
    with open(filename, 'r') as fin:
        # iterate line by line (the original while loop never read the next line)
        for next_name in fin:
            year = next_name[6:10]
            month = next_name[11:13]
            date = month+'\\'+year
            #not sure where to go from here
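
One possible way to continue from here is to first collect the file names per month. This is only a sketch under an assumption: each file name contains exactly one eight-digit date in YYYYMMDD form, as in the 20221017 example. The regular expression and the helper name group_names_by_month are hypothetical and not part of the code above; the fixed slice positions could be used instead if the date always sits at the same offset.

```python
import re
from collections import defaultdict

# Assumption: each file name contains one 8-digit date, YYYYMMDD (e.g. 20221017).
DATE_PATTERN = re.compile(r"(\d{4})(\d{2})(\d{2})")

def group_names_by_month(names):
    """Map 'YYYY_MM' -> list of file names whose date falls in that month."""
    groups = defaultdict(list)
    for name in names:
        name = name.strip()
        match = DATE_PATTERN.search(name)
        if match is None:
            continue  # skip blank lines and names without a date
        year, month, _day = match.groups()
        groups[f"{year}_{month}"].append(name)
    return dict(groups)
```

Because iterating over an open text file yields its lines, the file object itself can be passed as names.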

As an example of what I hope to achieve, say array_1, array_2 and array_3 are numpy arrays holding data from different h5 files with the same month in the date of their filename.

array_1 = [[ 1  4 10]
           [ 2  5 11]
           [ 3  6 12]]
array_2 = [[ 1  2  5]
           [ 2  2  3]
           [ 3  6 12]]
array_3 = [[ 2  4 10]
           [ 3  2  3]
           [ 4  6 12]]

I want the result to look like:

2022_04_data = [[1, 3, 7.5],
                [2, 3.67, 8],
                [3, 4.67, 9],
                [4, 6, 12]]

Note that the first number of each row represents an ID, so I need to group those data together based on the first number as well.
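
The per-ID averaging described here could be sketched with NumPy as follows. This is one possible approach, not the only one: it assumes every array is 2-D with the ID in the first column, and the function name average_by_id is hypothetical. The arrays belonging to one month are stacked, rows are grouped by ID with np.unique, and the remaining columns are averaged per group.

```python
import numpy as np

def average_by_id(arrays):
    """Stack 2-D arrays and average the rows that share the same first-column ID."""
    stacked = np.vstack(arrays).astype(float)
    # ids: sorted unique IDs; inverse: index into ids for each stacked row
    ids, inverse = np.unique(stacked[:, 0], return_inverse=True)
    inverse = inverse.ravel()
    # Sum the parameter columns per ID, then divide by the row count per ID
    sums = np.zeros((ids.size, stacked.shape[1] - 1))
    np.add.at(sums, inverse, stacked[:, 1:])
    counts = np.bincount(inverse, minlength=ids.size)
    return np.column_stack([ids, sums / counts[:, None]])
```

Calling average_by_id([array_1, array_2, array_3]) returns one row per distinct ID, sorted by ID. An ID that appears in only some of the arrays is averaged over just the rows where it occurs.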


