How to combine NumPy arrays from .h5 files based on the date in the filename?

I have hundreds of .h5 files with dates in their filenames (e.g. ...20221017...). For each file, I have extracted some parameters into a NumPy array of the form

[[param_1a, param_2a, ..., param_5a],
  ...
 [param_1x, param_2x, ..., param_5x]]

which holds the data of interest. I want to group the data by month, so that instead of having, say, 30 arrays for one month, I have a single array that is the average of those 30 arrays. How can I do this?

This is the code I have so far. Here, filename is the path to a text file that lists the .h5 file names, one per line.

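def combine_months(filename):
    # filename is a text file listing one .h5 file name per line
    with open(filename, 'r') as fin:
        # iterate line by line; the original while loop never read the next line
        for next_name in fin:
            next_name = next_name.strip()
            year = next_name[6:10]
            month = next_name[11:13]
            date = month+'\\'+year
            #not sure where to go from here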
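
One possible way to continue from the loop above is to collect the file names into a dictionary keyed by year and month. This is only a minimal sketch: it assumes each file name contains a single 8-digit date written as YYYYMMDD (as in the 20221017 example) and finds it with a regular expression instead of fixed slice positions. The helper name group_names_by_month and the example file names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: each file name contains one 8-digit date written as YYYYMMDD.
DATE_PATTERN = re.compile(r"(\d{4})(\d{2})(\d{2})")

def group_names_by_month(names):
    """Map 'YYYY_MM' keys to the list of file names dated in that month."""
    groups = defaultdict(list)
    for name in names:
        name = name.strip()
        match = DATE_PATTERN.search(name)
        if match is None:
            continue  # skip blank lines and names without a recognizable date
        year, month, _day = match.groups()
        groups[f"{year}_{month}"].append(name)
    return dict(groups)

# Hypothetical file names, for illustration only.
example_names = [
    "data_20221017_a.h5\n",
    "data_20221003_b.h5\n",
    "data_20220412_c.h5\n",
    "\n",
]
monthly_names = group_names_by_month(example_names)
```

Each value in monthly_names is then the list of files whose arrays should be averaged together for that month.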

As an example of what I hope to achieve, say array_1, array_2 and array_3 are NumPy arrays holding data from three different .h5 files whose filenames have the same month in their date.

array_1 = [[ 1  4 10]
           [ 2  5 11]
           [ 3  6 12]]
array_2 = [[ 1  2  5]
           [ 2  2  3]
           [ 3  6 12]]
array_3 = [[ 2  4 10]
           [ 3  2  3]
           [ 4  6 12]]

I want the result to look like:

2022_04_data = [[1, 3,    7.5],
                [2, 3.67, 8],
                [3, 4.67, 9],
                [4, 6,    12]]

Note that the first number in each row is an ID, so within a month the rows also need to be grouped by that ID, and the remaining columns averaged per ID.
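
The per-ID averaging described above could be sketched as follows. It assumes every array has the ID in its first column and the same number of columns; the function name average_by_id is hypothetical. The approach stacks all arrays for one month, finds the unique IDs with np.unique, and accumulates per-ID sums with np.add.at before dividing by the row counts.

```python
import numpy as np

def average_by_id(arrays):
    """Stack arrays row-wise and average all rows that share a first-column ID."""
    stacked = np.vstack(arrays).astype(float)
    # ids: sorted unique IDs; inverse: for each row, the index of its ID in ids
    ids, inverse = np.unique(stacked[:, 0], return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse)  # number of rows seen for each ID
    sums = np.zeros((ids.size, stacked.shape[1] - 1))
    np.add.at(sums, inverse, stacked[:, 1:])  # unbuffered per-ID accumulation
    return np.column_stack([ids, sums / counts[:, None]])

array_1 = np.array([[1, 4, 10], [2, 5, 11], [3, 6, 12]])
array_2 = np.array([[1, 2, 5], [2, 2, 3], [3, 6, 12]])
array_3 = np.array([[2, 4, 10], [3, 2, 3], [4, 6, 12]])

monthly_average = average_by_id([array_1, array_2, array_3])
```

For the three example arrays, a plain mean over the matching rows gives [3, 7.5] for ID 1, about [3.67, 8] for ID 2, about [4.67, 9] for ID 3, and [6, 12] for ID 4. An ID that is missing from some files is averaged only over the files in which it appears.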


