Python Pandas Dataframe - Nothing being returned from my function
I have two dataframes:
energy_calculated (the time_stamp columns are displayed with 3 decimal places to confirm that no hidden fractional values are disrupting the comparison):
    fl_key     min_time_stamp     max_time_stamp  energy
 0   10051  1614556800019.000  1614556807979.000   0.352
 1   10051  1614556808019.000  1614556815979.000   0.275
 2   10051  1614556816019.000  1614556823979.000   0.429
 3   10051  1614556824019.000  1614556831979.000   0.406
 4   10051  1614556832019.000  1614556839979.000   0.444
 5   10051  1614556840019.000  1614556847979.000   0.348
 6   10051  1614556848019.000  1614556855979.000   0.381
 7   10051  1614556856019.000  1614556863979.000   0.456
 8   10051  1614556864019.000  1614556871979.000   0.362
 9   10051  1614556872019.000  1614556879979.000   0.465
10   10051  1614556880019.000  1614556887979.000   0.577
11   10051  1614556888019.000  1614556895979.000   0.305
12   10051  1614556896019.000  1614556903979.000   0.347
13   10051  1614556904019.000  1614556911979.000   0.246
14   10051  1614556912019.000  1614556919939.000   0.340
df_test:
      fl_Key     time_stamp  energy  install_prediction
1007   10051  1614556840299      -1                  -1
 491   10051  1614556819659      -1                  -1
1944   10051  1614556877779      -1                  -1
2227   10051  1614556889099      -1                  -1
 677   10051  1614556827099      -1                  -1
2944   10051  1614556917779      -1                  -1
 799   10051  1614556831979      -1                  -1
2378   10051  1614556895139      -1                  -1
1877   10051  1614556875099      -1                  -1
 487   10051  1614556819499      -1                  -1
For each row of df_test, I am trying to look up the "energy" value in energy_calculated using that row's fl_Key and time_stamp. The fl_Key value must match fl_key exactly, and time_stamp must fall between min_time_stamp and max_time_stamp (inclusive).
The fl_Key and fl_key names are different so I can track which column is coming from where.
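To make the matching rule concrete, here is a minimal sketch of the same lookup done without a row-wise function, using pd.merge_asof. This is a different technique from the apply() approach in this post, not a drop-in copy of it. The sample frames are trimmed from the tables above, the row with index 9999 is a hypothetical timestamp that falls in the gap between two intervals, and the names left, right and merged are illustrative only.

```python
import pandas as pd

# Three intervals trimmed from energy_calculated above.
energy_calculated = pd.DataFrame({
    "fl_key": [10051, 10051, 10051],
    "min_time_stamp": [1614556816019.0, 1614556824019.0, 1614556840019.0],
    "max_time_stamp": [1614556823979.0, 1614556831979.0, 1614556847979.0],
    "energy": [0.429, 0.406, 0.348],
})

# Three rows from df_test plus one hypothetical row (index 9999) whose
# time_stamp lies between two intervals, so it should get no energy value.
df_test = pd.DataFrame(
    {
        "fl_Key": [10051, 10051, 10051, 10051],
        "time_stamp": [1614556840299, 1614556819659, 1614556827099, 1614556824000],
    },
    index=[1007, 491, 677, 9999],
)

# merge_asof needs both "on" columns sorted and of the same dtype.
left = df_test.reset_index().sort_values("time_stamp")
right = energy_calculated.astype(
    {"min_time_stamp": "int64", "max_time_stamp": "int64"}
).sort_values("min_time_stamp")

# For each test row, take the last interval that starts at or before time_stamp
# and has the same key.
merged = pd.merge_asof(
    left,
    right,
    left_on="time_stamp",
    right_on="min_time_stamp",
    left_by="fl_Key",
    right_by="fl_key",
    direction="backward",
)

# Discard matches where time_stamp is past the end of the interval.
merged["energy"] = merged["energy"].where(
    merged["time_stamp"] <= merged["max_time_stamp"]
)

# Restore the original index labels so the assignment aligns row by row.
df_test["energy"] = merged.set_index("index")["energy"]
print(df_test)
```

Rows without a containing interval come back as NaN rather than raising, so a separate check such as df_test["energy"].isna().any() would be needed to reproduce the "no match" exception.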
I have a simple function (the raised exceptions are only there to confirm that exactly one match is always found):
def integrateEnergyCalculationData(row, energy_calculations):
    energy_calculations = energy_calculations[
        (energy_calculations['fl_key'] == row.fl_Key)
        & (energy_calculations['min_time_stamp'] <= row.time_stamp)
        & (energy_calculations['max_time_stamp'] >= row.time_stamp)
    ]
    if (len(energy_calculations) == 0):
        raise Exception("No energy data for: " + str(row.fl_Key) + ", " + str(row.time_stamp))
    elif (len(energy_calculations) >= 2):
        raise Exception("Too much energy data for: " + str(row.fl_Key) + ", " + str(row.time_stamp))
    return energy_calculations['energy']
I tie it all together using apply():
df_test['energy'] = df_test[['time_stamp','fl_Key']].apply(integrateEnergyCalculationData, 1, args=(energy_calculated, ))
What ends up happening is that the energy value is filled in for some of the rows, but not all of them.
My resulting df_test dataframe looks like this (the real df_test is much bigger; I randomly sampled 10 rows from it to demonstrate the issue, which is why the index values are not sequential):
      fl_Key     time_stamp              energy  install_prediction
1007   10051  1614556840299                                      -1
 491   10051  1614556819659  0.4291915384067029                  -1
1944   10051  1614556877779                                      -1
2227   10051  1614556889099                                      -1
 677   10051  1614556827099                                      -1
2944   10051  1614556917779                                      -1
 799   10051  1614556831979                                      -1
2378   10051  1614556895139                                      -1
1877   10051  1614556875099                                      -1
 487   10051  1614556819499  0.4291915384067029                  -1
What am I missing? Thanks.
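One thing worth checking, sketched here under assumptions rather than as a confirmed diagnosis: the function returns energy_calculations['energy'], which is a one-element Series labelled with the matching row's index in energy_calculated, not a plain number. When the function passed to apply(..., axis=1) returns a Series, apply builds a DataFrame with one column per distinct label instead of a single column of values. How pandas then handles assigning that DataFrame to df_test['energy'] depends on the version, which may be why only some rows end up filled. The data below is trimmed from the tables above, and lookup_series and lookup_scalar are hypothetical names used only for this comparison.

```python
import pandas as pd

# Three intervals trimmed from energy_calculated above.
energy_calculated = pd.DataFrame({
    "fl_key": [10051, 10051, 10051],
    "min_time_stamp": [1614556816019.0, 1614556824019.0, 1614556840019.0],
    "max_time_stamp": [1614556823979.0, 1614556831979.0, 1614556847979.0],
    "energy": [0.429, 0.406, 0.348],
})

# Three rows trimmed from df_test above, keeping their original index labels.
df_test = pd.DataFrame(
    {
        "fl_Key": [10051, 10051, 10051],
        "time_stamp": [1614556840299, 1614556819659, 1614556827099],
    },
    index=[1007, 491, 677],
)


def find_match(row, energy_calculations):
    # Same filter as in the question: exact key match, inclusive time range.
    return energy_calculations[
        (energy_calculations["fl_key"] == row.fl_Key)
        & (energy_calculations["min_time_stamp"] <= row.time_stamp)
        & (energy_calculations["max_time_stamp"] >= row.time_stamp)
    ]


def lookup_series(row, energy_calculations):
    # Mirrors the question: returns a one-element Series, not a number.
    return find_match(row, energy_calculations)["energy"]


def lookup_scalar(row, energy_calculations):
    # Returns the single matching value itself.
    return find_match(row, energy_calculations)["energy"].iloc[0]


as_frame = df_test.apply(lookup_series, axis=1, args=(energy_calculated,))
as_values = df_test.apply(lookup_scalar, axis=1, args=(energy_calculated,))

print(type(as_frame).__name__, as_frame.shape)  # a DataFrame, mostly NaN
print(as_values)                                # one energy value per row
```

If this is the cause, returning energy_calculations['energy'].iloc[0] from the original function should make apply() produce a single Series aligned with the df_test index.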
from Recent Questions - Stack Overflow https://ift.tt/3frBZAa