Compare two columns based on last N rows in a pandas DataFrame
I want to group by "ts_code" and, for every row, calculate the percentage drop from the maximum of the high column to the minimum of the low column from that maximum onward, using only the last N rows of each group. Specifically, given this DataFrame:
df
  ts_code  high  low
0       A    20   10
1       A    30    5
2       A    40   20
3       A    50   10
4       A    20   30
5       B    50   10
6       B    30    5
7       B    40   20
8       B    10   10
9       B    20   30
Goal
Below is my expected result
  ts_code  high  low  l3_high_low_pct_chg  l4_high_low_pct_chg
0       A    20   10                   NA                   NA
1       A    30    5                   NA                   NA
2       A    40   20                  0.5                   NA
3       A    50   10                  0.8                  0.8
4       A    20   30                  0.8                  0.8
5       B    50   10                   NA                   NA
6       B    30    5                   NA                   NA
7       B    40   20                  0.9                   NA
8       B    10   10                 0.75                  0.9
9       B    20   30                 0.75                 0.75
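ln_high_low_pct_chg (for example, l3_high_low_pct_chg) = 1 - (the minimum of the low column from the peak row onward) / (the maximum of the high column), computed over the last N rows of each group, for every row. The peak row itself counts: at index 2 the peak high of 40 is on the current row, so the minimum low is 20 and the result is 1 - 20/40 = 0.5.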
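To make the definition concrete, here is a minimal sketch of the calculation for a single window of N rows. The function name pct_drop_after_peak is hypothetical, and picking the first maximum when the high column has ties is an assumption, since the question does not say how ties should be handled.

```python
import numpy as np

def pct_drop_after_peak(high, low):
    """Return 1 - min(low from the peak row onward) / max(high) for one window."""
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    # Position of the first maximum of high (tie rule is an assumption)
    peak = int(np.argmax(high))
    # Minimum of low from the peak row (inclusive) to the end of the window
    return 1.0 - low[peak:].min() / high[peak]

# Group A, rows 0-2 (N = 3): peak high = 40 on the last row, min low from there = 20
print(pct_drop_after_peak([20, 30, 40], [10, 5, 20]))  # 0.5
# Group B, rows 5-7, using the high values from the expected-result table
print(pct_drop_after_peak([50, 30, 40], [10, 5, 20]))  # 0.9
```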
Attempt and problem
df['l3_highest'] = df.groupby('ts_code')['high'].transform(lambda x: x.rolling(3).max())
df['l3_lowest'] = df.groupby('ts_code')['low'].transform(lambda x: x.rolling(3).min())
df['l3_high_low_pct_chg'] = 1 - df['l3_lowest'] / df['l3_highest']
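But this fails. At index 2, for example, l3_lowest comes out as 5, whereas it should be 20: the rolling minimum covers the whole window and ignores where the peak occurs. I don't know how to restrict the minimum to the rows from the peak onward.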
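For the last 4 rows: at index 8, the maximum high is 50 and the minimum low from the peak onward is 5, so l4_high_low_pct_chg = 1 - 5/50 = 0.9; at index 9, the maximum high is 40 and the minimum low from the peak onward is 10, so l4_high_low_pct_chg = 1 - 10/40 = 0.75.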
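One possible way to get the expected columns is to walk over each group's windows explicitly instead of combining two independent rolling calls. The sketch below is only an illustration under stated assumptions: the helper name rolling_pct_drop_after_peak is hypothetical, group B uses the high values shown in the expected-result table (50, 30, 40, 10, 20), ties in the high column resolve to the first maximum, and rows with fewer than N preceding rows in their group are left as NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ts_code": list("AAAAABBBBB"),
    # Group B uses the high values shown in the expected-result table
    "high": [20, 30, 40, 50, 20, 50, 30, 40, 10, 20],
    "low": [10, 5, 20, 10, 30, 10, 5, 20, 10, 30],
})

def rolling_pct_drop_after_peak(group, n):
    """For each row, take the last n rows of the group and return
    1 - min(low from the peak of high onward) / max(high).
    Rows without a full window of n rows stay NaN."""
    high = group["high"].to_numpy(dtype=float)
    low = group["low"].to_numpy(dtype=float)
    out = np.full(len(group), np.nan)
    for end in range(n, len(group) + 1):
        h = high[end - n:end]
        l = low[end - n:end]
        peak = int(np.argmax(h))  # first maximum in the window (tie rule assumed)
        out[end - 1] = 1.0 - l[peak:].min() / h[peak]
    return pd.Series(out, index=group.index)

for n in (3, 4):
    # Compute each group separately, then align back on the original index
    parts = [rolling_pct_drop_after_peak(g, n) for _, g in df.groupby("ts_code")]
    df[f"l{n}_high_low_pct_chg"] = pd.concat(parts)

print(df)
```

The explicit loop is easy to check against the hand calculations above, but it runs in Python for every window, so it may be slow on large groups.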