Compare two columns based on last N rows in a pandas DataFrame

By Ritesh Sahu - March 10, 2022

I want to group by "ts_code" and, for every row, look at the last N rows of its group: find the maximum of the high column, take the minimum of the low column from that peak row onward, and compute the percentage drop between the two. Specifically,

df

  ts_code  high  low
0       A    20   10
1       A    30    5
2       A    40   20
3       A    50   10
4       A    20   30
5       B    50   10
6       B    30    5
7       B    40   20
8       B    10   10
9       B    20   30

Goal

Below is my expected result

  ts_code  high  low  l3_high_low_pct_chg  l4_high_low_pct_chg
0       A    20   10                   NA                   NA
1       A    30    5                   NA                   NA
2       A    40   20                  0.5                   NA
3       A    50   10                  0.8                  0.8
4       A    20   30                  0.8                  0.8
5       B    50   10                   NA                   NA
6       B    30    5                   NA                   NA
7       B    40   20                  0.9                   NA
8       B    10   10                 0.75                  0.9
9       B    20   30                 0.75                 0.75

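lN_high_low_pct_chg (for example, l3_high_low_pct_chg) = 1 - (the min value of the low column from the peak row onward) / (the max value of the high column), computed over the last N rows of each group, for every row. The peak row itself counts: at index 2 the peak is high=40 on that same row, so the low used is 20 and the result is 1 - 20/40 = 0.5.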
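To make the definition concrete, here is a minimal sketch of the calculation for a single window of N rows. The function name `pct_drop_after_peak` is made up for illustration. It also assumes that when the maximum high occurs more than once in the window, the first occurrence is treated as the peak; the question does not say how ties should be handled.

```python
import numpy as np

def pct_drop_after_peak(high, low):
    """Return 1 - min(low from the peak row onward) / max(high) for one window."""
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    # Assumption: np.argmax picks the first occurrence if the max is repeated.
    peak = int(np.argmax(high))
    # The slice low[peak:] includes the peak row itself.
    return float(1.0 - low[peak:].min() / high[peak])

# Rows 0-2 of group A: peak is high=40 on the last row, low there is 20.
print(round(pct_drop_after_peak([20, 30, 40], [10, 5, 20]), 2))   # 0.5
# Rows 2-4 of group A: peak is high=50, lowest low from there on is 10.
print(round(pct_drop_after_peak([40, 50, 20], [20, 10, 30]), 2))  # 0.8
```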

What I tried and the problem

df['l3_highest'] = df.groupby('ts_code')['high'].transform(lambda x: x.rolling(3).max())
df['l3_lowest'] = df.groupby('ts_code')['low'].transform(lambda x: x.rolling(3).min())
df['l3_high_low_pct_chg'] = 1 - df['l3_lowest'] / df['l3_highest']

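But this fails: at index 2, for example, l3_lowest comes out as 5 instead of 20, because the rolling minimum of low covers the whole window rather than only the rows from the peak onward. I don't know how to restrict the minimum to the rows after the peak.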

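For the last 4 rows in group B of the expected result: at index 8 the window is rows 5-8, the peak is high=50 and the lowest low from there on is 5, so l4_high_low_pct_chg = 1 - 5/50 = 0.9; at index 9 the window is rows 6-9, the peak is high=40 and the lowest low from there on is 10, so l4_high_low_pct_chg = 1 - 10/40 = 0.75.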
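One possible way to get the expected columns is to walk over each group's windows explicitly instead of combining two independent rolling aggregates. This is a sketch, not a tuned solution. The helper name `rolling_pct_drop_after_peak` is hypothetical, the high/low values are the ones shown in the expected result, and ties for the maximum high are assumed to resolve to the first occurrence. Rows without a full window of N rows are left as NaN.

```python
import numpy as np
import pandas as pd

def rolling_pct_drop_after_peak(group, n):
    """For each row, apply the after-peak metric to the last n rows of the group."""
    high = group["high"].to_numpy(dtype=float)
    low = group["low"].to_numpy(dtype=float)
    out = np.full(len(group), np.nan)  # first n-1 rows have no full window
    for end in range(n - 1, len(group)):
        start = end - n + 1
        # Position of the peak within the group (first occurrence on ties).
        peak = start + int(np.argmax(high[start:end + 1]))
        # Minimum low from the peak row through the current row.
        out[end] = 1.0 - low[peak:end + 1].min() / high[peak]
    return pd.Series(out, index=group.index)

df = pd.DataFrame({
    "ts_code": list("AAAAABBBBB"),
    "high": [20, 30, 40, 50, 20, 50, 30, 40, 10, 20],
    "low":  [10, 5, 20, 10, 30, 10, 5, 20, 10, 30],
})

for n in (3, 4):
    parts = [rolling_pct_drop_after_peak(g, n)
             for _, g in df.groupby("ts_code", sort=False)]
    # Each part keeps the original row index, so assignment aligns by index.
    df[f"l{n}_high_low_pct_chg"] = pd.concat(parts)

print(df.round(2))
```

The explicit loop costs roughly N operations per row, which is fine for small N; for long series and large windows a vectorized approach would be worth exploring.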
