2021-12-02

Vectorize for loop and return x day high and low

Overview

For each row of a dataframe I want to calculate the x day high and low.

An x day high is higher than previous x days. An x day low is lower than previous x days.

The for loop is explained in further detail in this post

Update:

Answer by @mozway below completes in around 20 seconds with dataset containing 18k rows. Can this be improved with numpy with broadcasting etc?

Example

2020-03-20 has an x_day_low value of 1 as it is lower than the previous day.

2020-03-27 has an x_day_high value of 8 as it is higher than the previous 8 days.

See desired output and test code below which is calculated with a for loop in the findHighLow function. How would I vectorize findHighLow as the actual dataframe is somewhat larger.

Test data

def genMockDataFrame(days,startPrice,colName,startDate,seed=None): 
   
    periods = days*24
    np.random.seed(seed)
    steps = np.random.normal(loc=0, scale=0.0018, size=periods)
    steps[0]=0
    P = startPrice+np.cumsum(steps)
    P = [round(i,4) for i in P]

    fxDF = pd.DataFrame({ 
        'ticker':np.repeat( [colName], periods ),
        'date':np.tile( pd.date_range(startDate, periods=periods, freq='H'), 1 ),
        'price':(P)})
    fxDF.index = pd.to_datetime(fxDF.date)
    fxDF = fxDF.price.resample('D').ohlc()
    fxDF.columns = [i.title() for i in fxDF.columns]
    return fxDF

#rows set to 15 for minimal example but actual dataframe contains around 18000 rows.
number_of_rows = 15    

df = genMockDataFrame(number_of_rows,1.1904,'tttmmm','19/3/2020',seed=157)

def findHighLow (df):

    df['x_day_high'] = 0
    df['x_day_low'] = 0

    for n in reversed(range(len(df['High']))):
        for i in reversed(range(n)):
                if df['High'][n] > df['High'][i]:
                    df['x_day_high'][n] = n - i
                else: break

    for n in reversed(range(len(df['Low']))):
        for i in reversed(range(n)):
                if df['Low'][n] < df['Low'][i]:
                    df['x_day_low'][n] = n - i
                else: break
    return df

df = findHighLow (df)

Desired output should match this:

df[["High","Low","x_day_high","x_day_low"]]

             High   Low x_day_high  x_day_low
date                
2020-03-19  1.1937  1.1832  0       0
2020-03-20  1.1879  1.1769  0       1
2020-03-21  1.1767  1.1662  0       2
2020-03-22  1.1721  1.1611  0       3
2020-03-23  1.1819  1.1690  2       0
2020-03-24  1.1928  1.1807  4       0
2020-03-25  1.1939  1.1864  6       0
2020-03-26  1.2141  1.1964  7       0
2020-03-27  1.2144  1.2039  8       0
2020-03-28  1.2099  1.2018  0       1
2020-03-29  1.2033  1.1853  0       4
2020-03-30  1.1887  1.1806  0       6
2020-03-31  1.1972  1.1873  1       0 
2020-04-01  1.1997  1.1914  2       0
2020-04-02  1.1924  1.1781  0       9


from Recent Questions - Stack Overflow https://ift.tt/3cTE8lM
https://ift.tt/eA8V8J

No comments:

Post a Comment