Pythonic way of getting all percentage changes in a numpy array?

I'm trying to get the distribution of percentage changes between each entry in an array and all of its successive entries. I can do this inefficiently with the following code:

import numpy as np

# 3000 random values in [1, 2)
arr = np.add(np.random.rand(3000,1),1)[:,-1]
def pct_changes(arr):
    arr_periods = np.linspace(start=0, stop=len(arr)-1, num=len(arr))
    arr_exp = np.divide(1, arr_periods)
    arr_norm = np.divide(arr, arr[0]) #[:,-1]
    arr_growth = np.power(arr_norm, arr_exp)
    return arr_growth[1:]

def all_rates(arr, ignore_periods = 1):
    arr_out = np.array([])
    for i in range(len(arr) - ignore_periods):
        arr_sub = arr[i:]
        arr_out = np.concatenate([arr_out, pct_changes(arr_sub)[ignore_periods:]])
    return arr_out

all_rates_arr = np.subtract(all_rates(arr), 1)
np.percentile(all_rates_arr, np.linspace(0,100,101))

I'm still getting up to speed with Python. I think the most efficient solution would be to build a square matrix and keep only the points above the diagonal (since each pair only needs to be counted once). I'm not sure of the best way to do that in code, but more importantly, is there a better way to do this? I feel like I'm missing something elegant.

Edit: Adding example input and output. The data is a time series of users on a platform over a period of time. (The arr below is a toy example with a known solution.)

# Array of 10% changes (e.g. 1.1 ** idx)
arr = [1, 1.1, 1.21, 1.331]

# Get additional arrays for each entry except the last
arr_1 = [1.1, 1.21, 1.331]
arr_2 = [1.21, 1.331]

# Divide each array by its position 0
arr = np.divide(arr, arr[0])
arr_1 = np.divide(arr_1, arr_1[0])
arr_2 = np.divide(arr_2, arr_2[0])

# We have:
# arr = [1, 1.1, 1.21, 1.331]
# arr_1 = [1, 1.1, 1.21]
# arr_2 = [1, 1.1]

# Raise to 1/idx power to get percentage change per period
arr = np.power(arr, 
               np.divide(1, 
                         np.linspace(0,
                                     len(arr)-1,
                                     len(arr)
                                     )
                          )
               )
#... perform for other arrays
# Within float precision should be:
# arr = [1, 1.1, 1.1, 1.1]   (index 0 is 1 ** (1/0) = 1 ** inf = 1; it is discarded below)
# arr_1 = [1, 1.1, 1.1]
# arr_2 = [1, 1.1]

# Finally concatenate arrays and remove first elements
arr_out = np.concatenate([arr[1:], arr_1[1:], arr_2[1:]])

# arr_out = [1.1, 1.1, 1.1, 1.1, 1.1, 1.1]
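For reference, the walkthrough above can be condensed into a short runnable check. This is only a sketch of the same steps: the names `pieces`, `sub`, and `periods` are mine, and it starts the period count at 1 so the division by zero at index 0 never happens.

```python
import numpy as np

# Toy input: 10% growth per period
arr = np.array([1, 1.1, 1.21, 1.331])

pieces = []
for start in range(len(arr) - 1):
    sub = arr[start:]                    # this entry and all later entries
    periods = np.arange(1, len(sub))     # 1, 2, ... periods ahead of sub[0]
    # Normalize by the first entry, then take the per-period growth factor
    pieces.append((sub[1:] / sub[0]) ** (1.0 / periods))

arr_out = np.concatenate(pieces)
print(arr_out)
```

Every entry of `arr_out` should be 1.1 within float precision, and there are 3 + 2 + 1 = 6 of them.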

Edit 2:

Fastest code I could come up with. Instead of performing a loop I create a matrix, perform all calculations at the same time, then get the upper right triangle of the matrix to get the results I care about.

This can probably be improved if there's a way to skip the operations on elements that will be discarded...

Hope it helps someone.

def growth_percentiles_mat(arr, n=21, margin = 0.05, ignore_periods = 252):
    # Expects arr as a column vector of shape (n, 1); mat_arr[i, j] = arr[j] / arr[i]
    mat_arr = (arr[np.newaxis, :] / arr[:, np.newaxis])[:, :, -1]
    l = mat_arr.shape[0]
    arr_l = np.linspace(0,l-1,l)
    mat_l = arr_l[np.newaxis, :] - arr_l[:, np.newaxis]
    mat_pow = np.divide(1, mat_l)
    mat_perc = np.power(mat_arr, mat_pow)
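    # Keep only entries above the chosen diagonal; a higher diagonal ignores more periods
    arr_fin = np.triu(mat_perc, 1 + ignore_periods).ravel()
    # np.triu zeroes everything else, so drop those zeros (assumes positive inputs)
    arr_fin = arr_fin[arr_fin > 0]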
    return np.percentile(arr_fin, np.linspace(margin,100-margin,n))
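On skipping the discarded elements: one possible approach is to build only the index pairs above the chosen diagonal with `np.triu_indices` and compute the growth factors for those pairs alone. The sketch below is not a tested replacement for the function above. The name `growth_rates_upper` is hypothetical, the input is assumed to be a 1-D (or flattenable) array of positive values, and `ignore_periods` defaults to 0 here so the toy example works.

```python
import numpy as np

def growth_rates_upper(arr, ignore_periods=0):
    """Per-period growth factors for all pairs (i, j) with j - i > ignore_periods."""
    arr = np.asarray(arr, dtype=float).ravel()
    # Row/column indices strictly above the k-th diagonal, so j > i always holds
    i, j = np.triu_indices(arr.size, k=1 + ignore_periods)
    # Ratio of later to earlier value, annualized over the j - i periods between them
    return (arr[j] / arr[i]) ** (1.0 / (j - i))

# Toy check: 10% growth per period
toy = 1.1 ** np.arange(4)
rates = growth_rates_upper(toy)
rates_skip_one = growth_rates_upper(toy, ignore_periods=1)
print(rates)
```

The result can be passed straight to `np.percentile`. Because no exponent is ever 1/0 and no entry is zero-filled, the `arr_fin > 0` filter is not needed under the positive-input assumption. Note that the index arrays still hold one entry per kept pair, so memory use stays quadratic in the array length.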

