Pythonic way of getting all percentage changes in a numpy array?
I'm trying to get the distribution of per-period percentage changes between each entry in an array and every entry that follows it. I can do this inefficiently with the following code:
import numpy as np

arr = np.add(np.random.rand(3000, 1), 1)[:, -1]  # 3000 values in [1, 2)

def pct_changes(arr):
    # Per-period growth factor from arr[0] to each later entry: (arr[k] / arr[0]) ** (1 / k)
    arr_periods = np.arange(len(arr))
    arr_exp = np.divide(1, arr_periods)  # 1/0 -> inf at k = 0 (RuntimeWarning); that entry is dropped below
    arr_norm = np.divide(arr, arr[0])
    arr_growth = np.power(arr_norm, arr_exp)
    return arr_growth[1:]

def all_rates(arr, ignore_periods=1):
    arr_out = np.array([])
    for i in range(len(arr) - ignore_periods):
        arr_sub = arr[i:]
        # Skip lags 1..ignore_periods for every start position i
        arr_out = np.concatenate([arr_out, pct_changes(arr_sub)[ignore_periods:]])
    return arr_out

all_rates_arr = np.subtract(all_rates(arr), 1)
np.percentile(all_rates_arr, np.linspace(0, 100, 101))
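The loop above re-concatenates the growing output on every iteration, so it copies the same data over and over. A minimal sketch of a different loop, assuming only NumPy: iterate over the lag k = j - i instead of the start position, so each iteration handles every pair at that lag in one vectorized expression, and concatenate once at the end. `all_rates_by_lag` is a hypothetical name; it keeps the meaning of `ignore_periods` from `all_rates` (lags 1 through `ignore_periods` are skipped), but the output is ordered by lag rather than by start position, which does not matter for percentiles.

```python
import numpy as np

def all_rates_by_lag(arr, ignore_periods=1):
    """Per-period growth factors for every pair (i, j) with j - i > ignore_periods.

    Hypothetical helper: loops over the lag k = j - i; each iteration computes
    all pairs at that lag at once.
    """
    arr = np.asarray(arr, dtype=float)
    n = len(arr)
    chunks = []
    for k in range(ignore_periods + 1, n):
        # arr[k:] / arr[:-k] pairs every entry with the entry k steps later
        chunks.append((arr[k:] / arr[:-k]) ** (1.0 / k))
    # Concatenate once at the end instead of inside the loop
    return np.concatenate(chunks) if chunks else np.array([])
```

Usage mirrors the original: `all_rates_arr = all_rates_by_lag(arr) - 1`, then `np.percentile` as before. The output still holds roughly n²/2 values, so memory grows quadratically with the series length.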
I'm still getting up to speed with Python. I think the most efficient solution would be to build a square matrix and then keep only the entries above the diagonal (since order doesn't matter). I'm not sure of the best way to do that in code, but more importantly, is there a better way to do this? I feel like I'm missing something elegant.
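A minimal sketch of the square-matrix idea on the toy series from the edit below, assuming NumPy broadcasting: `ratio[i, j] = arr[j] / arr[i]` and `lag[i, j] = j - i` are built as n × n matrices, and `np.triu_indices` picks out the pairs strictly above the diagonal. The final assertion checks the "order doesn't matter" point: the mirrored entry below the diagonal yields the same per-period rate, so one triangle is enough. Note that each n × n float64 matrix needs n² × 8 bytes, about 72 MB for n = 3000.

```python
import numpy as np

arr = np.array([1.0, 1.1, 1.21, 1.331])  # toy series: 1.1 ** idx
n = len(arr)
idx = np.arange(n)

# ratio[i, j] = arr[j] / arr[i] and lag[i, j] = j - i, built by broadcasting
ratio = arr[np.newaxis, :] / arr[:, np.newaxis]
lag = idx[np.newaxis, :] - idx[:, np.newaxis]

# Keep only pairs strictly above the diagonal (j > i), so lag is never 0
i, j = np.triu_indices(n, k=1)
rates = ratio[i, j] ** (1.0 / lag[i, j])
print(rates)  # six values, all approximately 1.1

# The mirrored pair gives the same rate: (a_i/a_j)**(1/(i-j)) == (a_j/a_i)**(1/(j-i))
assert np.allclose(ratio[j, i] ** (1.0 / lag[j, i]), rates)
```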
Edit: Adding example input and output. The data is a time series of the number of users on a platform. (The arr below is a toy example with a known solution.)
import numpy as np

# Array of 10% changes (i.e. 1.1 ** idx)
arr = [1, 1.1, 1.21, 1.331]
# Get additional arrays for each entry except the last
arr_1 = [1.1, 1.21, 1.331]
arr_2 = [1.21, 1.331]
# Divide each array by its position 0
arr = np.divide(arr, arr[0])
arr_1 = np.divide(arr_1, arr_1[0])
arr_2 = np.divide(arr_2, arr_2[0])
# We have:
# arr = [1, 1.1, 1.21, 1.331]
# arr_1 = [1, 1.1, 1.21]
# arr_2 = [1, 1.1]
# Raise to the 1/idx power to get the percentage change per period
# (1/0 at idx 0 gives inf with a RuntimeWarning, and 1 ** inf == 1)
arr = np.power(arr, np.divide(1, np.linspace(0, len(arr) - 1, len(arr))))
arr_1 = np.power(arr_1, np.divide(1, np.linspace(0, len(arr_1) - 1, len(arr_1))))
arr_2 = np.power(arr_2, np.divide(1, np.linspace(0, len(arr_2) - 1, len(arr_2))))
# Within float precision this should be:
# arr = [1, 1.1, 1.1, 1.1]
# arr_1 = [1, 1.1, 1.1]
# arr_2 = [1, 1.1]
# Finally concatenate the arrays, dropping each first element
# (np.concatenate takes a single sequence of arrays)
arr_out = np.concatenate([arr[1:], arr_1[1:], arr_2[1:]])
# arr_out = [1.1, 1.1, 1.1, 1.1, 1.1, 1.1]
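Written as a formula (the symbols a and g are notation introduced here, not from the original), the worked example computes, for every pair of indices i < j in a series a_0, …, a_{n-1}, the per-period growth factor g_ij; the percentage change per period is g_ij - 1, which is what `np.subtract(..., 1)` does in the first snippet. With a_k = 1.1^k every pair gives exactly 1.1, and for n = 4 there are n(n-1)/2 = 6 pairs, matching the six entries of `arr_out`. Swapping i and j gives the same value, which is why only one triangle of a pairwise matrix is needed.

```latex
g_{ij} = \left(\frac{a_j}{a_i}\right)^{1/(j-i)}, \qquad 0 \le i < j \le n-1

g_{ji} = \left(\frac{a_i}{a_j}\right)^{1/(i-j)} = \left(\frac{a_j}{a_i}\right)^{1/(j-i)} = g_{ij}

a_k = 1.1^{k} \;\Rightarrow\; g_{ij} = \left(1.1^{\,j-i}\right)^{1/(j-i)} = 1.1
```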
Edit 2:
Fastest code I could come up with. Instead of looping, I build a matrix, perform all the calculations at once, then take the upper-right triangle of the matrix to get the results I care about.
This could probably be improved if there were a way to skip the operations on elements that will be discarded...
Hope it helps someone.
import numpy as np

def growth_percentiles_mat(arr, n=21, margin=0.05, ignore_periods=252):
    arr = np.asarray(arr, dtype=float).ravel()  # accept a 1-D array or an (n, 1) column
    # mat_arr[i, j] = arr[j] / arr[i]
    mat_arr = arr[np.newaxis, :] / arr[:, np.newaxis]
    l = mat_arr.shape[0]
    arr_l = np.arange(l)
    # mat_l[i, j] = j - i, the number of periods between the two entries
    mat_l = arr_l[np.newaxis, :] - arr_l[:, np.newaxis]
    with np.errstate(divide="ignore"):
        mat_pow = np.divide(1, mat_l)  # inf on the diagonal; discarded below
    mat_perc = np.power(mat_arr, mat_pow)
    # Segment at a higher diagonal to ignore more periods
    arr_fin = mat_perc[np.triu_indices(l, k=1 + ignore_periods)]
    return np.percentile(arr_fin, np.linspace(margin, 100 - margin, n))
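On skipping the discarded elements: a minimal sketch, assuming NumPy's `np.triu_indices`, that generates the (i, j) index pairs of the kept triangle first and evaluates the growth formula only on those pairs, so the diagonal and lower triangle are never computed and no 1/0 occurs. `growth_percentiles_pairs` is a hypothetical name; it takes the same arguments as `growth_percentiles_mat` above and accepts either a 1-D array or an (n, 1) column. It still allocates index arrays with one entry per kept pair, so memory remains quadratic in the series length.

```python
import numpy as np

def growth_percentiles_pairs(arr, n=21, margin=0.05, ignore_periods=252):
    """Percentiles of per-period growth factors over pairs with j - i > ignore_periods.

    Hypothetical helper: only the pairs that are kept are ever computed.
    """
    arr = np.asarray(arr, dtype=float).ravel()
    # Upper-triangle indices offset so that j - i >= 1 + ignore_periods
    i, j = np.triu_indices(len(arr), k=1 + ignore_periods)
    rates = (arr[j] / arr[i]) ** (1.0 / (j - i))
    return np.percentile(rates, np.linspace(margin, 100 - margin, n))
```

For example, `growth_percentiles_pairs([1, 1.1, 1.21, 1.331], n=5, margin=0, ignore_periods=0)` evaluates the six pairs of the toy series, all of which are approximately 1.1.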