NumPy broadcasting vs. simple loop
I have been using NumPy for a while but there still are instances in which broadcasting is impenetrable to me. Please see the code below:
import numpy as np
np.random.seed(123)
# Number of measurements for the x variable
nx = 7
# Number of measurements for the y variable
ny = 11
# Number of items for which we run the simulation
nc = 23
# Fake some data
x = np.random.uniform(0, 1, size=(nx, ))
y = np.random.uniform(0, 1, size=(ny, ))
# histogram_2d represents the 2d frequency of the x, y measurements
histogram_2d = np.random.randint(0, 20, size=(nx, ny))
# c is the actual simulation results, size=(nx*ny, nc)
c = np.random.uniform(0, 9, size=(nx*ny, nc))
# Try broadcasting
c_3d = c.reshape((nc, nx, ny))
numpy_sum = (c_3d * histogram_2d).sum()
# Attempt possible replacement with a simple loop
partial_sum = 0.0
for i in range(nc):
c_2d = np.reshape(c[:, i], (nx, ny))
partial_sum += (c_2d * histogram_2d).sum()
print('Numpy broadcasting: ', numpy_sum)
print('Actual loop : ', partial_sum)
In my naivete, I was expecting the two approaches to give the same results (up to some multiple of machine precision). But on my system I get this:
Numpy broadcasting: 74331.4423599
Actual loop : 73599.8596346
As my ignorance is showing: given that histogram_2d
is a 2D array and c_3d
is a 3D array, I was simply thinking that NumPy would magically expand histogram_2d
with nc
copies of itself in the first axis and do the multiplication. But it appears I am not quite correct.
I would like to know how to replace the condensed, broadcasted multiplication + sum with a proper for loop - I am looking at some code in Fortran to do the same and this Fortran code:
hist_3d = spread(histogram_2d, 1, nc)
c_3d = reshape(c, [nc, nx, ny])
partial_sum = sum(c_3d*hist_3d)
Does not do what the NumPy broadcasting does... Which means I am doing something fundamentally wrong somewhere and/or my understanding of broadcasting is still very limited.
Comments
Post a Comment