NumPy broadcasting vs. simple loop

By Ritesh Sahu - June 15, 2022

I have been using NumPy for a while but there still are instances in which broadcasting is impenetrable to me. Please see the code below:

import numpy as np

np.random.seed(123)

# Number of measurements for the x variable
nx = 7
# Number of measurements for the y variable
ny = 11
# Number of items for which we run the simulation
nc = 23

# Fake some data
x = np.random.uniform(0, 1, size=(nx, ))
y = np.random.uniform(0, 1, size=(ny, ))

# histogram_2d represents the 2d frequency of the x, y measurements
histogram_2d = np.random.randint(0, 20, size=(nx, ny))

# c is the actual simulation results, size=(nx*ny, nc)
c = np.random.uniform(0, 9, size=(nx*ny, nc))

# Try broadcasting
c_3d = c.reshape((nc, nx, ny))
numpy_sum = (c_3d * histogram_2d).sum()

# Attempt possible replacement with a simple loop
partial_sum = 0.0
for i in range(nc):
    c_2d = np.reshape(c[:, i], (nx, ny))
    partial_sum += (c_2d * histogram_2d).sum()
    
print('Numpy broadcasting:  ', numpy_sum)
print('Actual loop       :  ', partial_sum)

In my naivete, I was expecting the two approaches to give the same results (up to some multiple of machine precision). But on my system I get this:

Numpy broadcasting:   74331.4423599
Actual loop       :   73599.8596346

As my ignorance is showing: given that histogram_2d is a 2D array and c_3d is a 3D array, I was simply thinking that NumPy would magically expand histogram_2d with nc copies of itself in the first axis and do the multiplication. But it appears I am not quite correct.

I would like to know how to replace the condensed, broadcasted multiplication + sum with a proper for loop - I am looking at some code in Fortran to do the same and this Fortran code:

hist_3d = spread(histogram_2d, 1, nc)
c_3d = reshape(c, [nc, nx, ny])
partial_sum = sum(c_3d*hist_3d)

Does not do what the NumPy broadcasting does... Which means I am doing something fundamentally wrong somewhere and/or my understanding of broadcasting is still very limited.

Search This Blog

Theprogrammersfirst | A technical portal.

NumPy broadcasting vs. simple loop

Comments

Post a Comment

Popular posts from this blog

Spring Elasticsearch Operations

Hibernate Search - Elasticsearch with JSON manipulation

Today Walkin 14th-Sept