2022-03-22

Core dimension error when running numba ufunc on dask array

I'm trying to run custom numba vectorized/ufunc functions in a lazy dask pipeline.

When I run the code below, I get "ValueError: Core dimension 'm' consists of multiple chunks". I don't understand why m is considered a core dimension. Any idea how I can solve this?

import numpy as np
import dask.array as da
import numba
from numba import float64

# Define a gufunc that takes a 3D array and mean-reduces it along axis 0
@numba.guvectorize([(float64[:,:,:], float64[:,:])], '(k,m,n)->(m,n)')
def reduce_mean(x, out):
    """Mean reduce a 3D array along the first dimension (axis 0)"""
    nrows = x.shape[0]
    for idx in range(x.shape[1]):
        for idy in range(x.shape[2]):
            col_sum = np.sum(x[:,idx,idy])
            out[idx,idy] = np.divide(col_sum, nrows)

# Apply ufunc on dask array
arr = da.random.random((10,200,200), chunks=(10,50,50)).astype(np.float64)
arr_reduced = reduce_mean(arr)
print(arr_reduced)

