binned_statistic_dd does not respect masked arrays #12898

@Jwink3101

I guess it is debatable whether this is a feature request (i.e. it should support masked arrays) or a bug (it doesn't).

Reproducing code example:

Simple example:

import numpy as np
import scipy.stats

rnd = np.random.RandomState(seed=54)
x = rnd.uniform(high=2*np.pi,size=1000)
r2 = rnd.normal(size=len(x))
f = np.sin(x) + 0.1 * r2

# To demonstrate, make some of f NaNs based on r2 then use a masked array.
# We could just make a masked array but I want to *know* it's not working
f[np.abs(r2)>2] = np.nan

print(np.sum(np.isnan(f)))

ff = np.ma.masked_invalid(f)

stat,*_ = scipy.stats.binned_statistic(x,ff) # binned_statistic calls binned_statistic_dd
print(stat)

Of course, you could do:

stat2,*_ = scipy.stats.binned_statistic(x,ff,statistic=np.nanmean)

but it is slower. In Jupyter:

%timeit scipy.stats.binned_statistic(x,ff)
%timeit scipy.stats.binned_statistic(x,ff,statistic=np.nanmean) 

193 µs ± 5.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
669 µs ± 26.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

It takes about 3.5 times as long, i.e. roughly 250% slower (with a similar ~3-4% variation)
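Another possible workaround, sketched here rather than taken from SciPy itself, is to drop the masked entries before binning so the built-in "mean" statistic can be used. The helper name binned_mean_ignoring_mask is hypothetical. Passing range explicitly is an assumption made so that the bin edges stay the same as for the full x. Whether this is actually faster than statistic=np.nanmean has not been measured here.

```python
import numpy as np
import scipy.stats


def binned_mean_ignoring_mask(x, values, bins=10):
    """Hypothetical helper: bin only the unmasked entries of `values`."""
    x = np.asarray(x)
    values = np.ma.asarray(values)
    # Boolean array that is True where the value is NOT masked.
    keep = ~np.ma.getmaskarray(values)
    # Take the range from the full x so the edges do not shift
    # when masked points are dropped.
    full_range = (x.min(), x.max())
    return scipy.stats.binned_statistic(
        x[keep], values.compressed(),
        statistic="mean", bins=bins, range=full_range,
    )


# Same data as in the example above.
rnd = np.random.RandomState(seed=54)
x = rnd.uniform(high=2 * np.pi, size=1000)
r2 = rnd.normal(size=len(x))
f = np.sin(x) + 0.1 * r2
f[np.abs(r2) > 2] = np.nan
ff = np.ma.masked_invalid(f)

stat, edges, _ = binned_mean_ignoring_mask(x, ff)
print(stat)
```

This only sidesteps the problem on the caller's side; it does not change how binned_statistic_dd itself treats masked input.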

Error message:

N/A

Scipy/Numpy/Python version information:

print(scipy.__version__, numpy.__version__, sys.version_info)
1.5.0 1.18.5 sys.version_info(major=3, minor=8, micro=3, releaselevel='final', serial=0)

(Basically the latest in Anaconda I think)
