-
-
Notifications
You must be signed in to change notification settings - Fork 11.3k
Description
numpy.count_nonzero
, numpy.nonzero
and numpy.flatnonzero
are expensively used within scipy.sparse
(compressed.py:616, compressed.py:1180, base.py:218, coo.py:539, etc.)
Through this usage, as well through direct use (measureimageintensity.py#251, measureimageskeleton.py#252) the numpy.nonzero
was seen high in the profile of AdvancedSegmentation benchmark of CellProfiler.
PyArray_NonZero
function extracts nonzero
function from the dtype descriptor (item_selection.c:#2185), which has the signature (dataptr, self)
, and returns a boolean indicating if the element is zero or not.
This function is called within a loop, iterating over all elements of the array.
The function carries reference self
to the PyArrayObject
to detect is the array might be not-behaved (data layout is difference from the native to machine), and call swap
function before making the test.
The check if the array is behaved can be done once outside the loop, and a much more efficient loop, which is amendable to vectorization, can be written,
There is already a special case written for boolean arrays (item_selection.c#2229).
This is a suggestion to add another special case for well-behaved arrays (like native integers and native floating types).
An indicatation that at least 2X performance can be gained:
In [1]: import numpy as np
In [2]: x = np.random.randint(-5,5,size=10**6)
In [3]: %timeit np.nonzero(x)
5.89 ms ± 44.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# this is faster due to boole optimization despite the need to create intermediate array
In [4]: %timeit np.nonzero(x != 0)
2.59 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)