Improve performance of numpy.nonzero

`numpy.count_nonzero`, `numpy.nonzero` and `numpy.flatnonzero` are expensively used within `scipy.sparse` ([compressed.py:616](https://github.com/scipy/scipy/blob/master/scipy/sparse/compressed.py#L616), [compressed.py:1180](https://github.com/scipy/scipy/blob/master/scipy/sparse/compressed.py#L1180), [base.py:218](https://github.com/scipy/scipy/blob/master/scipy/sparse/base.py#L218), [coo.py:539](https://github.com/scipy/scipy/blob/master/scipy/sparse/coo.py#L539), etc.)

Through this usage, as well through direct use ([measureimageintensity.py#251](https://github.com/CellProfiler/CellProfiler/blob/master/cellprofiler/modules/measureimageintensity.py#L251), [measureimageskeleton.py#252](https://github.com/CellProfiler/CellProfiler/blob/master/cellprofiler/modules/measureimageskeleton.py#L252)) the `numpy.nonzero` was seen high in the profile of AdvancedSegmentation benchmark of [CellProfiler](http://cellprofiler.org/examples/). 

`PyArray_NonZero` function extracts `nonzero` function from the dtype descriptor ([item_selection.c:#2185](https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/item_selection.c#L2185)), which has the signature `(dataptr, self)`, and returns a boolean indicating if the element is zero or not. 
This function is called within a loop, iterating over all elements of the array.

The function carries reference `self` to the `PyArrayObject` to detect is the array might be not-behaved (data layout is difference from the native to machine), and call `swap` function before making the test. 

The check if the array is behaved can be done once outside the loop, and a much more efficient loop, which is amendable to vectorization, can be written,

There is already a special case written for boolean arrays ([item_selection.c#2229](https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/item_selection.c#L2229)).

This is a suggestion to add another special case for well-behaved arrays (like native integers and native floating types).

An indicatation that at least 2X performance can be gained:

```
In [1]: import numpy as np

In [2]: x = np.random.randint(-5,5,size=10**6)

In [3]: %timeit np.nonzero(x)
5.89 ms ± 44.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# this is faster due to boole optimization despite the need to create intermediate array 
In [4]: %timeit np.nonzero(x != 0)
2.59 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Improve performance of numpy.nonzero #11569

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Improve performance of numpy.nonzero #11569

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions