Skip to content

Improve performance of numpy.nonzero #11569

@oleksandr-pavlyk

Description

@oleksandr-pavlyk

numpy.count_nonzero, numpy.nonzero and numpy.flatnonzero are expensively used within scipy.sparse (compressed.py:616, compressed.py:1180, base.py:218, coo.py:539, etc.)

Through this usage, as well through direct use (measureimageintensity.py#251, measureimageskeleton.py#252) the numpy.nonzero was seen high in the profile of AdvancedSegmentation benchmark of CellProfiler.

PyArray_NonZero function extracts nonzero function from the dtype descriptor (item_selection.c:#2185), which has the signature (dataptr, self), and returns a boolean indicating if the element is zero or not.
This function is called within a loop, iterating over all elements of the array.

The function carries reference self to the PyArrayObject to detect is the array might be not-behaved (data layout is difference from the native to machine), and call swap function before making the test.

The check if the array is behaved can be done once outside the loop, and a much more efficient loop, which is amendable to vectorization, can be written,

There is already a special case written for boolean arrays (item_selection.c#2229).

This is a suggestion to add another special case for well-behaved arrays (like native integers and native floating types).

An indicatation that at least 2X performance can be gained:

In [1]: import numpy as np

In [2]: x = np.random.randint(-5,5,size=10**6)

In [3]: %timeit np.nonzero(x)
5.89 ms ± 44.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# this is faster due to boole optimization despite the need to create intermediate array 
In [4]: %timeit np.nonzero(x != 0)
2.59 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions