ENH: Improve performance of np.count_nonzero for float arrays #27523

eendebakpt · 2024-10-07T11:48:35Z

Improve performance of np.count_nonzero for float arrays. The PR is similar to #18183, but uses a simple loop instead of SIMD instructions.

The template in arraytypes.c.src has more types than is strictly required (the integers have their own fast path), these can be removed if needed.

Quick benchmark:

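import timeit
import numpy as np

# 10,000 elements (one zero followed by non-zero values) in four dtypes
b = np.arange(10_000).astype(bool)
x = b.astype(int)
f = b.astype(np.float64)
f32 = b.astype(np.float32)
n = 10  # repetitions per timing

for A in [b, x, f, f32]:
    # globals() exposes np and the current A to the timed statement
    t = timeit.timeit('np.count_nonzero(A)', globals=globals(), number=n)
    print(f'{A.dtype}: {1e6*t/n:.1f} [us]')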

Main:

bool: 1.5 [us]
int64: 5.1 [us]
float64: 24.8 [us]
float32: 28.0 [us]

PR:

bool: 1.5 [us]
int64: 5.1 [us]
float64: 7.8 [us]
float32: 7.7 [us]

charris · 2024-10-08T18:21:25Z

close/reopen for testing.

eendebakpt · 2025-02-10T09:14:10Z

@tylerjereddy Would you be able to review this PR? (there is also #27519 which is related)

tylerjereddy

Could you add asv benchmarks to guard the performance improvement? When I run asv continuous -e -b "time_count_nonzero.*" main count_nonzero_float locally, I don't see any sign of improvements, probably because I don't see any benchmark parametrization over float types, so maybe we could include them.

The floating code path might be less common than the integer code paths since checking if a float is exactly zero may be a bit more annoying? I suppose that might be a point of potential resistance from the other core devs? Though the shims seem fairly small-ish.
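A float-parametrized benchmark along the lines requested here could look like the sketch below. It follows asv's class-based convention (`params`, `param_names`, `setup`, `time_*` methods). The class name `CountNonzeroFloat` and the array size are assumptions for illustration, not the benchmark that was eventually added to the PR.

```python
import numpy as np


class CountNonzeroFloat:
    """Hypothetical asv benchmark; the names here are illustrative only."""

    # asv runs each time_* method once per parameter value.
    params = [np.bool_, np.int64, np.float32, np.float64]
    param_names = ["dtype"]

    def setup(self, dtype):
        # Same data as the quick benchmark: one zero, then non-zero values.
        self.x = np.arange(10_000).astype(bool).astype(dtype)

    def time_count_nonzero(self, dtype):
        np.count_nonzero(self.x)
```

With the float dtypes in `params`, a run such as the `asv continuous` command quoted above would exercise the float code path as well as the bool and integer ones.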

tylerjereddy · 2025-02-11T01:10:17Z

numpy/_core/src/multiarray/arraytypes.c.src

+    switch(dtype_num) {
+        /**begin repeat
+         *
+         * #dtype = npy_bool, npy_byte, npy_byte, npy_uint16, npy_int16, npy_uint32, npy_int32, npy_uint64, npy_int64, npy_float, npy_double#


I think you can purge this line since there is no substitution target for dtype in the repeated block? At least when I do that locally a clean build from source still passes the full suite.

seberg

Changes look good to me, thanks.

Please add a test for byte swapped floats (or point to one if it is fine). The code doesn't look like it deals with that correctly (it is completely fine to just take the slow path for all byte-swapped dtypes, although it is OK for integers of course).

numpy/_core/src/multiarray/item_selection.c

seberg · 2025-02-11T10:14:19Z

numpy/_core/src/multiarray/arraytypes.c.src

+        /**end repeat**/
+    }
+    return -1;
+}


I suppose the only reason for putting it here is that the other file isn't c.src (or .cpp for that matter).

(OK with me)

seberg · 2025-02-11T10:15:43Z

numpy/_core/src/multiarray/arraytypes.c.src

+}
+/**end repeat**/
+
+npy_intp


Suggested change

npy_intp

NPY_NO_EXPORT npy_intp

not sure we fixed that... Once that happens we should just remove this throughout (rather than stop adding it), IMO.

seberg · 2025-02-11T11:10:18Z

I forgot to point out that the important case for byte swapping is a byte-swapped -0.0, not a 0, since 0 is all zero bits regardless of byte order.
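The -0.0 case can be illustrated from Python. This is a minimal sketch and not the test added in the PR: it builds one array whose dtype is marked as byte-swapped (so NumPy decodes it correctly) and one whose bytes are swapped while the dtype stays native, which mimics a loop that ignores byte order.

```python
import numpy as np

a = np.array([-0.0, 0.0, 1.0], dtype=np.float64)

# Same values in the opposite byte order: swap the bytes AND flip the
# dtype's byte-order flag, so NumPy still decodes each element correctly.
swapped = a.byteswap().view(a.dtype.newbyteorder())

# Swap the bytes but keep the native dtype: this is what a loop that
# ignores byte order would effectively read from `swapped`.
misread = a.byteswap()

print(np.count_nonzero(swapped))  # 1: only 1.0 is non-zero, -0.0 counts as zero
print(np.count_nonzero(misread))  # 2: the sign bit of -0.0 lands in the mantissa
```

The sign bit of -0.0 sits in the most significant byte. After a swap it is read as a tiny subnormal value, which compares unequal to zero, whereas +0.0 stays all zero bits in either order.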

Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>

seberg · 2025-02-21T08:20:03Z

numpy/_core/src/multiarray/item_selection.c

@@ -2724,6 +2724,15 @@ PyArray_CountNonzero(PyArrayObject *self)
            }
        }
        else {
+            /* Special low-overhead version specific to the float types (and some others) */
+            if (PyArray_ISNOTSWAPPED(self)) {


Just realized (for reasons): Basically every time you check something like this you also need to check whether the array is aligned. The above code does that, as you can see.

(Please also check in the other PR. One way to test in CI could be to use a "?d" structured dtype, another to just view with a byte shift. The structured dtype just doesn't check if there is a contiguous only special loop.)

Do you mean we should replace PyArray_ISNOTSWAPPED(self) with PyArray_ISALIGNED(self) && PyArray_ISNOTSWAPPED(self)? What would be the reason for this check?

Dereferencing an unaligned pointer like *(float64 *)ptr will just kill with a SIGBUS or something on some hardware (it may be slow on others).
It isn't a problem on typical hardware...

Dunno, I think some sanitizer (UBSAN?) would find it, otherwise the typical thing is that debian opens an issue eventually because there is a SIGBUS on sparc or so.

For reference: the behavior for unaligned dereferencing is indeed undefined. See C17 [ISO/IEC 9899:2018] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf section 6.3.2.3, paragraph 7
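The structured-dtype approach mentioned in this thread can be sketched as follows. It assumes a packed `"?,d"` dtype, in which the float64 field starts at byte offset 1; the field names `f0` and `f1` are the ones NumPy assigns automatically, and the values are illustrative.

```python
import numpy as np

# Packed record: bool at offset 0, float64 at offset 1, itemsize 9.
rec = np.zeros(5, dtype="?,d")
field = rec["f1"]  # float64 view with a 9-byte stride

field[1] = 2.5
field[3] = -0.0  # negative zero must still count as zero

print(rec.dtype.fields["f1"][1])  # byte offset of the float64 field
print(field.flags.aligned, field.flags.c_contiguous)
print(np.count_nonzero(field))
```

Because the field is neither aligned nor contiguous, such an array would be expected to bypass any fast path that is guarded by an alignment check and still return the correct count.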

seberg · 2025-02-24T11:39:45Z

numpy/_core/tests/test_numeric.py

+
+    def test_count_nonzero_non_aligned_array(self):
+        sz = 64
+        b = np.zeros(64 + 1).view(np.int8)[1:-(np.intp(0).itemsize - 1)]


intp itemsize != float64 itemsize. Just hardcode the 8/7 bytes, seems easier to read. Can also create the int8/uint8 array initially directly.
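One way to read this suggestion is sketched below: allocate the byte buffer directly as uint8 and hardcode the 8-byte float64 item size. This is an illustration under those assumptions, not the test that ended up in test_numeric.py.

```python
import numpy as np

n = 64  # number of float64 elements in the shifted view

# n * 8 payload bytes plus 8 spare bytes: drop 1 byte at the front and
# 7 at the end, so the payload starts one byte past the buffer start.
raw = np.zeros(n * 8 + 8, dtype=np.uint8)
shifted = raw[1:-7].view(np.float64)

shifted[::4] = 1.5  # 16 non-zero entries

print(shifted.shape, shifted.flags.aligned)
print(np.count_nonzero(shifted))
```

The view is contiguous but starts at an odd address, so it exercises the contiguous case that the structured-dtype approach does not cover.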

seberg

OK, let's put this in, thanks. Looks right now, and clean enough. Thanks for the new tests also, which are important!

I still feel it would be nice to merge this with the bool/int "simple path" above, although it is slightly different I suppose (but it may actually be faster for the trivially iterable cases, although it may be that we are calling sum() at least for bools there for exactly this optimization).

ENH: Improve performance of np.count_nonzero for float arrays

6525473

github-actions bot added the 01 - Enhancement label Oct 7, 2024

eendebakpt marked this pull request as draft October 7, 2024 11:49

Merge branch 'main' into count_nonzero_float

2e03df2

eendebakpt marked this pull request as ready for review October 7, 2024 17:05

charris closed this Oct 8, 2024

charris reopened this Oct 8, 2024

eendebakpt mentioned this pull request Oct 11, 2024

ENH: Improve performance of numpy.nonzero for 1D/2D contiguous arrays #27519

Open

eendebakpt requested a review from seberg October 31, 2024 20:10

eendebakpt added 2 commits November 25, 2024 13:38

Merge branch 'main' into count_nonzero_float

d6a3c7d

Merge branch 'main' into count_nonzero_float

5aa8470

tylerjereddy reviewed Feb 11, 2025

eendebakpt added 3 commits February 11, 2025 10:55

add asv benchmark for float

605bd03

Merge branch 'main' into count_nonzero_float

514fc7b

review comments

4b1f103

seberg requested changes Feb 11, 2025

eendebakpt and others added 6 commits February 19, 2025 10:35

Update numpy/_core/src/multiarray/item_selection.c

e646daa

Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>

add tests

402c542

lint

b8d952f

lint

62a05a4

add guard

1683aa6

add guard

9a4da3c

seberg reviewed Feb 21, 2025

eendebakpt added 3 commits February 23, 2025 21:24

Merge branch 'main' into count_nonzero_float

46db4ae

add check for fast path with array that is not aligned

a4b83c9

lint

4951221

seberg reviewed Feb 24, 2025

eendebakpt added 5 commits February 24, 2025 15:57

refactor test

640cc5a

add github ref

9cfd645

add check on alignment

461bf8c

Merge branch 'main' into count_nonzero_float

360405c

Merge branch 'main' into count_nonzero_float

ce32637

seberg approved these changes Apr 22, 2025

seberg merged commit e68ac76 into numpy:main Apr 22, 2025
71 of 72 checks passed

seberg added this to the 2.3.0 release milestone Apr 22, 2025
