Skip to content

Conversation

ajfriend
Copy link
Collaborator

@ajfriend ajfriend commented Feb 17, 2025

Taking over from #496.

New isValidCell should be faster in all cases. Relies on bit manipulation instead of loops.

In the last correctness check, we use compiler intrinsics, so we'll want to verify that that approach works on all platforms. We can always fall back to a loop, and that would still be faster (see the benchmarks plot below). However, we need to ensure that we have the right test to evaluate which approaches are available on each platform. I'm not currently sure how to do that in an exhaustive way.

@coveralls
Copy link

coveralls commented Feb 17, 2025

Coverage Status

coverage: 98.786% (+0.002%) from 98.784%
when pulling 52c7146 on ajfriend:new_isValidCell
into 42450ce on uber:master.

@ajfriend ajfriend marked this pull request as draft February 18, 2025 04:06
@ajfriend
Copy link
Collaborator Author

OK, current approach breaks some rules:

/home/runner/work/h3/h3/src/h3lib/lib/h3Index.c: In function ‘_isValidCell_const’:
/home/runner/work/h3/h3/src/h3lib/lib/h3Index.c:297:14: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]

@ajfriend
Copy link
Collaborator Author

Speedup results on my M3 mac are promising. This is for 8_14_null_100.

  • old is the original algorithm
  • final is the final algorithm using intrinsics
  • fallback is the final algorithm, but will the fallback for when intrinsics are not availab.e
image

Speedups, comparing the fastest run for each name and algo:

┌───────────────┬─────────┬─────────┬─────────┐
│     name      │  algo1  │  algo2  │ speedup │
│    varchar    │ varchar │ varchar │ double  │
├───────────────┼─────────┼─────────┼─────────┤
│ 2_8           │ old     │ final   │     6.4 │
│ 8_14          │ old     │ final   │     9.1 │
│ 8_14_null_10  │ old     │ final   │     7.8 │
│ 8_14_null_100 │ old     │ final   │     8.8 │
│ 8_14_null_2   │ old     │ final   │     5.0 │
└───────────────┴─────────┴─────────┴─────────┘

@ajfriend
Copy link
Collaborator Author

Latest benchmarks of final algo:

┌───────────────┬──────────────┬───────┐
│     name      │ microseconds │ runs  │
│    varchar    │    double    │ int64 │
├───────────────┼──────────────┼───────┤
│ 2_8           │      145.976 │   500 │
│ 8_14          │      145.646 │   500 │
│ 8_14_null_10  │      157.995 │   500 │
│ 8_14_null_100 │      149.786 │   500 │
│ 8_14_null_2   │      145.231 │   500 │
└───────────────┴──────────────┴───────┘

@ajfriend ajfriend marked this pull request as ready for review February 22, 2025 20:47
@ajfriend ajfriend mentioned this pull request Feb 22, 2025
@ajfriend ajfriend changed the title [wip] new fast isValidCell new fast isValidCell Feb 22, 2025
@ajfriend
Copy link
Collaborator Author

ajfriend commented Feb 22, 2025

TODO: We test 32 bit windows in CI. Should we do the same for mac and linux?

@isaacbrodsky
Copy link
Collaborator

TODO: We test 32 bit windows in CI. Should we do the same for mac and linux?

I don't think there is an equivalent Mac 32-bit.
Linux we can test a x86 or ARM 32-bit configuration

[4] = 1, [14] = 1, [24] = 1, [38] = 1, [49] = 1, [58] = 1,
[63] = 1, [72] = 1, [83] = 1, [97] = 1, [107] = 1, [117] = 1};

if (isBaseCellPentagonArr[base_cell]) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to use (perhaps an inlined version of) _isBaseCellPentagon?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll run some benchmarks and see if there's any difference.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely slower with the naive use of _isBaseCellPentagon. Test setup in the latest commit:

image

I'll figure out the right extern and inline incantation and try that next.

Another alternative might be to just expose the BaseCellData directly.

More generally for the H3 lib as a whole, I'm not sure if there's a benefit to using a struct of arrays vs an array of structs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doing this will be a little involved, so I'm proposing doing it later :)

#984

@ajfriend
Copy link
Collaborator Author

ajfriend commented Feb 25, 2025

TODO: We test 32 bit windows in CI. Should we do the same for mac and linux?

I don't think there is an equivalent Mac 32-bit. Linux we can test a x86 or ARM 32-bit configuration

Think that's a blocker for this PR, or can I create a separate task/PR to add that to CI? @isaacbrodsky

@isaacbrodsky
Copy link
Collaborator

isaacbrodsky commented Feb 25, 2025

TODO: We test 32 bit windows in CI. Should we do the same for mac and linux?

I don't think there is an equivalent Mac 32-bit. Linux we can test a x86 or ARM 32-bit configuration

Think that's a blocker for this PR, or can I create a separate task/PR to add that to CI? @isaacbrodsky

I don't think it's a blocker, but we could certainly merge it in first.

My sense is Linux x86/i*86 is much less common today and probably not an architecture we need to be too concerned with testing.

I'll open a PR to include Linux arm64. I don't think Github provides Linux arm32 runners, at least as standard: https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories

edit: #974

@ajfriend
Copy link
Collaborator Author

ajfriend commented Mar 6, 2025

Issues trying to inline:

image image

@dfellis
Copy link
Collaborator

dfellis commented Mar 6, 2025

I'm not super familiar with inline functions in C, but going by this StackOverflow answer you need inline int ... in the header, and extern int ... in the implementation.

@ajfriend
Copy link
Collaborator Author

ajfriend commented Mar 7, 2025

I'm not super familiar with inline functions in C, but going by this StackOverflow answer you need inline int ... in the header, and extern int ... in the implementation.

Nice. That works for me locally, but with some errors. Let's see now it does in CI.

@ajfriend
Copy link
Collaborator Author

ajfriend commented Mar 9, 2025

Note the follow-up: #984

@ajfriend ajfriend merged commit 306a7cb into uber:master Mar 10, 2025
42 checks passed
@ajfriend ajfriend deleted the new_isValidCell branch March 10, 2025 01:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants