Skip to content

Conversation

friday
Copy link
Member

@friday friday commented Oct 12, 2021

Replaces #824.

I wasn't planning to, but I reimplemented this for the v6 branch instead, with the major difference that it uses a new method get_matching_blocks() which only exists in this branch (because it doesn't behave exactly the same in some cases).

Was finally able to replace those last 30 lines of code to now to a much more sensible solution thanks to get_matching_blocks(), and also removed the condition #824 reintroduced.

The character weighting is much smaller, and the new code doesn't bump up everything to 100% like the old one did with very short queries.

The highlighter and matcher uses the same implementation now (even the same function call, since it's memoized). Before this, the highligher used a completely different implementation and sometimes would display misleading results.

Might need some need tuning still, but I'm happy with the results, because getting the matchmaking blocks means we can understand it better and score it better, with less code duplication.

2021-10-12_14-55

@friday friday force-pushed the v6-fuzzy-rework branch 2 times, most recently from 1518008 to 7d7a807 Compare October 13, 2021 05:10
@troycurtisjr
Copy link
Member

Oh yeah this is really nice. The implementation is much improved from a code/logic standpoint, and the results are right on. I haven't been able to find any cases yet where the results did not line up with my expectations! 👍 👍

As far as I'm concerned this is ready to merge into v6.

@troycurtisjr troycurtisjr self-requested a review October 16, 2021 13:15
@friday friday merged commit 6efc86a into v6 Oct 16, 2021
@friday friday deleted the v6-fuzzy-rework branch October 16, 2021 15:47
@friday
Copy link
Member Author

friday commented Oct 16, 2021

Oh yeah this is really nice. The implementation is much improved from a code/logic standpoint, and the results are right on. I haven't been able to find any cases yet where the results did not line up with my expectations! +1 +1

As far as I'm concerned this is ready to merge into v6.

I can think of a couple of cases, but I think the results are generally good enough not to have to handle them.

  • The code that checks if there's a word boundary could maybe include other white-space characters or all non-alphanumericals.
  • With the example "FiWeBr" matching "Firefox Web Browser" it actually matches "Firefox Web Browser" now, so the code will incorrectly give it a slight penalty for the last letter failing the word boundary condition. But this is how the Levenstein library matches it, and it's way more efficient than any code we could write. And if it's just one letter the penalty is so small it won't matter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants