Add regex-based alternative to full-text search #1098

Merged
merged 2 commits into from
Apr 4, 2025

Conversation

ml-evs (Member) commented Apr 4, 2025

After noodling around in #993, I think the conclusion is that it's not worth maintaining a potentially large separate custom index for FTS that works with only a few characters. Instead, as a bit of a prototype, this PR adds a simple regex-based search that can be triggered in the item select by prefixing the query with "%". This seems to work fairly well at our typical database sizes, but might need some optimisations. For now it also only uses the fields that are currently part of the free-text index, but this could be expanded (and we can add real indices over these fields if performance is bad).
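For readers following along, here is a rough sketch of how a "%" prefix could switch between the two search modes. It is not the code from this PR: the function name `build_search_filter` and the field names in `SEARCH_FIELDS` are hypothetical, and it assumes a MongoDB-style backend where `$regex`, `$options`, `$or` and `$text`/`$search` are the standard query operators.

```python
# Hypothetical field list; the PR only says it reuses the fields
# that are already part of the free-text index.
SEARCH_FIELDS = ["item_id", "name", "description"]


def build_search_filter(query: str, fields=SEARCH_FIELDS) -> dict:
    """Return a MongoDB-style filter document for a search-box query.

    A leading "%" selects regex matching over the given fields;
    anything else falls back to the existing full-text search.
    """
    if query.startswith("%"):
        # Drop the trigger character and surrounding whitespace.
        pattern = query[1:].strip()
        # Match the pattern case-insensitively in any of the fields.
        return {
            "$or": [
                {field: {"$regex": pattern, "$options": "i"}}
                for field in fields
            ]
        }
    # Default path: the text index handles tokenised search.
    return {"$text": {"$search": query}}
```

The resulting dictionary would be passed to something like a `find` call on the items collection. Without an index that the regex can use, each such query scans the candidate documents, which is consistent with the note above that this works at typical database sizes but may need optimising.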

Closes #679?

ml-evs added the API label (For issues/PRs pertaining to the API) Apr 4, 2025

codecov bot commented Apr 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.25%. Comparing base (603b387) to head (6747b51).
Report is 120 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1098      +/-   ##
==========================================
+ Coverage   70.18%   70.25%   +0.07%     
==========================================
  Files          63       63              
  Lines        4119     4129      +10     
==========================================
+ Hits         2891     2901      +10     
  Misses       1228     1228              
Files with missing lines Coverage Δ
pydatalab/src/pydatalab/routes/v0_1/items.py 83.38% <100.00%> (+0.53%) ⬆️

cypress bot commented Apr 4, 2025

datalab    Run #3037

Project: datalab
Branch Review: ml-evs/regex-search
Run status: Passed #3037
Run duration: 07m 15s
Commit: aa3fc8e411 (Merge 09d5f1ad6c8e69e830a645f90ad9671a2a3085ee into 511eb211630da44403cf7e43e5a9...)
Committer: Matthew Evans

Test results
Failures: 0
Flaky: 0
Pending (not run because a developer annotated the test with .skip): 0
Skipped (not run because of a failure in a mocha hook): 0
Passing: 471

ml-evs (Member, Author) commented Apr 4, 2025

Going to merge this as a bit of a hidden feature; I want to test it out on some real deployments before making it the default.

ml-evs force-pushed the ml-evs/regex-search branch from 09d5f1a to 6747b51 April 4, 2025 23:27
ml-evs merged commit 24569ca into main Apr 4, 2025
8 of 9 checks passed
ml-evs deleted the ml-evs/regex-search branch April 4, 2025 23:28
BenjaminCharmes pushed a commit that referenced this pull request Apr 16, 2025
* Add regex-based search triggered with '%' at start of query

Add very simple regexp search test

* Tweak test case and handle quotes
Labels
API For issues/PRs pertaining to the API usability
Development

Successfully merging this pull request may close these issues.

Free text search could be improved