

generall

Fixes: #6735

Some details about the bug:

To compute IDF, we were using the number of available points as the total number of documents, and the length of the posting list as the number of documents containing the token.
In our implementation, posting lists are immutable on delete (and the in-RAM index re-creates them on load), so the two numbers passed into the IDF formula could be inconsistent with each other. For example, after a deletion the available-point count drops, but the posting-list length does not.
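A minimal sketch of how the mismatch can distort scores. It assumes the BM25-style formula `ln((N - n + 0.5) / (n + 0.5) + 1)`; the PR text does not say which IDF variant is used. The function name `idf` and the point counts are hypothetical, chosen only for illustration.

```rust
/// BM25-style inverse document frequency (assumed form, for illustration):
/// idf = ln((N - n + 0.5) / (n + 0.5) + 1)
/// `total_docs` is N, `docs_with_token` is n.
fn idf(total_docs: usize, docs_with_token: usize) -> f64 {
    let n_total = total_docs as f64;
    let n_tok = docs_with_token as f64;
    ((n_total - n_tok + 0.5) / (n_tok + 0.5) + 1.0).ln()
}

fn main() {
    // Hypothetical segment: 10 points indexed, the token appears in 5 of them.
    let indexed_points: usize = 10;
    let posting_len: usize = 5;
    println!("before delete: {:.4}", idf(indexed_points, posting_len));

    // Delete 8 points. The available-point count drops to 2,
    // but the posting list (not modified on delete) still has 5 entries.
    let available_points = indexed_points - 8;
    let mixed = idf(available_points, posting_len);
    println!("after delete (mixed counts): {:.4}", mixed);
}
```

With these numbers, N ends up smaller than n, so the IDF term turns negative and the token would lower a document's score instead of raising it.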

This PR instead uses the number of indexed vectors from the vector index, which, like the posting lists, does not change on delete. As a result, scores stay consistent after some points are deleted.
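A minimal sketch of the idea behind the fix, assuming (as the description states) that both counts are left untouched by deletions. `Segment`, its fields, and its methods are hypothetical stand-ins, not Qdrant's actual types, and the IDF formula is the same assumed BM25-style form.

```rust
/// BM25-style IDF (assumed form): ln((N - n + 0.5) / (n + 0.5) + 1)
fn idf(total_docs: usize, docs_with_token: usize) -> f64 {
    let n_total = total_docs as f64;
    let n_tok = docs_with_token as f64;
    ((n_total - n_tok + 0.5) / (n_tok + 0.5) + 1.0).ln()
}

/// Hypothetical segment model. `indexed_vectors` and `posting_list_len`
/// are fixed when the index is built; deletions only bump `deleted_points`.
struct Segment {
    indexed_vectors: usize,  // N: vectors in the vector index
    posting_list_len: usize, // n: entries in the token's posting list
    deleted_points: usize,
}

impl Segment {
    /// Points still visible to queries (the count the old code used as N).
    fn available_points(&self) -> usize {
        self.indexed_vectors - self.deleted_points
    }

    /// Mark points as deleted without touching either index statistic.
    fn delete(&mut self, count: usize) {
        self.deleted_points = (self.deleted_points + count).min(self.indexed_vectors);
    }

    /// IDF computed from two counts that come from the same index snapshot.
    fn token_idf(&self) -> f64 {
        idf(self.indexed_vectors, self.posting_list_len)
    }
}

fn main() {
    let mut seg = Segment { indexed_vectors: 10, posting_list_len: 5, deleted_points: 0 };
    let before = seg.token_idf();
    seg.delete(8);
    println!(
        "available: {}, idf before: {:.4}, idf after: {:.4}",
        seg.available_points(),
        before,
        seg.token_idf()
    );
}
```

Because N and n are both taken from structures that deletions leave unchanged, the IDF value is the same before and after the delete, even though fewer points are available.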


@generall generall requested a review from agourlay June 23, 2025 15:22
@generall generall merged commit 3d76eec into dev Jun 24, 2025
18 checks passed
@generall generall deleted the fix-idf-statistics-compute branch June 24, 2025 10:50

3 participants