Conversation

zhichao-aws
Contributor

@zhichao-aws commented Jul 15, 2025

Add model metadata and a custom wrapper for the OpenSearch sparse encoding models (https://huggingface.co/opensearch-project).

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
    (see the sketch below)
  • I have tested that the implementation works on a representative set of tasks.
  • The model is public, i.e. it is available either as an API or the weights are publicly available to download
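
For reference, the two loading entry points from the checklist look roughly like this; a minimal sketch, assuming mteb is installed and using one of the OpenSearch sparse model names on Hugging Face as an illustration (the exact name and revision registered in this PR may differ):

```python
# Minimal sketch: fetch the model metadata and load the model via mteb.
# The model name below is illustrative; the names/revisions registered
# in this PR may differ.
import mteb

meta = mteb.get_model_meta(
    "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill"
)
model = mteb.get_model(meta.name, meta.revision)
```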

@Samoed changed the title from "[Model] Add OpenSearch inf-free sparse encoding models" to "model: Add OpenSearch inf-free sparse encoding models", Jul 15, 2025
@zhichao-aws
Contributor Author

To load the sparse models, we need to use the latest SentenceTransformers (v5.0 or later).

Should I update requirements.txt, or should I just keep the version unchanged?

Could you please provide some guidance here, @Samoed?
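
For context, the v5 requirement comes from the SparseEncoder API that SentenceTransformers introduced in 5.0; a minimal loading sketch (the model name is illustrative, not necessarily the one wired up in this PR):

```python
# Minimal sketch of loading a sparse encoding model with the SparseEncoder
# class, new in sentence-transformers 5.0; the model name is illustrative.
from sentence_transformers import SparseEncoder

model = SparseEncoder(
    "opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill"
)
emb = model.encode(["What is the capital of France?"])
print(emb.shape)  # sparse vectors over the model's vocabulary, e.g. (1, 30522)
```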

@Samoed requested a review from isaac-chung, July 17, 2025 06:52
@isaac-chung
Collaborator

I think there is still an open issue on handling ST v5 prompts: #2896. So we should not alter requirements.txt until that is resolved.

@zhichao-aws
Contributor Author

> I think there is still an open issue on handling ST v5 prompts: #2896. So we should not alter requirements.txt until that is resolved.

Got it. So should we merge the PR first? If we can't merge this PR, will it block us from merging results to https://github.com/embeddings-benchmark/results?

@isaac-chung
Collaborator

Thanks for your work, but we will not merge either PR until the linked issue is resolved. Every merged PR should be reproducible.

> To load the sparse models, we need to use the latest SentenceTransformers (v5.0 or later).

Based on what you just said, this PR will not be reproducible with the repo at the moment.

@isaac-chung
Collaborator

In the meantime, would you be able to help with #2896? If not, @Samoed or I can take a look (I won't have time until tomorrow).

@zhichao-aws
Contributor Author

> In the meantime, would you be able to help with #2896? If not, @Samoed or I can take a look (I won't have time until tomorrow).

Thanks for the clarification. Unfortunately, I don't have much bandwidth in the coming days to look at that issue :(

@isaac-chung
Collaborator

Don't worry, @zhichao-aws, we should be able to carry this over the finish line once the issue is resolved. Let's keep in touch here for updates.

@Samoed
Member

Samoed commented Jul 17, 2025

> In the meantime, would you be able to help with #2896? If not, @Samoed or I can take a look (I won't have time until tomorrow).

Yes, you're right. I'll try to fix this.

@isaac-chung
Collaborator

We specify sentence_transformers>=3.0.0, and the latest tests already pull ST v5.0. The current test run has ST v4.0.2 cached, so we can ignore that.

@isaac-chung enabled auto-merge (squash), July 20, 2025 21:02
@isaac-chung merged commit 5a868e3 into embeddings-benchmark:main, Jul 20, 2025
9 of 10 checks passed
@Samoed
Member

Samoed commented Jul 21, 2025

@isaac-chung FYI, 4.0.2 is not a cached version. This version is required by pylate: https://github.com/lightonai/pylate/blob/84b1a0c111d28b25f2e00d04782364abc2dbf138/pyproject.toml#L18

@isaac-chung
Collaborator

Oh? So this seems to contradict your comment here, or what am I missing?

@Samoed
Member

Samoed commented Jul 21, 2025

Model loading downloads dependencies for multiple models (mteb/Makefile, line 46 in 5a868e3):

    pip install ".[dev, pylate,gritlm,xformers,model2vec]"

including pylate, and pylate requires sentence-transformers==4.0.2. In the tests we use only the dev and image dependencies and don't install pylate, so we're using the latest sentence-transformers.
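
A quick way to confirm which resolution an environment ended up with (a sketch; the version values are the ones reported in this thread, not independently verified):

```python
# Check the resolved sentence-transformers version: per the thread, it is
# 4.0.2 when the pylate extra is installed, and the latest (>=5.0) otherwise.
import sentence_transformers

print(sentence_transformers.__version__)
```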
