add MIEB results and rename model to pass tests #122
Conversation
When pointing embeddings-benchmark/mteb#2035 to this branch, it seems MIEB results cannot be displayed due to the "Number of parameters" field.
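If the leaderboard derives the "Number of parameters" column from each model's model_meta.json, the entry in question might look something like the fragment below. Both the field name (`n_parameters`) and the value are assumptions for illustration, not the actual file contents:

```json
{
  "name": "BAAI/bge-visualized-base",
  "n_parameters": 196000000
}
```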
...BAAI__bge-visualized-base/98db10b10d22620010d06f11733346e1c98c34aa/AROVisualAttribution.json
results/BAAI__bge-visualized-base/98db10b10d22620010d06f11733346e1c98c34aa/model_meta.json
@gowitheflow-1998 @KennethEnevoldsen here's a screenshot of the LB, hacked to point to this branch. The eng and lite versions were able to render as well. The cache needed to be wiped.
There are a few tasks where the main metric was wrong when we implemented them and doesn't match the paper. Let me double-check all tasks and get back to you. It might be a good idea to replace the scores in the main metric with the actual main metrics before we merge.
Also, it seems the performance vs. model size plot needs some model references. You can add these in:
Figured it out 👍 [update] Added a few models that ranked first in a few task types:
The performance per task type plot isn't showing though 🤔 it says it only contains one task type when there are 8.
Hmm, not sure why this is happening. @x-tabdeveloping, do you have an idea?
I'll have a look at it tomorrow.
@isaac-chung My guess would be it's because of this list:

```python
task_types = [
    "BitextMining",
    "Classification",
    "MultilabelClassification",
    "Clustering",
    "PairClassification",
    "Reranking",
    "Retrieval",
    "STS",
    "Summarization",
    # "InstructionRetrieval",
    # Not displayed, because the scores are negative,
    # which doesn't work well with the radar chart.
    "Speed",
]
```

The reason I made this list is that instruction retrieval shows scores in the negatives, and that doesn't really work with the radar chart.
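The filtering described above could be sketched like this. The helper name and the dict shapes are illustrative, not the actual leaderboard code:

```python
# Illustrative sketch: keep only task types that render well on the
# radar chart; InstructionRetrieval is excluded because its scores
# can be negative.
DISPLAYED_TASK_TYPES = {
    "BitextMining",
    "Classification",
    "MultilabelClassification",
    "Clustering",
    "PairClassification",
    "Reranking",
    "Retrieval",
    "STS",
    "Summarization",
    "Speed",
}

def filter_for_radar_chart(scores_by_task_type: dict) -> dict:
    """Drop task types not in the displayed set (hypothetical helper)."""
    return {
        task_type: score
        for task_type, score in scores_by_task_type.items()
        if task_type in DISPLAYED_TASK_TYPES
    }
```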
That's it. Thanks! It's working now. |
I have fixed the main-metric issue by overwriting the main scores with the actual main-metric scores, and deleted previous incomplete Jina runs from an old version that only had a few task results. The overwritten scores include:
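A rough sketch of the kind of overwrite described above, assuming each result JSON stores per-split lists of score entries under a `scores` key with a `main_score` field. That layout is an assumption; the actual result-file schema may differ:

```python
import json
from pathlib import Path

def overwrite_main_score(result_file: Path, metric_name: str) -> None:
    """Replace each entry's main_score with the value of the actual
    main metric. The {"scores": {split: [entry, ...]}} layout is an
    assumption, not the confirmed result-file schema.
    """
    data = json.loads(result_file.read_text())
    for split_entries in data.get("scores", {}).values():
        for entry in split_entries:
            if metric_name in entry:
                entry["main_score"] = entry[metric_name]
    result_file.write_text(json.dumps(data, indent=2))
```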
@gowitheflow-1998 good stuff! Are we ready to merge?
yeah, merged! adding @Muennighoff as co-author for running most of the results here! |
Does this have everything from https://github.com/embeddings-benchmark/tmp i.e. we can safely delete that repo? |
yeah! all results are here |
Fixes embeddings-benchmark/mteb#1823
Add MIEB results. The following models have been renamed to add the org name (based on local test failures):
- QuanSun: https://huggingface.co/QuanSun/EVA-CLIP/tree/main
- voyageai
Related MTEB issue: embeddings-benchmark/mteb#2074
Checklist
- Run make test.
- Run make pre-push.

Adding a model checklist
The model is added to the mteb/models/ directory. Instructions to add a model can be found in the following PR: ____