Add FaMTEB (Farsi/Persian Text Embedding Benchmark) #1843

mehran-sarmadi · 2025-01-20T21:30:11Z

We are a research team from Sharif University of Technology and MCINext Company developing a text embedding benchmark for the Persian language based on MTEB. So far, we have gathered around 63 datasets spanning 7 tasks (Classification, Clustering, Pair Classification, Reranking, Retrieval, STS, and Summary Retrieval), including a mix of existing, translated, and newly generated datasets. Notably, we are introducing the Summary Retrieval task for the first time; it focuses on identifying the correct summary of a paragraph from a set of candidates. We have also evaluated several Persian language models, as well as text embedding models that support Persian, on this benchmark.
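To make the Summary Retrieval setup concrete, here is a minimal sketch of how such a task could be scored. It assumes each paragraph and each candidate summary has already been embedded, and that the gold summary for paragraph i is candidate i. The `summary_retrieval_accuracy` helper and the toy vectors are hypothetical illustrations, not part of mteb or of the FaMTEB code.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def summary_retrieval_accuracy(
    paragraph_embs: list[list[float]], summary_embs: list[list[float]]
) -> float:
    """Fraction of paragraphs whose most similar candidate is their own summary.

    Assumes summary_embs[i] is the gold summary for paragraph_embs[i].
    """
    hits = 0
    for i, p in enumerate(paragraph_embs):
        scores = [cosine(p, s) for s in summary_embs]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += int(best == i)
    return hits / len(paragraph_embs)


# Toy 3-d "embeddings" (hypothetical values, for illustration only).
paragraphs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.2, 0.1, 1.0]]
summaries = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.7, 0.0, 0.3]]
accuracy = summary_retrieval_accuracy(paragraphs, summaries)
```

Shuffling the candidate order (for example, reversing `summaries`) breaks the index alignment and lowers the score, which is a quick way to check that the metric is sensitive to the pairing.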

We have also opened related PRs for the results and the leaderboard tab, and we are finalizing a paper on this work, which will be published in the near future.

Checklist

Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.

Adding datasets checklist

Reason for dataset addition: ...

I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- intfloat/multilingual-e5-small
I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
I have filled out the metadata object in the dataset file (find documentation on it here).
Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.
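The checklist points to mteb's own `self.stratified_subsampling()` helper; the sketch below does not reproduce it. It only illustrates the general idea of stratified subsampling in plain Python: shrink a split while keeping each label's share roughly constant. The function name, the fixed seed, and the largest-remainder rounding policy are assumptions chosen for the illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels: list[str], n_samples: int, seed: int = 42) -> list[int]:
    """Return sorted indices of a subsample whose label proportions match the input.

    Per-label quotas are rounded down, then the leftover slots go to the labels
    with the largest fractional remainders (largest-remainder method).
    """
    if n_samples >= len(labels):
        return list(range(len(labels)))

    by_label: dict[str, list[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    total = len(labels)
    exact = {lab: len(ix) * n_samples / total for lab, ix in by_label.items()}
    quota = {lab: int(q) for lab, q in exact.items()}
    leftover = n_samples - sum(quota.values())
    by_remainder = sorted(exact, key=lambda lab: exact[lab] - quota[lab], reverse=True)
    for lab in by_remainder[:leftover]:
        quota[lab] += 1

    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    chosen: list[int] = []
    for lab, ix in by_label.items():
        chosen.extend(rng.sample(ix, quota[lab]))
    return sorted(chosen)


# Usage: shrink a 60/30/10 split to 20 examples, keeping the same proportions.
labels = ["pos"] * 60 + ["neg"] * 30 + ["neu"] * 10
subset = stratified_subsample(labels, 20)
counts = Counter(labels[i] for i in subset)
```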

mteb/tasks/SummaryRetrieval/fas/FaMTEBSummaryRetrieval.py

mteb/evaluation/evaluators/SummaryRetrievalEvaluator.py

Samoed

Great addition! Can you add mock task of AbsTaskSummaryRetrieval task to https://github.com/embeddings-benchmark/mteb/blob/main/tests/test_benchmark/mock_tasks.py?

mteb/abstasks/AbsTaskSummaryRetrieval.py

mteb/tasks/Classification/fas/FaMTEBClassification.py

mteb/tasks/SummaryRetrieval/fas/FaMTEBSummaryRetrieval.py

Samoed · 2025-01-25T20:37:23Z

Maybe we should move this PR to v2 branch?

mehran-sarmadi · 2025-01-26T12:22:57Z

Regarding moving this PR to the v2 branch: I haven’t checked the next version yet, so I’m not sure whether any changes are needed. If they are, I’ll make the updates.

add data domain and subtask description

mehran-sarmadi · 2025-01-27T14:09:22Z

Regarding adding a mock task of AbsTaskSummaryRetrieval to mock_tasks.py: yes, it's done.

mteb/abstasks/AbsTaskSummaryRetrieval.py

isaac-chung

Hi @mehran-sarmadi,

First of all, thank you for your contribution! The community will benefit from the extended language coverage of this benchmark.

There are 2 main points that I'd like to discuss:

1. While I understand that the new task type is named summary retrieval, I believe the tasks can actually subclass AbsTaskBitextMining with minimal changes. AbsTaskSummaryRetrieval and SummaryRetrievalEvaluator contain mostly the same code as their BitextMining counterparts. For example, to resolve the difference in column names, we could define a dataset_transform(), like:

def dataset_transform(self):
    # Rename the summary-retrieval columns to the sentence1/sentence2 names used for bitext mining
    self.dataset = self.dataset.rename_columns(
        {"text": "sentence1", "summary": "sentence2"}
    )

As for the evaluator, the BitextMiningEvaluator can simply be used after the above change. We can use the task metadata's description to indicate that this is a summary retrieval task. This way, we can revert all changes related to adding the new AbsTask.
2. We are in the process of releasing an updated leaderboard. As such, we will not be reviewing proposed changes to the current leaderboard. Since you've already added a Benchmark object, it will be made available automatically once the new one is released. No need for additional PRs. We appreciate the foresight and effort though.
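As a rough illustration of the renaming step in point 1, the sketch below uses a plain column-oriented dict as a stand-in for a Hugging Face `datasets.Dataset` (whose `rename_columns` method the snippet above calls). It shows how `text`/`summary` records end up in the `sentence1`/`sentence2` layout. The sample rows and the `rename_columns` helper here are hypothetical, and the assumption that row i of `sentence1` is paired with row i of `sentence2` as the gold match follows the reviewer's description rather than a reading of mteb's evaluator code.

```python
def rename_columns(columns: dict[str, list[str]], mapping: dict[str, str]) -> dict[str, list[str]]:
    """Column-oriented stand-in for Dataset.rename_columns: same data, new keys."""
    missing = set(mapping) - set(columns)
    if missing:
        raise ValueError(f"columns not found: {sorted(missing)}")
    return {mapping.get(name, name): values for name, values in columns.items()}


# Hypothetical summary-retrieval split with the original column names.
summary_split = {
    "text": ["paragraph one", "paragraph two"],
    "summary": ["summary one", "summary two"],
}
bitext_split = rename_columns(summary_split, {"text": "sentence1", "summary": "sentence2"})

# Assumed gold pairing: row i of sentence1 matches row i of sentence2.
gold_pairs = list(zip(bitext_split["sentence1"], bitext_split["sentence2"]))
```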

Let me know if you have any further questions.

mteb/benchmarks/benchmarks.py

mehran-sarmadi · 2025-01-29T10:50:35Z

Hi @isaac-chung,

Thank you for your detailed feedback and suggestions!

I have updated the task to subclass from AbsTaskBitextMining as suggested.
I understand the leaderboard update, thanks for the clarification!

I appreciate your guidance and the opportunity to contribute to this benchmark. Let me know if there's anything else I should consider or adjust.

Thanks again!

isaac-chung · 2025-01-29T12:29:43Z

@mehran-sarmadi thanks for such a quick turnaround! The changes look good to me. cc @KennethEnevoldsen + @x-tabdeveloping on the added task type.

I think we'll be ready once the datasets that need to be aggregated have been specified.

mehran-sarmadi · 2025-01-29T13:47:52Z

@isaac-chung Glad to hear the changes look good.
Here are the groups that need to be specified:

{
  "SynPerChatbotConvSAClassification": [
    "SynPerChatbotConvSAAnger",
    "SynPerChatbotConvSAFear",
    "SynPerChatbotConvSAFriendship",
    "SynPerChatbotConvSAHappiness",
    "SynPerChatbotConvSAJealousy",
    "SynPerChatbotConvSALove",
    "SynPerChatbotConvSASadness",
    "SynPerChatbotConvSASatisfaction",
    "SynPerChatbotConvSASurprise"
  ],
  "SynPerChatbotConvSAToneClassification": [
    "SynPerChatbotConvSAToneChatbotClassification",
    "SynPerChatbotConvSAToneUserClassification"
  ],
  "SynPerChatbotRAGToneClassification": [
    "SynPerChatbotRAGToneChatbotClassification",
    "SynPerChatbotRAGToneUserClassification"
  ],
  "SynPerChatbotToneClassification": [
    "SynPerChatbotToneChatbotClassification",
    "SynPerChatbotToneUserClassification"
  ],
  "CQADupstackRetrieval-Fa": [
    "CQADupstackAndroidRetrieval-Fa",
    "CQADupstackEnglishRetrieval-Fa",
    "CQADupstackGamingRetrieval-Fa",
    "CQADupstackGisRetrieval-Fa",
    "CQADupstackMathematicaRetrieval-Fa",
    "CQADupstackPhysicsRetrieval-Fa",
    "CQADupstackProgrammersRetrieval-Fa",
    "CQADupstackStatsRetrieval-Fa",
    "CQADupstackTexRetrieval-Fa",
    "CQADupstackUnixRetrieval-Fa",
    "CQADupstackWebmastersRetrieval-Fa",
    "CQADupstackWordpressRetrieval-Fa"
  ]
}

However, if you think any of these changes are unnecessary, we can skip them as needed.
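Before wiring groups like these into aggregate tasks, a quick sanity check can catch a subtask that is accidentally listed under two groups. The sketch below inverts the mapping into a subtask-to-group lookup; it uses an abbreviated copy of two of the groups above, and the `invert_groups` helper is hypothetical rather than anything in mteb.

```python
def invert_groups(groups: dict[str, list[str]]) -> dict[str, str]:
    """Map each subtask name to its group, rejecting subtasks listed twice."""
    owner: dict[str, str] = {}
    for group, subtasks in groups.items():
        for name in subtasks:
            if name in owner:
                raise ValueError(f"{name!r} is in both {owner[name]!r} and {group!r}")
            owner[name] = group
    return owner


# Abbreviated copy of two groups from the mapping above.
groups = {
    "SynPerChatbotToneClassification": [
        "SynPerChatbotToneChatbotClassification",
        "SynPerChatbotToneUserClassification",
    ],
    "SynPerChatbotRAGToneClassification": [
        "SynPerChatbotRAGToneChatbotClassification",
        "SynPerChatbotRAGToneUserClassification",
    ],
}
owner = invert_groups(groups)
```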

isaac-chung · 2025-01-29T13:58:09Z

@mehran-sarmadi thanks again. This is more for me to understand your paper better: Will you be reporting the aggregated scores per group in your paper only, or will you also report the individual task scores for those within groups? In general, we aim to be as close as possible to reproducing what's been reported. So if it is the latter, then these changes are fine as is, and aggregating can be a separate PR. But if it is the former (only reporting aggregated scores), then let's add in the AggregateTasks as well.

mehran-sarmadi · 2025-01-29T14:15:56Z

@isaac-chung Thanks for your question! For SynPerChatbotConvSAClassification and CQADupstackRetrieval-Fa, since they contain a large number of datasets, we have reported the scores in an aggregated manner.

For the other cases, as they are only two, we have reported them individually. I'll go ahead and add the AggregateTask for these and will inform you once it's done.
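For the two large groups, the reported number is a single score summarizing the per-subtask scores. A minimal sketch, assuming an unweighted mean of each member subtask's main score (mteb's AggregateTask may be configured differently, and the `aggregate_score` helper is hypothetical). The scores below are placeholders, not results from the paper.

```python
from statistics import fmean


def aggregate_score(subtask_scores: dict[str, float], members: list[str]) -> float:
    """Unweighted mean of the main scores of a group's member subtasks."""
    missing = [m for m in members if m not in subtask_scores]
    if missing:
        raise KeyError(f"no score for: {missing}")
    return fmean(subtask_scores[m] for m in members)


# Placeholder values for three CQADupstack-Fa subtasks (not measured results).
placeholder_scores = {
    "CQADupstackAndroidRetrieval-Fa": 0.1,
    "CQADupstackEnglishRetrieval-Fa": 0.2,
    "CQADupstackGamingRetrieval-Fa": 0.3,
}
group_score = aggregate_score(placeholder_scores, list(placeholder_scores))
```

Raising on a missing subtask, rather than silently averaging over fewer scores, keeps an incomplete run from producing a misleading group number.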

mehran-sarmadi · 2025-01-29T16:58:48Z

Hi,
I have added the combined version of those two sets of datasets. Now, I have just one question: Should I add them here?

import pytest
from mteb import get_task

@pytest.mark.parametrize("task_name", ["BornholmBitextMining", "CQADupstackRetrieval"])
@pytest.mark.parametrize("eval_splits", [["test"], None])
def test_get_task(task_name: str, eval_splits: list[str] | None):
    task = get_task(task_name, eval_splits=eval_splits)

in tests/test_overview.py, as was done for CQADupstackRetrieval, or not?

If so, it would look like this:

@pytest.mark.parametrize(
    "task_name",
    ["BornholmBitextMining", "CQADupstackRetrieval",
     "SynPerChatbotConvSAClassification", "CQADupstackRetrieval-Fa"],
)
@pytest.mark.parametrize("eval_splits", [["test"], None])
def test_get_task(task_name: str, eval_splits: list[str] | None):
    task = get_task(task_name, eval_splits=eval_splits)
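Stacking two `@pytest.mark.parametrize` decorators makes pytest generate one test case per combination of their values, so the extended task list above would yield 4 task names × 2 `eval_splits` options = 8 `test_get_task` cases, up from 2 × 2 = 4. The stdlib-only sketch below simply enumerates those combinations to make the count explicit; it does not call pytest or mteb, and the order shown is that of `itertools.product`, not necessarily pytest's collection order.

```python
from itertools import product

task_names = [
    "BornholmBitextMining",
    "CQADupstackRetrieval",
    "SynPerChatbotConvSAClassification",
    "CQADupstackRetrieval-Fa",
]
eval_splits_options = [["test"], None]

# One (task_name, eval_splits) pair per generated test case.
cases = list(product(task_names, eval_splits_options))
for task_name, eval_splits in cases:
    print(task_name, eval_splits)
```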

isaac-chung · 2025-01-30T04:04:31Z

Thanks @mehran-sarmadi , good work!

Since the test is to help us feel more confident about the implementation of AggregateTask, I don't think we need to add them there as we're simply using it.

If we want, I'd suggest adding them temporarily to run the test locally, but not committing them to the PR. This is optional.

I'll run the tests now, and if they pass, I think it's good to go.

mehran-sarmadi · 2025-01-30T08:11:31Z

Thanks, @isaac-chung!

That makes sense. I really appreciate your help.

isaac-chung

Looks good. Thanks again!

KennethEnevoldsen · 2025-02-01T11:37:52Z

Just making the authors aware that I have changed the type to BitextMining:
#1915

as the separate SummaryRetrieval type caused a breaking change in the leaderboard.

* Update tasks table
* 1.31.4 Automatically generated by python-semantic-release
* fix: Limited plotly version to be less than 6.0.0 (#1902) Limited plotly version to be less than 6.0.0
* update stella/jasper metainfo (#1896) update stella meta
* 1.31.5 Automatically generated by python-semantic-release
* Feat: Add FaMTEB (Farsi/Persian Text Embedding Benchmark) (#1843)
  * Add Summary Retrieval Task
  * Add FaMTEBClassification
  * Add FaMTEBClustering
  * Add FaMTEBPairClassification
  * Add FaMTEBRetrieval and BEIRFA and FaMTEBSTS
  * Add FaMTEBSummaryRetrieval
  * Add FaMTEB to benchmarks
  * fix benchmark names
  * temporary fix metadata
  * Fix dataset revisions
  * Update SummaryRetrievalEvaluator.py
  * Update task files
  * Update task files
  * add data domain and subtask description
  * Update AbsTaskSummaryRetrieval and FaMTEBSummaryRetrieval
  * Update AbsTaskSummaryRetrieval
  * Add mock task
  * Update AbsTaskSummaryRetrieval
  * Update AbsTaskSummaryRetrieval
  * make lint
  * Refactor SummaryRetrieval to subclass BitextMining
  * Add aggregated datasets
  Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
  Co-authored-by: e.zeinivand <zeinivand@ymail.com>
  Co-authored-by: Erfun76 <59398902+Erfun76@users.noreply.github.com>
* Docs: update docs according to current state (#1870)
  * update docs
  * Apply suggestions from code review Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
  * update readme
  * Update README.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Adding a banner to the new MMTEB leaderboard (#1908)
  * Adding a banner to the new MMTEB leaderboard
  * linting
  * Update mteb/leaderboard/app.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
  * adding reference to mteb arena
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* fix: Filling missing metadata for leaderboard release (#1895)
  * Update ArxivClusteringS2S.py
  * fill some metadat for retrieval
  * fill in the reste of missing metadata
  * fix metadata
  * fix climatefever metadata
  * fix: Added CQADupstack annotations
  * removed annotation for non-exisitant task
  * format
  * Added financial to other financial dataset
  * Moved ArguAna annotation to derivate datasets
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* 1.31.6 Automatically generated by python-semantic-release
* fix: remove SummaryRetrieval as a type (#1915)
* fix: revert rename and add to description (#1918)
* docs: Add sort to domains for task metadata (#1922) Tests currently go into an infinite loop. This should prevent that.
* 1.31.7 Automatically generated by python-semantic-release
* docs: Updated citation for mteb(scandinavian) (#1914) fix: Updated citation for mteb(scandinavian)
* fix: Add datasets in CodeRAG-Bench (#1595)
  * add three out of four datasets in CodeRAG-Bench
  * add verified CodeRAGStackoverflowPostsRetrieval dataset
  * clean up code and make some comments
  * fixed lint errors
  * addressed comments about code-rag datasets: fixed grammar and remove unnessary code and loop
  * roll back files which is not supposed to change
  * fixed the comments in split_by_first_newline() and make the methods private by adding a underscore prefix
  * refactor to use common args
  * update task descriptions
  * add entry in benchmarks
  * correct the alphanumeric order for the dataset
  * add in tasks.md
  * add in tasks.md
  * update task metadata
  * update importing path
  * fix lint errors
  * correct CodeRAG task metadata description field and id for stackoverflow-posts
  * fix error in test
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* 1.31.8 Automatically generated by python-semantic-release
* Leaderboard: Acks (#1930) Add acs
* omit instructions.py

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Márton Kardos <power.up1163@gmail.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Mehran Sarmadi <128898167+mehran-sarmadi@users.noreply.github.com>
Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: e.zeinivand <zeinivand@ymail.com>
Co-authored-by: Erfun76 <59398902+Erfun76@users.noreply.github.com>
Co-authored-by: Wissam Siblini <36303760+wissam-sib@users.noreply.github.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Pengfei He <hepengfe@gmail.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* Update tasks table
* update stella/jasper metainfo (#1896) update stella meta
* 1.31.5 Automatically generated by python-semantic-release
* Feat: Add FaMTEB (Farsi/Persian Text Embedding Benchmark) (#1843)
  * Add Summary Retrieval Task
  * Add FaMTEBClassification
  * Add FaMTEBClustering
  * Add FaMTEBPairClassification
  * Add FaMTEBRetrieval and BEIRFA and FaMTEBSTS
  * Add FaMTEBSummaryRetrieval
  * Add FaMTEB to benchmarks
  * fix benchmark names
  * temporary fix metadata
  * Fix dataset revisions
  * Update SummaryRetrievalEvaluator.py
  * Update task files
  * Update task files
  * add data domain and subtask description
  * Update AbsTaskSummaryRetrieval and FaMTEBSummaryRetrieval
  * Update AbsTaskSummaryRetrieval
  * Add mock task
  * Update AbsTaskSummaryRetrieval
  * Update AbsTaskSummaryRetrieval
  * make lint
  * Refactor SummaryRetrieval to subclass BitextMining
  * Add aggregated datasets
  Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
  Co-authored-by: e.zeinivand <zeinivand@ymail.com>
  Co-authored-by: Erfun76 <59398902+Erfun76@users.noreply.github.com>
* Docs: update docs according to current state (#1870)
  * update docs
  * Apply suggestions from code review Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
  * update readme
  * Update README.md Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* Adding a banner to the new MMTEB leaderboard (#1908)
  * Adding a banner to the new MMTEB leaderboard
  * linting
  * Update mteb/leaderboard/app.py Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
  * adding reference to mteb arena
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* fix: Filling missing metadata for leaderboard release (#1895)
  * Update ArxivClusteringS2S.py
  * fill some metadat for retrieval
  * fill in the reste of missing metadata
  * fix metadata
  * fix climatefever metadata
  * fix: Added CQADupstack annotations
  * removed annotation for non-exisitant task
  * format
  * Added financial to other financial dataset
  * Moved ArguAna annotation to derivate datasets
  Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
* 1.31.6 Automatically generated by python-semantic-release
* fix: remove SummaryRetrieval as a type (#1915)
* fix: revert rename and add to description (#1918)
* docs: Add sort to domains for task metadata (#1922) Tests currently go into an infinite loop. This should prevent that.
* 1.31.7 Automatically generated by python-semantic-release
* docs: Updated citation for mteb(scandinavian) (#1914) fix: Updated citation for mteb(scandinavian)
* fix: Add datasets in CodeRAG-Bench (#1595)
  * add three out of four datasets in CodeRAG-Bench
  * add verified CodeRAGStackoverflowPostsRetrieval dataset
  * clean up code and make some comments
  * fixed lint errors
  * addressed comments about code-rag datasets: fixed grammar and remove unnessary code and loop
  * roll back files which is not supposed to change
  * fixed the comments in split_by_first_newline() and make the methods private by adding a underscore prefix
  * refactor to use common args
  * update task descriptions
  * add entry in benchmarks
  * correct the alphanumeric order for the dataset
  * add in tasks.md
  * add in tasks.md
  * update task metadata
  * update importing path
  * fix lint errors
  * correct CodeRAG task metadata description field and id for stackoverflow-posts
  * fix error in test
  Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* 1.31.8 Automatically generated by python-semantic-release
* update __init__
* update generate_imports script for aggregational tasks
* add descriptive stats
* remove print from script generate_imports
* add rest of metadata
* fix tests
* add todo for test
* Revert "fix tests" This reverts commit 7e8be03.
* add back check for multilingual
* fix imports

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Mehran Sarmadi <128898167+mehran-sarmadi@users.noreply.github.com>
Co-authored-by: mehran <mehan.sarmadi16@gmail.com>
Co-authored-by: e.zeinivand <zeinivand@ymail.com>
Co-authored-by: Erfun76 <59398902+Erfun76@users.noreply.github.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Wissam Siblini <36303760+wissam-sib@users.noreply.github.com>
Co-authored-by: Imene Kerboua <33312980+imenelydiaker@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Pengfei He <hepengfe@gmail.com>

mehran added 9 commits January 24, 2025 17:58

Add Summary Retrieval Task

359a056

Add FaMTEBClassification

fdb1ce5

Add FaMTEBClustering

ae97333

Add FaMTEBPairClassification

eab993d

Add FaMTEBRetrieval and BEIRFA and FaMTEBSTS

f138440

Add FaMTEBSummaryRetrieval

1881e66

Add FaMTEB to benchmarks

fc34b77

fix benchmark names

a944ef4

temporary fix metadata

c57293f

mehran-sarmadi force-pushed the fa-mteb-v1 branch from 37b50d7 to c57293f January 24, 2025 14:31

Fix dataset revisions

7624d61

This was referenced Jan 25, 2025

Add FaMTEB (Farsi/Persian Text Embedding Benchmark) embeddings-benchmark/results#92

Merged

Add FaMTEB (Farsi/Persian Text Embedding Benchmark) embeddings-benchmark/leaderboard#67

Draft

Samoed reviewed Jan 25, 2025

mteb/tasks/SummaryRetrieval/fas/FaMTEBSummaryRetrieval.py (outdated, resolved)

Fix conflict

5fe3730

Samoed reviewed Jan 25, 2025

mteb/evaluation/evaluators/SummaryRetrievalEvaluator.py (outdated, resolved)

mehran-sarmadi force-pushed the fa-mteb-v1 branch from 946ee59 to 5fe3730 January 25, 2025 09:30

mehran added 3 commits January 25, 2025 14:42

Update SummaryRetrievalEvaluator.py

afba8d9

Update task files

5c382a5

Update task files

37f7a4c

mehran-sarmadi marked this pull request as ready for review January 25, 2025 14:41

Samoed requested review from x-tabdeveloping, KennethEnevoldsen and isaac-chung January 25, 2025 20:25

Samoed reviewed Jan 25, 2025

Merge branch 'main' into fa-mteb-v1

0d9dd85

ErfunZeinivand and others added 2 commits January 26, 2025 17:08

add data domain and subtask description

587d959

Merge pull request #1 from mehran-sarmadi/fa-mteb-v2

6a74745

add data domain and subtask description

mehran added 2 commits January 27, 2025 16:43

Update AbsTaskSummaryRetrieval

ffb18bc

Add mock task

86167c6

mehran-sarmadi marked this pull request as draft January 27, 2025 14:22

Update AbsTaskSummaryRetrieval

93071df

Samoed reviewed Jan 27, 2025

mteb/abstasks/AbsTaskSummaryRetrieval.py (outdated, resolved)

mehran added 2 commits January 27, 2025 18:14

Update AbsTaskSummaryRetrieval

728cf9a

make lint

bd94940

mehran-sarmadi marked this pull request as ready for review January 27, 2025 15:57

isaac-chung requested changes Jan 28, 2025

mteb/benchmarks/benchmarks.py (resolved)

mehran added 2 commits January 28, 2025 17:59

Merge branch 'main' into fa-mteb-v1

c5a611b

Refactor SummaryRetrieval to subclass BitextMining

ae4e4a1

mehran-sarmadi force-pushed the fa-mteb-v1 branch from a6ac99b to ae4e4a1 January 29, 2025 10:45

mehran added 2 commits January 29, 2025 17:54

Merge branch 'main' into fa-mteb-v1-add-combined

e8e9b64

Add aggregated datasets

020b73d

isaac-chung approved these changes Jan 30, 2025

isaac-chung merged commit f3404b4 into embeddings-benchmark:main Jan 30, 2025
10 checks passed

This was referenced Feb 1, 2025

fix: remove SummaryRetrieval as a type #1915

Merged

revert rename and add to description #1918

Merged
