
model: add kalm_models (kalm-emb-v2) ModelMeta (new PR) #2889


Merged
merged 11 commits into embeddings-benchmark:main on Jul 15, 2025

Conversation

ItsukiFujii
Contributor

If you add a model or a dataset, please add the corresponding checklist:

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.
  • The model is public, i.e. it is available either as an API or the weights are publicly available to download

Comment on lines 773 to 777
def kalmv2_instruct_loader(model_name_or_path, **kwargs):
    model = InstructSentenceTransformerWrapper(
        model_name_or_path,
        **kwargs,
    )
    return model
Member

Can you use InstructSentenceTransformerWrapper directly in ModelMeta?
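For readers unfamiliar with the suggestion: a trivial one-line loader function can usually be replaced by passing the wrapper class itself (or a `functools.partial` of it, if fixed keyword arguments are needed) as the `loader`. A minimal, self-contained sketch of the pattern — `InstructWrapper` is a stand-in for the real `InstructSentenceTransformerWrapper`, and the model id is hypothetical:

```python
from functools import partial

# Stand-in for mteb's InstructSentenceTransformerWrapper; the real class
# likewise takes model_name_or_path plus keyword arguments.
class InstructWrapper:
    def __init__(self, model_name_or_path, **kwargs):
        self.model_name_or_path = model_name_or_path
        self.kwargs = kwargs

# Instead of defining kalmv2_instruct_loader, bind any fixed keyword
# arguments with functools.partial and use the result as the loader.
loader = partial(InstructWrapper, trust_remote_code=True)

model = loader("org/kalm-emb-v2")  # hypothetical model id
```

This keeps the ModelMeta definition declarative and avoids a throwaway function per model.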

Member

@ItsukiFujii Can you remove this function?

Contributor Author

Hi @Samoed,
The kalmv2_instruct_loader function has been removed.

@ItsukiFujii
Contributor Author

Hi @Samoed,
I'm confused about why this error occurred in the code. If you have any ideas, please let me know.

mteb/evaluation/MTEB.py:672: in run
    raise e
mteb/evaluation/MTEB.py:625: in run
    results, tick, tock = self._run_eval(
mteb/evaluation/MTEB.py:307: in _run_eval
    results = task.evaluate(
mteb/abstasks/AbsTaskMultilabelClassification.py:159: in evaluate
    scores[hf_subset] = self._evaluate_subset(
mteb/abstasks/AbsTaskMultilabelClassification.py:194: in _evaluate_subset
    sample_indices, _ = self._undersample_data_indices(
mteb/abstasks/AbsTaskMultilabelClassification.py:256: in _undersample_data_indices
    if any((label_counter[label] < samples_per_label) for label in y[i]):
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/datasets/arrow_dataset.py:669: in __getitem__
    return self.source._fast_select_column(self.column_name)[key][self.column_name]
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/datasets/arrow_dataset.py:2859: in __getitem__
    return self._getitem(key)
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/datasets/arrow_dataset.py:2841: in _getitem
    formatted_output = format_table(
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/datasets/formatting/formatting.py:654: in format_table
    query_type = key_to_query_type(key)
/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/datasets/formatting/formatting.py:574: in key_to_query_type
    _raise_bad_key_type(key)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

key = np.int64(0)

    def _raise_bad_key_type(key: Any):
>       raise TypeError(
            f"Wrong key type: '{key}' of type '{type(key)}'. Expected one of int, slice, range, str or Iterable."
        )
E       TypeError: Wrong key type: '0' of type '<class 'numpy.int64'>'. Expected one of int, slice, range, str or Iterable.

/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/datasets/formatting/formatting.py:44: TypeError

@Samoed
Member

Samoed commented Jul 11, 2025

Yes, this is a problem with datasets. I've fixed it. Can you update your branch with the latest main?
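For readers hitting the same `TypeError`: newer `datasets` releases reject numpy integer keys in `Dataset.__getitem__`, accepting only built-in `int`, `slice`, `range`, `str`, or iterables. A minimal sketch of the failure mode and the usual workaround — casting indices back to built-in `int` — assuming only numpy (the actual fix landed in mteb itself):

```python
import numpy as np

# Shuffling indices with numpy (as _undersample_data_indices does)
# yields numpy.int64 values, which datasets' __getitem__ rejects.
rng = np.random.default_rng(42)
sample_indices = rng.permutation(10)[:3]
assert isinstance(sample_indices[0], np.integer)

# Workaround: cast back to built-in int before indexing the Dataset.
safe_indices = [int(i) for i in sample_indices]
```

Python's `int(...)` on a numpy scalar always returns a built-in `int`, which satisfies the key-type check in `datasets.formatting`.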

@ItsukiFujii
Contributor Author

> Yes, this is a problem with datasets. I've fixed it. Can you update your branch with the latest main?

Okay, thank you very much

@ItsukiFujii
Contributor Author

Hi @Samoed
All checks have passed. It would be nice if you could please review this PR :)

@Samoed Samoed changed the title model: add kalm_models (kalm-emb-v2, kalm-x) ModelMeta (new PR) model: add kalm_models (kalm-emb-v2) ModelMeta (new PR) Jul 15, 2025
@Samoed Samoed merged commit 9ecac21 into embeddings-benchmark:main Jul 15, 2025
10 checks passed
3 participants