Open
Labels: v2 (Issues and PRs related to `v2` branch)
Description
2025-01-02 22:39:00.672963: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-02 22:39:00.686375: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-02 22:39:00.690128: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
INFO:mteb.cli:Running with parameters: Namespace(model='silma-ai/silma-embeddding-matryoshka-v0.1', task_types=None, categories=None, tasks=['BUCC'], languages=None, benchmarks=None, device=None, output_folder='/data/niklas/results/results', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, batch_size=64, overwrite=False, save_predictions=False, func=<function run at 0x7fca802180d0>)
WARNING:mteb.model_meta:Loader not specified for model silma-ai/silma-embeddding-matryoshka-v0.1, loading using sentence transformers.
INFO:mteb.evaluation.MTEB:
## Evaluating 1 tasks:
─────────────────────────────── Selected tasks ────────────────────────────────
BitextMining
- BUCC, s2s, multilingual 4 / 4 Subsets
INFO:mteb.evaluation.MTEB:
********************** Evaluating BUCC **********************
WARNING:mteb.abstasks.AbsTask:Dataset 'BUCC' is superseded by 'BUCC.v2', you might consider using the newer version of the dataset.
Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
INFO:datasets.info:Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/de-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/de-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/de-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
INFO:datasets.builder:Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/de-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/de-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/de-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
INFO:datasets.info:Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/fr-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/fr-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/fr-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
INFO:datasets.builder:Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/fr-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/fr-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/fr-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
INFO:datasets.info:Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/ru-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/ru-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/ru-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
INFO:datasets.builder:Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/ru-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/ru-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/ru-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
INFO:datasets.info:Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/zh-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/zh-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/zh-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
INFO:datasets.builder:Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/zh-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/zh-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/zh-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
ERROR:mteb.evaluation.MTEB:Error while evaluating BUCC: "Column gold not in the dataset. Current columns in the dataset: ['sentence1', 'sentence2', 'lang']"
Traceback (most recent call last):
  File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
    sys.exit(main())
  File "/data/niklas/mteb/mteb/cli.py", line 387, in main
    args.func(args)
  File "/data/niklas/mteb/mteb/cli.py", line 145, in run
    eval.run(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 630, in run
    raise e
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 534, in run
    task.load_data(**kwargs)
  File "/data/niklas/mteb/mteb/abstasks/MultiSubsetLoader.py", line 17, in load_data
    self.dataset_transform()
  File "/data/niklas/mteb/mteb/tasks/BitextMining/multilingual/BUCCBitextMining.py", line 67, in dataset_transform
    gold = data["gold"][0]
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2872, in __getitem__
    return self._getitem(key)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2856, in _getitem
    pa_subtable = query_table(self._data, key, indices=self._indices)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 590, in query_table
    _check_valid_column_key(key, table.column_names)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 527, in _check_valid_column_key
    raise KeyError(f"Column {key} not in the dataset. Current columns in the dataset: {columns}")
KeyError: "Column gold not in the dataset. Current columns in the dataset: ['sentence1', 'sentence2', 'lang']"
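
For anyone reading the traceback: `dataset_transform` reads `data["gold"]`, while the cached dataset only exposes `sentence1`, `sentence2`, and `lang`. Below is a minimal sketch of a column check that would report the mismatch before indexing. `missing_columns` and `ensure_columns` are hypothetical helpers written for illustration, not part of mteb or datasets, and this is not a proposed fix for the task itself. The column list is hard-coded from the error message above; with a real `datasets.Dataset` one could presumably pass `data.column_names` instead.

```python
from typing import Iterable


def missing_columns(column_names: Iterable[str], required: Iterable[str]) -> list[str]:
    """Return the required columns that are absent, in the order requested."""
    present = set(column_names)
    return [name for name in required if name not in present]


def ensure_columns(column_names: Iterable[str], required: Iterable[str], subset: str) -> None:
    """Raise a KeyError naming the subset and every missing column (hypothetical helper)."""
    columns = list(column_names)
    missing = missing_columns(columns, required)
    if missing:
        raise KeyError(
            f"Subset {subset!r} is missing columns {missing}; available columns: {columns}"
        )


# Columns copied from the KeyError in the log above.
columns = ["sentence1", "sentence2", "lang"]

print(missing_columns(columns, ["gold"]))                    # the column the transform expects
print(missing_columns(columns, ["sentence1", "sentence2"]))  # columns that are present

try:
    ensure_columns(columns, ["gold"], subset="de-en")
except KeyError as err:
    print("would fail early:", err)
```

A check like this only makes the failure easier to read; whether the task code or the dataset revision should change is a separate question for the `v2` branch.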