bm25s KeyError #1646

@Muennighoff

mteb run -m bm25s -t indonli --batch_size 64

leads to the following KeyError:

2024-12-31 07:14:01.836144: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-31 07:14:01.850054: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-31 07:14:01.853941: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
INFO:mteb.cli:Running with parameters: Namespace(model='bm25s', task_types=None, categories=None, tasks=['indonli'], languages=None, benchmarks=None, device=None, output_folder='/data/niklas/results/results', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, batch_size=64, overwrite=False, save_predictions=False, func=<function run at 0x7f4fd3875630>)
WARNING:jax._src.xla_bridge:An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
─────────────────────────────── Selected tasks  ────────────────────────────────
PairClassification
    - indonli, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating indonli **********************
No config specified, defaulting to the single config: indonli/indonli
INFO:datasets.builder:No config specified, defaulting to the single config: indonli/indonli
Loading Dataset Infos from /data/huggingface/modules/datasets_modules/datasets/afaji--indonli/d34041bd1d1a555a4bcb4ffdb9fe904778da6f7c5343209fc1485dd68121cb62
INFO:datasets.info:Loading Dataset Infos from /data/huggingface/modules/datasets_modules/datasets/afaji--indonli/d34041bd1d1a555a4bcb4ffdb9fe904778da6f7c5343209fc1485dd68121cb62
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/afaji___indonli/indonli/1.1.0/d34041bd1d1a555a4bcb4ffdb9fe904778da6f7c5343209fc1485dd68121cb62
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/afaji___indonli/indonli/1.1.0/d34041bd1d1a555a4bcb4ffdb9fe904778da6f7c5343209fc1485dd68121cb62
Found cached dataset indonli (/data/huggingface/datasets/afaji___indonli/indonli/1.1.0/d34041bd1d1a555a4bcb4ffdb9fe904778da6f7c5343209fc1485dd68121cb62)
INFO:datasets.builder:Found cached dataset indonli (/data/huggingface/datasets/afaji___indonli/indonli/1.1.0/d34041bd1d1a555a4bcb4ffdb9fe904778da6f7c5343209fc1485dd68121cb62)
Loading Dataset info from /data/huggingface/datasets/afaji___indonli/indonli/1.1.0/d34041bd1d1a555a4bcb4ffdb9fe904778da6f7c5343209fc1485dd68121cb62
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/afaji___indonli/indonli/1.1.0/d34041bd1d1a555a4bcb4ffdb9fe904778da6f7c5343209fc1485dd68121cb62
Loading cached processed dataset at /data/huggingface/datasets/afaji___indonli/indonli/1.1.0/d34041bd1d1a555a4bcb4ffdb9fe904778da6f7c5343209fc1485dd68121cb62/cache-88b74c33c8265bab.arrow
INFO:datasets.arrow_dataset:Loading cached processed dataset at /data/huggingface/datasets/afaji___indonli/indonli/1.1.0/d34041bd1d1a555a4bcb4ffdb9fe904778da6f7c5343209fc1485dd68121cb62/cache-88b74c33c8265bab.arrow
Loading cached processed dataset at /data/huggingface/datasets/afaji___indonli/indonli/1.1.0/d34041bd1d1a555a4bcb4ffdb9fe904778da6f7c5343209fc1485dd68121cb62/cache-29cebb2c0ab6a8d9.arrow
INFO:datasets.arrow_dataset:Loading cached processed dataset at /data/huggingface/datasets/afaji___indonli/indonli/1.1.0/d34041bd1d1a555a4bcb4ffdb9fe904778da6f7c5343209fc1485dd68121cb62/cache-29cebb2c0ab6a8d9.arrow
INFO:mteb.abstasks.AbsTask:
Task: indonli, split: test_expert, subset: default. Running...
WARNING:mteb.evaluation.evaluators.PairClassificationEvaluator:Found 1531/4080 duplicates in the input data. Only encoding unique sentences.

Split strings:   0%|          | 0/2549 [00:00<?, ?it/s]
                                                       

Stem Tokens:   0%|          | 0/2549 [00:00<?, ?it/s]
                                                     
ERROR:mteb.evaluation.MTEB:Error while evaluating indonli: 'Film Bucin bercerita tentang 4 sahabat (Andovi, Tommy, Jovi, dan Chandra) yang berusaha keluar dari hubungan yang tidak sehat karena mereka BUCIN (Budak Cinta). Mereka memutuskan untuk mengikuti kelas ANTI BUCIN agar mereka bisa menjalankan hubungan yang lebih dewasa, dan tidak diperbudak oleh cinta.'
Traceback (most recent call last):
  File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
    sys.exit(main())
  File "/data/niklas/mteb/mteb/cli.py", line 387, in main
    args.func(args)
  File "/data/niklas/mteb/mteb/cli.py", line 145, in run
    eval.run(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 623, in run
    raise e
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 562, in run
    results, tick, tock = self._run_eval(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 304, in _run_eval
    results = task.evaluate(
  File "/data/niklas/mteb/mteb/abstasks/AbsTask.py", line 126, in evaluate
    scores[hf_subset] = self._evaluate_subset(
  File "/data/niklas/mteb/mteb/abstasks/AbsTaskPairClassification.py", line 93, in _evaluate_subset
    scores = evaluator.compute_metrics(model, encode_kwargs=encode_kwargs)
  File "/data/niklas/mteb/mteb/evaluation/evaluators/PairClassificationEvaluator.py", line 98, in compute_metrics
    embeddings1 = [emb_dict[sent] for sent in self.sentences1]
  File "/data/niklas/mteb/mteb/evaluation/evaluators/PairClassificationEvaluator.py", line 98, in <listcomp>
    embeddings1 = [emb_dict[sent] for sent in self.sentences1]
KeyError: 'Film Bucin bercerita tentang 4 sahabat (Andovi, Tommy, Jovi, dan Chandra) yang berusaha keluar dari hubungan yang tidak sehat karena mereka BUCIN (Budak Cinta). Mereka memutuskan untuk mengikuti kelas ANTI BUCIN agar mereka bisa menjalankan hubungan yang lebih dewasa, dan tidak diperbudak oleh cinta.'
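The traceback fails at the lookup `emb_dict[sent]`, right after the evaluator warns that it is "Only encoding unique sentences" (4080 inputs, 1531 duplicates, 2549 unique, matching the progress bars). The log does not establish why the key is missing. As a rough sketch of the deduplicate-encode-lookup pattern, and of one way such a lookup can fail, the snippet below uses hypothetical helpers (`build_emb_dict`, `missing_keys`, `toy_encode`) that are not mteb's or bm25s's actual code. It assumes the encoder should return exactly one row per input sentence and checks that explicitly, since `zip()` would otherwise truncate silently and leave some sentences without an entry.

```python
def build_emb_dict(sentences, encode):
    """Encode each unique sentence once and map sentence -> embedding."""
    unique = list(dict.fromkeys(sentences))  # order-preserving deduplication
    embeddings = encode(unique)
    # zip() stops at the shorter input, so a short encoder output would
    # only show up later as a KeyError during lookup. Fail early instead.
    if len(embeddings) != len(unique):
        raise ValueError(
            f"encoder returned {len(embeddings)} rows for {len(unique)} sentences"
        )
    return dict(zip(unique, embeddings))


def missing_keys(emb_dict, sentences):
    """Return the unique sentences that have no entry in emb_dict."""
    return [s for s in dict.fromkeys(sentences) if s not in emb_dict]


def toy_encode(texts):
    # Hypothetical stand-in encoder: one 1-dimensional vector per text.
    return [[float(len(t))] for t in texts]


sentences1 = ["a b", "c", "a b"]
emb_dict = build_emb_dict(sentences1, toy_encode)
# Same lookup shape as the failing line in the traceback.
embeddings1 = [emb_dict[sent] for sent in sentences1]
```

Calling `missing_keys(emb_dict, self.sentences1)` just before the failing list comprehension would show whether a single sentence or many are absent, which helps narrow down where the mismatch is introduced.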
