
GermanDPR fails #1700

@Muennighoff

2025-01-02 04:19:26.282703: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-02 04:19:26.296198: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-02 04:19:26.299958: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
INFO:mteb.cli:Running with parameters: Namespace(model='minishlab/M2V_base_glove', task_types=None, categories=None, tasks=['GermanDPR'], languages=None, benchmarks=None, device=None, output_folder='/data/niklas/results/results', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, batch_size=64, overwrite=False, save_predictions=False, func=<function run at 0x7f26937bbd00>)
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
─────────────────────────────── Selected tasks  ────────────────────────────────
Retrieval
    - GermanDPR, s2p


INFO:mteb.evaluation.MTEB:

********************** Evaluating GermanDPR **********************
No config specified, defaulting to the single config: germandpr/plain_text
INFO:datasets.builder:No config specified, defaulting to the single config: germandpr/plain_text
Loading Dataset Infos from /data/huggingface/modules/datasets_modules/datasets/deepset--germandpr/35b77aa4815a72575b852f9aef779ed1d8dd1ea6d92a670545468c409ae88f06
INFO:datasets.info:Loading Dataset Infos from /data/huggingface/modules/datasets_modules/datasets/deepset--germandpr/35b77aa4815a72575b852f9aef779ed1d8dd1ea6d92a670545468c409ae88f06
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/deepset___germandpr/plain_text/1.0.0/35b77aa4815a72575b852f9aef779ed1d8dd1ea6d92a670545468c409ae88f06
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/deepset___germandpr/plain_text/1.0.0/35b77aa4815a72575b852f9aef779ed1d8dd1ea6d92a670545468c409ae88f06
Found cached dataset germandpr (/data/huggingface/datasets/deepset___germandpr/plain_text/1.0.0/35b77aa4815a72575b852f9aef779ed1d8dd1ea6d92a670545468c409ae88f06)
INFO:datasets.builder:Found cached dataset germandpr (/data/huggingface/datasets/deepset___germandpr/plain_text/1.0.0/35b77aa4815a72575b852f9aef779ed1d8dd1ea6d92a670545468c409ae88f06)
Loading Dataset info from /data/huggingface/datasets/deepset___germandpr/plain_text/1.0.0/35b77aa4815a72575b852f9aef779ed1d8dd1ea6d92a670545468c409ae88f06
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/deepset___germandpr/plain_text/1.0.0/35b77aa4815a72575b852f9aef779ed1d8dd1ea6d92a670545468c409ae88f06
ERROR:mteb.evaluation.MTEB:Error while evaluating GermanDPR: string indices must be integers
Traceback (most recent call last):
  File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
    sys.exit(main())
  File "/data/niklas/mteb/mteb/cli.py", line 387, in main
    args.func(args)
  File "/data/niklas/mteb/mteb/cli.py", line 145, in run
    eval.run(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 623, in run
    raise e
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 527, in run
    task.load_data(**kwargs)
  File "/data/niklas/mteb/mteb/tasks/Retrieval/deu/GermanDPRRetrieval.py", line 86, in load_data
    corpus = {doc["id"]: doc.get("title", "") + " " + doc["text"] for doc in corpus}
  File "/data/niklas/mteb/mteb/tasks/Retrieval/deu/GermanDPRRetrieval.py", line 86, in <dictcomp>
    corpus = {doc["id"]: doc.get("title", "") + " " + doc["text"] for doc in corpus}
TypeError: string indices must be integers
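
For anyone triaging: the message suggests that each `doc` produced by iterating over `corpus` is a `str` rather than a dict. One way this can happen is when `corpus` is itself a dict (or another mapping-like object), so iteration yields its keys. The log does not confirm that this is the actual cause in `GermanDPRRetrieval.py`. The sketch below only reproduces the same error with made-up data and shows one possible workaround. `corpus_as_dict`, `build_corpus`, `reproduce_error`, and `rows_from_columns` are hypothetical names, not part of mteb.

```python
# Hypothetical stand-in data; the real shape of the GermanDPR corpus is
# NOT shown in the log above. This is a column-oriented dict.
corpus_as_dict = {
    "id": ["d1", "d2"],
    "title": ["T1", "T2"],
    "text": ["x", "y"],
}


def build_corpus(docs):
    """Same expression as line 86 in the traceback."""
    return {doc["id"]: doc.get("title", "") + " " + doc["text"] for doc in docs}


def reproduce_error():
    """Return the TypeError message raised when docs is a dict."""
    try:
        # Iterating a dict yields its keys, so each `doc` is a str
        # and doc["id"] indexes a string with a string.
        build_corpus(corpus_as_dict)
    except TypeError as exc:
        return str(exc)
    return None


def rows_from_columns(columns):
    """Turn a column-oriented dict into a list of row dicts."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


if __name__ == "__main__":
    print(reproduce_error())
    print(build_corpus(rows_from_columns(corpus_as_dict)))
```

If the loaded object really is column-oriented or keyed by string, converting it to a list of row dicts before the comprehension (as `rows_from_columns` does here) would avoid the error. Checking `type(corpus)` and the type of its first element just before line 86 should show whether this assumption holds.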
