
openai/text-embedding-3-large does not allow empty strings #1650

@Muennighoff


2025-01-01 02:59:59.297956: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-01 02:59:59.311878: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-01 02:59:59.315723: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
INFO:mteb.cli:Running with parameters: Namespace(model='openai/text-embedding-3-large', task_types=None, categories=None, tasks=['STS22'], languages=None, benchmarks=None, device=None, output_folder='/data/niklas/results/results', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, batch_size=64, overwrite=False, save_predictions=False, func=<function run at 0x7ff797b0d630>)
INFO:mteb.evaluation.MTEB:

Evaluating 1 tasks:

─────────────────────────────── Selected tasks ────────────────────────────────
STS
- STS22, p2p, multilingual 18 / 18 Subsets

INFO:mteb.evaluation.MTEB:

********************** Evaluating STS22 **********************
WARNING:mteb.abstasks.AbsTask:Dataset 'STS22' is superseded by 'STS22.v2', you might consider using the newer version of the dataset.
Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
INFO:datasets.info:Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/mteb___sts22-crosslingual-sts/default/0.0.0/de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___sts22-crosslingual-sts/default/0.0.0/de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
Found cached dataset sts22-crosslingual-sts (/data/huggingface/datasets/mteb___sts22-crosslingual-sts/default/0.0.0/de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3)
INFO:datasets.builder:Found cached dataset sts22-crosslingual-sts (/data/huggingface/datasets/mteb___sts22-crosslingual-sts/default/0.0.0/de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3)
Loading Dataset info from /data/huggingface/datasets/mteb___sts22-crosslingual-sts/default/0.0.0/de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___sts22-crosslingual-sts/default/0.0.0/de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
INFO:mteb.abstasks.AbsTask:
Task: STS22, split: test, subset: zh-en. Running...
--- Logging error ---
Traceback (most recent call last):
File "/data/niklas/mteb/mteb/models/openai_models.py", line 94, in encode
response = self._client.embeddings.create(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/resources/embeddings.py", line 114, in create
return self._post(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/_base_client.py", line 1260, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/_base_client.py", line 937, in request
return self._request(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/_base_client.py", line 1041, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 1100, in emit
msg = self.format(record)
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 943, in format
return fmt.format(record)
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 678, in format
record.message = record.getMessage()
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 368, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
sys.exit(main())
File "/data/niklas/mteb/mteb/cli.py", line 387, in main
args.func(args)
File "/data/niklas/mteb/mteb/cli.py", line 145, in run
eval.run(
File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 576, in run
results, tick, tock = self._run_eval(
File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 304, in _run_eval
results = task.evaluate(
File "/data/niklas/mteb/mteb/abstasks/AbsTask.py", line 126, in evaluate
scores[hf_subset] = self._evaluate_subset(
File "/data/niklas/mteb/mteb/abstasks/AbsTaskSTS.py", line 88, in _evaluate_subset
scores = evaluator(model, encode_kwargs=encode_kwargs)
File "/data/niklas/mteb/mteb/evaluation/evaluators/STSEvaluator.py", line 47, in __call__
embeddings1 = model.encode(
File "/data/niklas/mteb/mteb/models/openai_models.py", line 102, in encode
logger.info("Sleeping for 10 seconds due to error", e)
Message: 'Sleeping for 10 seconds due to error'
Arguments: (BadRequestError('Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}'),)
--- Logging error ---
Traceback (most recent call last):
File "/data/niklas/mteb/mteb/models/openai_models.py", line 107, in encode
response = self._client.embeddings.create(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/resources/embeddings.py", line 114, in create
return self._post(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/_base_client.py", line 1260, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/_base_client.py", line 937, in request
return self._request(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/_base_client.py", line 1041, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 1100, in emit
msg = self.format(record)
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 943, in format
return fmt.format(record)
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 678, in format
record.message = record.getMessage()
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 368, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
sys.exit(main())
File "/data/niklas/mteb/mteb/cli.py", line 387, in main
args.func(args)
File "/data/niklas/mteb/mteb/cli.py", line 145, in run
eval.run(
File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 576, in run
results, tick, tock = self._run_eval(
File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 304, in _run_eval
results = task.evaluate(
File "/data/niklas/mteb/mteb/abstasks/AbsTask.py", line 126, in evaluate
scores[hf_subset] = self._evaluate_subset(
File "/data/niklas/mteb/mteb/abstasks/AbsTaskSTS.py", line 88, in _evaluate_subset
scores = evaluator(model, encode_kwargs=encode_kwargs)
File "/data/niklas/mteb/mteb/evaluation/evaluators/STSEvaluator.py", line 47, in __call__
embeddings1 = model.encode(
File "/data/niklas/mteb/mteb/models/openai_models.py", line 114, in encode
logger.info("Sleeping for 60 seconds due to error", e)
Message: 'Sleeping for 60 seconds due to error'
Arguments: (BadRequestError('Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}'),)
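
The log above shows two separate problems. The 400 `'$.input' is invalid` response is, going by the issue title, triggered by an empty string in the batch. The "Logging error" traceback comes from `logger.info("Sleeping for 10 seconds due to error", e)`, which passes `e` as a format argument although the message has no `%s` placeholder. A possible workaround, as a minimal sketch only: `encode_skipping_empty`, `embed_fn`, `dim`, and `log_retry` are hypothetical names, not mteb's actual code, and treating whitespace-only strings like empty ones is a conservative assumption.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger(__name__)


def encode_skipping_empty(
    sentences: Sequence[str],
    embed_fn: Callable[[list[str]], list[list[float]]],
    dim: int,
) -> list[list[float]]:
    """Embed only non-empty sentences and return zero vectors for the rest.

    embed_fn is a hypothetical stand-in for the real embeddings API call.
    """
    # Positions of sentences that contain something other than whitespace.
    keep = [i for i, s in enumerate(sentences) if s.strip()]
    # Skip the API call entirely when nothing is left to embed.
    vectors = embed_fn([sentences[i] for i in keep]) if keep else []
    # Start from zero vectors, then fill in the positions that were embedded.
    out = [[0.0] * dim for _ in sentences]
    for i, vec in zip(keep, vectors):
        out[i] = list(vec)
    return out


def log_retry(seconds: int, error: Exception) -> None:
    # Give every argument a %s placeholder; without one, logging raises
    # "TypeError: not all arguments converted during string formatting".
    logger.info("Sleeping for %s seconds due to error: %s", seconds, error)
```

Zero vectors keep the output aligned with the input, but cosine similarity against a zero vector is undefined, so an STS evaluator would still need to decide how to score those pairs. Retrying a 400 is also unlikely to help, since the same request fails again after the sleep.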
