openbmb/MiniCPM-Embedding still fails #1696

@Muennighoff

Description

Running the mteb CLI on STS12 with openbmb/MiniCPM-Embedding still fails: the model is loaded without an explicit torch dtype (see the warnings below), so its weights stay in float32 and FlashAttention rejects them with RuntimeError: FlashAttention only support fp16 and bf16 data type. Full output:
!pip show flash-attn
Name: flash-attn
Version: 2.6.3
Summary: Flash Attention: Fast and Memory-Efficient Exact Attention
Home-page: https://github.com/Dao-AILab/flash-attention
Author: Tri Dao
Author-email: tri@tridao.me
License: 
Requires: einops, torch
Required-by: 
2025-01-02 23:49:56.428686: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-02 23:49:56.442158: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-02 23:49:56.445995: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
INFO:mteb.cli:Running with parameters: Namespace(model='openbmb/MiniCPM-Embedding', task_types=None, categories=None, tasks=['STS12'], languages=None, benchmarks=None, device=None, output_folder='/data/niklas/results/results', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, batch_size=16, overwrite=False, save_predictions=False, func=<function run at 0x7f940fc800d0>)
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.

Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.48it/s]
WARNING:mteb.models.sentence_transformer_wrapper:Model prompts are not in the expected format. Ignoring them.
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
─────────────────────────────── Selected tasks  ────────────────────────────────
STS
    - STS12, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating STS12 **********************
INFO:datasets.builder:No config specified, defaulting to the single config: sts12-sts/default
INFO:datasets.info:Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384
INFO:datasets.builder:Found cached dataset sts12-sts (/data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384)
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384
INFO:mteb.abstasks.AbsTask:
Task: STS12, split: test, subset: default. Running...
INFO:mteb.models.sentence_transformer_wrapper:No model prompts found for task=STS12 prompt_type=None
INFO:mteb.models.sentence_transformer_wrapper:Encoding 3108 sentences.
WARNING:transformers_modules.openbmb.MiniCPM-Embedding.c0cb2de33fb366e17c30f9d53142ff11bc18e049.modeling_minicpm:The input hidden states seems to be silently casted in float32, this might be related to the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in torch.float32.
ERROR:mteb.evaluation.MTEB:Error while evaluating STS12: FlashAttention only support fp16 and bf16 data type
Traceback (most recent call last):
  File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
    sys.exit(main())
  File "/data/niklas/mteb/mteb/cli.py", line 387, in main
    args.func(args)
  File "/data/niklas/mteb/mteb/cli.py", line 145, in run
    eval.run(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 630, in run
    raise e
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 569, in run
    results, tick, tock = self._run_eval(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 304, in _run_eval
    results = task.evaluate(
  File "/data/niklas/mteb/mteb/abstasks/AbsTask.py", line 126, in evaluate
    scores[hf_subset] = self._evaluate_subset(
  File "/data/niklas/mteb/mteb/abstasks/AbsTaskSTS.py", line 88, in _evaluate_subset
    scores = evaluator(model, encode_kwargs=encode_kwargs)
  File "/data/niklas/mteb/mteb/evaluation/evaluators/STSEvaluator.py", line 47, in __call__
    embeddings1 = model.encode(
  File "/data/niklas/mteb/mteb/models/sentence_transformer_wrapper.py", line 108, in encode
    embeddings = self.model.encode(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 623, in encode
    out_features = self.forward(features, **kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 690, in forward
    input = module(input, **module_kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 393, in forward
    output_states = self.auto_model(**trans_features, **kwargs, return_dict=False)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/huggingface/modules/transformers_modules/openbmb/MiniCPM-Embedding/c0cb2de33fb366e17c30f9d53142ff11bc18e049/modeling_minicpm.py", line 1089, in forward
    layer_outputs = decoder_layer(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/huggingface/modules/transformers_modules/openbmb/MiniCPM-Embedding/c0cb2de33fb366e17c30f9d53142ff11bc18e049/modeling_minicpm.py", line 812, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/huggingface/modules/transformers_modules/openbmb/MiniCPM-Embedding/c0cb2de33fb366e17c30f9d53142ff11bc18e049/modeling_minicpm.py", line 565, in forward
    attn_output = self._flash_attention_forward(
  File "/data/huggingface/modules/transformers_modules/openbmb/MiniCPM-Embedding/c0cb2de33fb366e17c30f9d53142ff11bc18e049/modeling_minicpm.py", line 613, in _flash_attention_forward
    attn_output_unpad = flash_attn_varlen_func(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 1124, in flash_attn_varlen_func
    return FlashAttnVarlenFunc.apply(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/autograd/function.py", line 574, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 620, in forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_varlen_forward(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 90, in _flash_attn_varlen_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
RuntimeError: FlashAttention only support fp16 and bf16 data type
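
For reference, the crash is reproducible with flash-attn alone, independent of MiniCPM: its CUDA kernels simply have no float32 path. A minimal sketch (the tensor shapes are arbitrary, chosen only to satisfy flash_attn_func's expected (batch, seqlen, nheads, headdim) layout):

```python
import torch
from flash_attn import flash_attn_func

# Arbitrary (batch, seqlen, nheads, headdim) tensors on GPU, in float32.
q = k = v = torch.randn(1, 16, 4, 64, device="cuda", dtype=torch.float32)

try:
    flash_attn_func(q, k, v)  # float32 inputs
except RuntimeError as e:
    print(e)  # "FlashAttention only support fp16 and bf16 data type"

# The same call succeeds once the inputs are half precision.
out = flash_attn_func(q.half(), k.half(), v.half())
print(out.shape)  # torch.Size([1, 16, 4, 64])
```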


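A likely workaround (a sketch, not verified here, and not something the mteb CLI exposes as a flag): make the dtype explicit when sentence-transformers loads the model, via its standard model_kwargs argument, which is forwarded to transformers' from_pretrained():

```python
import torch
from sentence_transformers import SentenceTransformer

# Passing torch_dtype through model_kwargs is exactly what the
# "without specifying a torch dtype" warning above asks for; the remote
# MiniCPM code then runs FlashAttention on fp16/bf16 tensors.
model = SentenceTransformer(
    "openbmb/MiniCPM-Embedding",
    trust_remote_code=True,
    device="cuda",
    model_kwargs={
        "torch_dtype": torch.float16,  # or torch.bfloat16
        "attn_implementation": "flash_attention_2",
    },
)

embeddings = model.encode(["A man is playing a guitar."])
print(embeddings.shape)
```

Alternatively, AutoModel.from_pretrained(..., torch_dtype=torch.float16, trust_remote_code=True) has the same effect when loading through transformers directly; either way the fix is an explicit half-precision dtype rather than the float32 default.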