[BUG] Eval recipe not using max_seq_length #1644

```bash
2024-09-21:20:19:56,843 INFO     [_logging.py:101] Running EleutherEvalRecipe with resolved config:

batch_size: 1
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: ./target/1b_normal
  checkpoint_files:
  - pytorch_model.bin
  model_type: LLAMA2
  output_dir: ./target/tmp
device: cuda
dtype: fp32
enable_kv_cache: true
limit: null
max_seq_length: 1024
model:
  _component_: torchtune.models.llama2.llama2
  embed_dim: 2048
  max_seq_len: 2048
  norm_eps: 1.0e-05
  num_heads: 32
  num_kv_heads: 4
  num_layers: 22
  vocab_size: 32000
quantizer: null
seed: 1234
tasks:
- mmlu_pro
tokenizer:
  _component_: torchtune.models.llama2.llama2_tokenizer
  path: ./target/1b_normal/tokenizer.model

2024-09-21:20:19:57,698 DEBUG    [seed.py:60] Setting manual seed to local seed 1234. Local seed is seed + rank = 1234 + 0
2024-09-21:20:19:58,522 INFO     [eleuther_eval.py:237] Model is initialized with precision torch.float32.
2024-09-21:20:19:58,532 INFO     [eleuther_eval.py:209] Tokenizer is initialized from file.
2024-09-21:20:19:58,659 INFO     [huggingface.py:130] Using device 'cuda:0'
/home/salman/.pyenv/versions/3.11.9/envs/tune/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
2024-09-21:20:19:58,987 INFO     [huggingface.py:366] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:0'}
2024-09-21:20:20:00,200 INFO     [__init__.py:491] `group` and `group_alias` keys in TaskConfigs are deprecated and will be removed in v0.4.5 of lm_eval. The new `tag` field will be used to allow for a shortcut to a group of tasks one does not wish to aggregate metrics across. `group`s which aggregate across subtasks must be only defined in a separate group config file, which will be the official way to create groups that support cross-task aggregation as in `mmlu`. Please see the v0.4.4 patch notes and our documentation: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md#advanced-group-configs for more information.
2024-09-21:20:20:33,959 INFO     [eleuther_eval.py:280] Running evaluation on ['mmlu_pro'] tasks.
2024-09-21:20:20:33,961 INFO     [task.py:423] Building contexts for mmlu_pro_biology on rank 0...
100%|██████████| 717/717 [00:00<00:00, 861.90it/s]
2024-09-21:20:20:34,879 INFO     [task.py:423] Building contexts for mmlu_pro_business on rank 0...
100%|██████████| 789/789 [00:00<00:00, 860.26it/s]
2024-09-21:20:20:35,883 INFO     [task.py:423] Building contexts for mmlu_pro_chemistry on rank 0...
100%|██████████| 1132/1132 [00:01<00:00, 816.52it/s]
2024-09-21:20:20:37,392 INFO     [task.py:423] Building contexts for mmlu_pro_computer_science on rank 0...
100%|██████████| 410/410 [00:00<00:00, 843.05it/s]
2024-09-21:20:20:37,929 INFO     [task.py:423] Building contexts for mmlu_pro_economics on rank 0...
100%|██████████| 844/844 [00:01<00:00, 842.82it/s]
2024-09-21:20:20:39,024 INFO     [task.py:423] Building contexts for mmlu_pro_engineering on rank 0...
100%|██████████| 969/969 [00:01<00:00, 871.45it/s]
2024-09-21:20:20:40,249 INFO     [task.py:423] Building contexts for mmlu_pro_health on rank 0...
100%|██████████| 818/818 [00:00<00:00, 862.93it/s]
2024-09-21:20:20:41,289 INFO     [task.py:423] Building contexts for mmlu_pro_history on rank 0...
100%|██████████| 381/381 [00:00<00:00, 824.88it/s]
2024-09-21:20:20:41,798 INFO     [task.py:423] Building contexts for mmlu_pro_law on rank 0...
100%|██████████| 1101/1101 [00:01<00:00, 855.97it/s]
2024-09-21:20:20:43,211 INFO     [task.py:423] Building contexts for mmlu_pro_math on rank 0...
100%|██████████| 1351/1351 [00:01<00:00, 831.30it/s]
2024-09-21:20:20:44,988 INFO     [task.py:423] Building contexts for mmlu_pro_other on rank 0...
100%|██████████| 924/924 [00:01<00:00, 818.69it/s]
2024-09-21:20:20:46,235 INFO     [task.py:423] Building contexts for mmlu_pro_philosophy on rank 0...
100%|██████████| 499/499 [00:00<00:00, 824.51it/s]
2024-09-21:20:20:46,900 INFO     [task.py:423] Building contexts for mmlu_pro_physics on rank 0...
100%|██████████| 1299/1299 [00:01<00:00, 843.07it/s]
2024-09-21:20:20:48,589 INFO     [task.py:423] Building contexts for mmlu_pro_psychology on rank 0...
100%|██████████| 798/798 [00:00<00:00, 827.32it/s]
2024-09-21:20:20:49,653 INFO     [evaluator.py:465] Running generate_until requests
Running generate_until requests:   0%|          | 0/12032 [00:00<?, ?it/s]
torch.Size([1, 3121])
Traceback (most recent call last):
  File "/home/salman/.pyenv/versions/3.11.9/envs/tune/bin/tune", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/salman/torchtune/torchtune/_cli/tune.py", line 49, in main
    parser.run(args)
  File "/home/salman/torchtune/torchtune/_cli/tune.py", line 43, in run
    args.func(args)
  File "/home/salman/torchtune/torchtune/_cli/run.py", line 185, in _run_cmd
    self._run_single_device(args)
  File "/home/salman/torchtune/torchtune/_cli/run.py", line 94, in _run_single_device
    runpy.run_path(str(args.recipe), run_name="__main__")
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/home/salman/torchtune/recipes/eleuther_eval.py", line 303, in <module>
    sys.exit(recipe_main())
             ^^^^^^^^^^^^^
  File "/home/salman/torchtune/torchtune/config/_parse.py", line 99, in wrapper
    sys.exit(recipe_main(conf))
             ^^^^^^^^^^^^^^^^^
  File "/home/salman/torchtune/recipes/eleuther_eval.py", line 299, in recipe_main
    recipe.evaluate()
  File "/home/salman/.pyenv/versions/3.11.9/envs/tune/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/salman/torchtune/recipes/eleuther_eval.py", line 281, in evaluate
    output = evaluate(
             ^^^^^^^^^
  File "/home/salman/.pyenv/versions/3.11.9/envs/tune/lib/python3.11/site-packages/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/salman/.pyenv/versions/3.11.9/envs/tune/lib/python3.11/site-packages/lm_eval/evaluator.py", line 476, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/salman/.pyenv/versions/3.11.9/envs/tune/lib/python3.11/site-packages/lm_eval/models/huggingface.py", line 1279, in generate_until
    cont = self._model_generate(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/salman/torchtune/recipes/eleuther_eval.py", line 147, in _model_generate
    toks, _ = generation.generate(
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/salman/.pyenv/versions/3.11.9/envs/tune/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/salman/torchtune/torchtune/generation/_generation.py", line 312, in generate
    tokens, generated_logits = generate_next_token(
                               ^^^^^^^^^^^^^^^^^^^^
  File "/home/salman/torchtune/torchtune/generation/_generation.py", line 102, in generate_next_token
    logits = model(x, input_pos=input_pos, mask=mask)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/salman/.pyenv/versions/3.11.9/envs/tune/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/salman/.pyenv/versions/3.11.9/envs/tune/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/salman/torchtune/torchtune/modules/transformer.py", line 591, in forward
    self._validate_inputs(
  File "/home/salman/torchtune/torchtune/modules/transformer.py", line 498, in _validate_inputs
    raise ValueError(
ValueError: seq_len (3121) of input tensor should be smaller than max_seq_len (2048)
```
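
To spell out what the log shows: the recipe config sets `max_seq_length: 1024`, the model is built with `max_seq_len: 2048`, and the first `generate_until` request reaches the model with 3121 tokens, so the prompt does not appear to be truncated anywhere before `generation.generate`. Below is a minimal sketch of the kind of truncation I would have expected, not the recipe's actual code. `truncate_left` and `reserve_for_generation` are hypothetical names, and keeping the most recent tokens (dropping from the left) is an assumption about the desired behaviour.

```python
def truncate_left(token_ids, max_seq_length, reserve_for_generation=0):
    """Keep at most the last (max_seq_length - reserve_for_generation) tokens.

    Hypothetical helper for illustration only; not part of torchtune or lm_eval.
    """
    budget = max_seq_length - reserve_for_generation
    if budget <= 0:
        raise ValueError("max_seq_length must exceed reserve_for_generation")
    # Negative slicing keeps the tail of the prompt; shorter prompts pass through unchanged.
    return token_ids[-budget:]


# Stand-in for the 3121-token prompt reported in the traceback.
prompt = list(range(3121))
truncated = truncate_left(prompt, max_seq_length=1024)
print(len(truncated))
```

With something like this applied before the forward pass, the input would stay within both the configured `max_seq_length` and the model's `max_seq_len`.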
cc @joecummings 
