
model eval error: NameError: name 'model_state_dict' is not defined #1776

@elfisworking

I use this command to run evaluation:

tune run eleuther_eval --config eleuther_evaluation \
    tasks="[hellaswag, wikitext]" \
    model._component_=torchtune.models.llama3.llama3_8b \
    quantizer._component_=torchtune.training.quantization.Int8DynActInt4WeightQuantizer \
    quantizer.groupsize=128 \
    checkpointer._component_=torchtune.training.FullModelTorchTuneCheckpointer \
    checkpointer.checkpoint_dir="/QAT/output/llama3-8B" \
    checkpointer.output_dir="/QAT/output/llama3-8B" \
    checkpointer.checkpoint_files=[meta_model_2-8da4w.pt] \
    checkpointer.model_type=LLAMA3 \
    tokenizer._component_=torchtune.models.llama3.llama3_tokenizer \
    tokenizer.path=/QAT/Meta-Llama-3-8B/original/tokenizer.model

But I get this error:

2024-10-09:08:30:57,790 INFO     [_logging.py:101] Running EleutherEvalRecipe with resolved config:

batch_size: 8
checkpointer:
  _component_: torchtune.training.FullModelTorchTuneCheckpointer
  checkpoint_dir: /QAT/output/llama3-8B
  checkpoint_files:
  - meta_model_2-8da4w.pt
  model_type: LLAMA3
  output_dir: /QAT/output/llama3-8B
device: cuda
dtype: bf16
enable_kv_cache: true
limit: null
max_seq_length: 4096
model:
  _component_: torchtune.models.llama3.llama3_8b
quantizer:
  _component_: torchtune.training.quantization.Int8DynActInt4WeightQuantizer
  groupsize: 128
seed: 1234
tasks:
- hellaswag
- wikitext
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  max_seq_len: null
  path: /QAT/Meta-Llama-3-8B/original/tokenizer.model

Traceback (most recent call last):
  File "/usr/local/bin/tune", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/torchtune/_cli/tune.py", line 49, in main
    parser.run(args)
  File "/usr/local/lib/python3.10/dist-packages/torchtune/_cli/tune.py", line 43, in run
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/torchtune/_cli/run.py", line 196, in _run_cmd
    self._run_single_device(args, is_builtin=is_builtin)
  File "/usr/local/lib/python3.10/dist-packages/torchtune/_cli/run.py", line 102, in _run_single_device
    runpy.run_path(str(args.recipe), run_name="__main__")
  File "/usr/lib/python3.10/runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/usr/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/recipes/eleuther_eval.py", line 576, in <module>
    sys.exit(recipe_main())
  File "/usr/local/lib/python3.10/dist-packages/torchtune/config/_parse.py", line 99, in wrapper
    sys.exit(recipe_main(conf))
  File "/usr/local/lib/python3.10/dist-packages/recipes/eleuther_eval.py", line 571, in recipe_main
    recipe.setup(cfg=cfg)
  File "/usr/local/lib/python3.10/dist-packages/recipes/eleuther_eval.py", line 494, in setup
    for k, v in model_state_dict.items():
NameError: name 'model_state_dict' is not defined

I read the code at https://github.com/pytorch/torchtune/blob/main/recipes/eleuther_eval.py and cannot find where model_state_dict is defined. Is this a bug?
I have also tried the following config file, but I get the same error:
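For reference, here is a minimal pure-Python sketch of the pattern the traceback points to. Everything except the name model_state_dict is hypothetical, and MODEL_KEY = "model" is only an assumed stand-in for the key torchtune stores model weights under. In the quantization branch of setup, the loop reads model_state_dict, but nothing in that scope ever assigns it, so Python falls back to a global lookup and raises NameError. My guess at a fix is to bind it from the dictionary returned by the checkpointer before the loop; this is a sketch, not the upstream patch.

```python
# Assumption: stand-in for the checkpoint key that holds the model weights.
MODEL_KEY = "model"


def setup_buggy(ckpt_dict, quantize):
    """Mirrors the failing pattern: the quantized branch reads an unbound name."""
    loaded = {}
    if quantize:
        # model_state_dict is never assigned in this function, so Python looks
        # it up as a global and raises NameError when this line runs.
        for k, v in model_state_dict.items():  # noqa: F821
            loaded[k] = v
    return loaded


def setup_fixed(ckpt_dict, quantize):
    """Binds the state dict from the loaded checkpoint before iterating over it."""
    model_state_dict = ckpt_dict[MODEL_KEY]
    loaded = {}
    if quantize:
        for k, v in model_state_dict.items():
            # In the real recipe, each tensor would be moved to the device here.
            loaded[k] = v
    return loaded


if __name__ == "__main__":
    ckpt = {MODEL_KEY: {"tok_embeddings.weight": [0.1, 0.2]}}
    try:
        setup_buggy(ckpt, quantize=True)
    except NameError as err:
        print("buggy:", err)
    print("fixed:", setup_fixed(ckpt, quantize=True))
```

Only the quantized path fails, which would explain why evaluation without a quantizer is unaffected.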

model:
  _component_: torchtune.models.llama3.llama3_8b

checkpointer:
  _component_: torchtune.training.FullModelTorchTuneCheckpointer
  checkpoint_dir: /QAT/output/llama3-8B/
  checkpoint_files: [
    meta_model_2-8da4w.pt
  ]
  output_dir: /QAT/output/llama3-8B/
  model_type: LLAMA3

# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /QAT/Meta-Llama-3-8B/original/tokenizer.model
  max_seq_len: null

# Environment
device: cuda
dtype: bf16
seed: 42 # It is not recommended to change this seed, b/c it matches EleutherAI's default seed

# EleutherAI specific eval args
tasks: ["hellaswag"]
limit: null
max_seq_length: 8192
batch_size: 8

# Quantization specific args
quantizer:
  _component_: torchtune.training.quantization.Int8DynActInt4WeightQuantizer
  groupsize: 256
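Side note: the command above uses quantizer.groupsize=128, while this config uses groupsize: 256. With group-wise weight quantization, the shape of each layer's saved scale tensor depends on the group size. So, even once the NameError is fixed, a checkpoint is likely to load only when groupsize matches the value used to produce meta_model_2-8da4w.pt. Below is a small arithmetic sketch with a hypothetical helper scale_shape. It assumes one scale per groupsize consecutive input weights in each output row, which is my understanding of the 8da4w scheme and not something verified against torchao's code.

```python
def scale_shape(out_features, in_features, groupsize):
    """Hypothetical helper: shape of the per-group scales for one linear layer.

    Assumption: one scale per `groupsize` consecutive input weights per output row.
    """
    if in_features % groupsize != 0:
        raise ValueError(
            f"in_features={in_features} is not divisible by groupsize={groupsize}"
        )
    return (out_features, in_features // groupsize)


# Llama 3 8B has a hidden size of 4096; use a 4096 x 4096 projection as the example.
saved = scale_shape(4096, 4096, 128)     # checkpoint quantized with groupsize=128
expected = scale_shape(4096, 4096, 256)  # model quantized with groupsize=256
print(saved, expected, saved == expected)  # (4096, 32) (4096, 16) False
```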

Can anyone help? Thanks very much!
