
Conversation

@joecummings joecummings (Member) commented Oct 10, 2024

Context:

In v0.4.5, EleutherAI officially added multimodal support to their Eval Harness. Before this release was public, we had some hacky code to verify that the user had installed the Harness from the main GitHub branch. Now that the release is out, we should simply require users to upgrade their Harness version so that our multimodal code works.

I also changed the way we gate external packages in our recipes. If lm-eval is not installed at all, the recipe raises the usual ModuleNotFoundError, which should be enough to tell users they need it to run the recipe. Because we actually want them on the latest version, there is now an additional check in the setup portion of the recipe that verifies v0.4.5 specifically is installed. This happens before any heavy lifting starts.

Changelog:

  • Removed hacky code
  • Updated workers to download lm-eval==0.4.5
  • Added gating logic on version in the setup portion of the recipe
  • Updated the test to return the "wrong" version of lm-eval to ensure the version check is exercised
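
The gating logic described above could look roughly like the following sketch. The helper name `check_package_version` is illustrative, not the actual recipe code; the key idea is reading the installed distribution's metadata and failing fast before any checkpoint loading begins:

```python
from importlib.metadata import PackageNotFoundError, version


def check_package_version(package: str, required: str) -> None:
    """Fail fast if `package` is missing or not pinned to `required`.

    Raises ModuleNotFoundError when the distribution is absent (matching
    the error a bare import would produce), and RuntimeError when an
    incompatible version is installed.
    """
    try:
        installed = version(package)
    except PackageNotFoundError as err:
        raise ModuleNotFoundError(
            f"{package} is required; install it with "
            f"`pip install {package}=={required}`"
        ) from err
    if installed != required:
        raise RuntimeError(
            f"This recipe requires {package}=={required}, but found "
            f"{installed}. Please run `pip install {package}=={required}`."
        )
```

Calling something like `check_package_version("lm_eval", "0.4.5")` at the top of the recipe's `__init__` surfaces the problem before any heavy lifting starts.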

Testing:
Recipe tests: python -m pytest tests/recipes/test_eleuther_eval.py --with-integration
Output from text run:

batch_size: 8
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Llama-2-7b-hf
  checkpoint_files:
  - pytorch_model-00001-of-00002.bin
  - pytorch_model-00002-of-00002.bin
  model_type: LLAMA2
  output_dir: /tmp/Llama-2-7b-hf
device: cuda
dtype: bf16
enable_kv_cache: true
limit: null
max_seq_length: 4096
model:
  _component_: torchtune.models.llama2.llama2_7b
quantizer: null
seed: 1234
tasks:
- truthfulqa_mc2
tokenizer:
  _component_: torchtune.models.llama2.llama2_tokenizer
  max_seq_len: null
  path: /tmp/Llama-2-7b-hf/tokenizer.model

Model is initialized with precision torch.bfloat16.
2024-10-14:08:58:59,426 INFO     [eleuther_eval.py:505] Model is initialized with precision torch.bfloat16.
2024-10-14:08:58:59,488 INFO     [huggingface.py:129] Using device 'cuda:0'
2024-10-14:08:58:59,607 INFO     [huggingface.py:481] Using model type 'default'
2024-10-14:08:58:59,843 INFO     [huggingface.py:365] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:0'}
Running evaluation on the following tasks: ['truthfulqa_mc2']
2024-10-14:08:59:12,366 INFO     [eleuther_eval.py:549] Running evaluation on the following tasks: ['truthfulqa_mc2']
2024-10-14:08:59:12,367 INFO     [task.py:415] Building contexts for truthfulqa_mc2 on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 817/817 [00:01<00:00, 700.97it/s]
2024-10-14:08:59:13,605 INFO     [evaluator.py:489] Running loglikelihood requests
Running loglikelihood requests: 100%|██████████████████████████████████████████████████████████████| 5882/5882 [01:52<00:00, 52.11it/s]
Eval completed in 119.48 seconds.
2024-10-14:09:01:11,843 INFO     [eleuther_eval.py:558] Eval completed in 119.48 seconds.
Max memory allocated: 39.48 GB
2024-10-14:09:01:11,843 INFO     [eleuther_eval.py:559] Max memory allocated: 39.48 GB


|    Tasks     |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|--------------|------:|------|-----:|------|---|-----:|---|-----:|
|truthfulqa_mc2|      2|none  |     0|acc   |↑  |0.3895|±  |0.0136|


2024-10-14:09:01:11,926 INFO     [eleuther_eval.py:563]

|    Tasks     |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|--------------|------:|------|-----:|------|---|-----:|---|-----:|
|truthfulqa_mc2|      2|none  |     0|acc   |↑  |0.3895|±  |0.0136|


Output from multimodal run:

2024-10-14:09:05:40,454 INFO     [_logging.py:101] Running EleutherEvalRecipe with resolved config:

batch_size: 1
checkpointer:
  _component_: torchtune.training.FullModelMetaCheckpointer
  checkpoint_dir: /tmp/Llama-3.2-11B-Vision-Instruct/original
  checkpoint_files:
  - consolidated.pth
  model_type: LLAMA3_VISION
  output_dir: ./
device: cuda
dtype: bf16
enable_kv_cache: true
limit: 3
log_level: INFO
max_seq_length: 8192
model:
  _component_: torchtune.models.llama3_2_vision.llama3_2_vision_11b
quantizer: null
seed: 1234
tasks:
- mmmu_val_biology
tokenizer:
  _component_: torchtune.models.llama3_2_vision.llama3_2_vision_transform
  max_seq_len: 8192
  path: /tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model

Model is initialized with precision torch.bfloat16.
2024-10-14:09:05:49,232 INFO     [eleuther_eval.py:505] Model is initialized with precision torch.bfloat16.
Running evaluation on the following tasks: ['mmmu_val_biology']
2024-10-14:09:06:01,712 INFO     [eleuther_eval.py:549] Running evaluation on the following tasks: ['mmmu_val_biology']
2024-10-14:09:06:01,714 INFO     [task.py:415] Building contexts for mmmu_val_biology on rank 0...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 5706.54it/s]
2024-10-14:09:06:01,766 INFO     [evaluator.py:489] Running generate_until requests
Running generate_until requests with text+image input: 100%|█████████████████████████████████████████████| 3/3 [00:34<00:00, 11.61s/it]
Eval completed in 34.93 seconds.
2024-10-14:09:06:36,641 INFO     [eleuther_eval.py:558] Eval completed in 34.93 seconds.
Max memory allocated: 32.01 GB
2024-10-14:09:06:36,642 INFO     [eleuther_eval.py:559] Max memory allocated: 32.01 GB


| Tasks |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|-------|------:|------|------|------|---|-----:|---|-----:|
|Biology|      0|none  |None  |acc   |↑  |0.3333|±  |0.3333|


2024-10-14:09:06:36,723 INFO     [eleuther_eval.py:563]

| Tasks |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|-------|------:|------|------|------|---|-----:|---|-----:|
|Biology|      0|none  |None  |acc   |↑  |0.3333|±  |0.3333|


pytorch-bot bot commented Oct 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1800

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 13a65f0 with merge base 5de5001 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 10, 2024
Comment on lines +16 to +20
from lm_eval.evaluator import evaluate, get_task_list
from lm_eval.models.hf_vlms import HFMultimodalLM
from lm_eval.models.huggingface import HFLM
from lm_eval.tasks import get_task_dict, TaskManager
from lm_eval.utils import make_table
@ebsmothers ebsmothers (Contributor) commented Oct 11, 2024

Don't we still need to gate here? lm_eval is still not in our pyproject.toml. Or is the idea that we just directly raise the usual ModuleNotFoundError since we are now on the latest version of lm_eval? (Still seems like it might be good to explicitly say "install lm_eval==0.4.5" or whatever)

@joecummings joecummings (Member Author) replied

Yeah I was going to default to just let the ModuleNotFoundError roll through since it seems explicit enough.

@joecummings joecummings changed the title from "[WIP] Update Eleuther to v0.4.5" to "Update EleutherAI Eval Harness to v0.4.5" on Oct 14, 2024
@@ -469,6 +440,16 @@ class EleutherEvalRecipe(EvalRecipeInterface):
"""

def __init__(self, cfg: DictConfig) -> None:
# Double check we have the right Eval Harness version
from importlib.metadata import version
Contributor commented

Do they not have __version__ defined?

@joecummings joecummings (Member Author) replied

(joe-torchtune) [jrcummings@devvm050.nha0 ~/projects/joe-torchtune (update-eluther-pin)]$ python
Python 3.11.10 (main, Oct  3 2024, 07:29:13) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lm_eval
>>> lm_eval.__version__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'lm_eval' has no attribute '__version__'
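
Since the `lm_eval` module exposes no `__version__` attribute, the recipe instead reads the version from the installed distribution's metadata via `importlib.metadata`. A minimal sketch of that approach (the helper name `dist_version` is illustrative):

```python
from importlib import metadata
from typing import Optional


def dist_version(name: str) -> Optional[str]:
    """Look up a distribution's version from its installed metadata.

    This works even when the module itself defines no `__version__`
    attribute; returns None if the distribution is not installed.
    """
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None
```

With `lm_eval` installed, `dist_version("lm_eval")` returns the version string recorded at install time, which is what the recipe's setup check compares against "0.4.5".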

@joecummings joecummings merged commit 7bbaa89 into pytorch:main Oct 14, 2024
17 checks passed
@joecummings joecummings deleted the update-eluther-pin branch October 14, 2024 18:15