[ci] react daily test #1668
zhulinJulia24 commented Nov 8, 2024
- react daily test
- add a fullbench test based on v1.1 (excluding scicode, because it takes too long and may run out of memory)
- add subjective test cases
models = sorted(models, key=lambda x: x['run_cfg']['num_gpus'])

judge_models = deepcopy([models[1]])
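For readers following the review, here is a minimal sketch of what these two lines do, assuming three hypothetical model configs (the `abbr` values and GPU counts are made up for illustration; only the `run_cfg` / `num_gpus` keys appear in the diff). Python's `sorted` is stable, so entries with equal `num_gpus` keep their original relative order, and `models[1]` is simply whichever config lands second after sorting.

```python
from copy import deepcopy

# Hypothetical configs; only 'run_cfg' and 'num_gpus' come from the diff.
models = [
    {'abbr': 'model-a', 'run_cfg': {'num_gpus': 2}},
    {'abbr': 'model-b', 'run_cfg': {'num_gpus': 1}},
    {'abbr': 'model-c', 'run_cfg': {'num_gpus': 1}},
]

# Sort by GPU count, ascending; ties keep their original order.
models = sorted(models, key=lambda x: x['run_cfg']['num_gpus'])

# Pick the second entry as the judge and copy it so later edits
# to the judge config do not leak back into `models`.
judge_models = deepcopy([models[1]])

sorted_abbrs = [m['abbr'] for m in models]
judge_abbr = judge_models[0]['abbr']

# Mutating the copy leaves the original config untouched.
judge_models[0]['run_cfg']['num_gpus'] = 4
original_gpus = models[1]['run_cfg']['num_gpus']
```

Because the judge is chosen by position, which model ends up as the judge depends on the contents and order of `models`, which is likely what prompted the question below.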
Why are we using this model here?
from opencompass.configs.models.hf_internlm.hf_internlm2_5_7b_chat import \
    models as hf_internlm2_5_7b_chat_model  # noqa: F401, E501
from opencompass.configs.models.hf_internlm.lmdeploy_internlm2_5_7b_chat import \
    models as lmdeploy_internlm2_5_7b_chat_model  # noqa: F401, E501
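The diff does not show how these two imported lists are combined. As a rough sketch, assuming the config gathers every name ending in `_model` into a single `models` list (an assumption, not something visible in this excerpt), the aggregation could look like the following. The `namespace` dict stands in for the config module's variables, and the `abbr` values are hypothetical.

```python
# Stand-in for the config module's namespace (hypothetical contents).
namespace = {
    'hf_internlm2_5_7b_chat_model': [{'abbr': 'hf-example'}],
    'lmdeploy_internlm2_5_7b_chat_model': [{'abbr': 'lmdeploy-example'}],
    'unrelated_setting': 42,
}

# Collect every list whose name ends in '_model' and flatten them.
models = sum(
    [v for k, v in namespace.items() if k.endswith('_model')],
    [],
)

model_abbrs = [m['abbr'] for m in models]
```

With only two such imports, the resulting `models` list has two entries, which matches the two-model setup the reviewer asks about.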
Why do we only test 2 models here? Is this just for function check of subjective evaluation?
> Why do we only test 2 models here? Is this just for function check of subjective evaluation?

Yes. It uses a subset of the datasets.
LGTM
* updaste * update * update * update * update * update * update * update * update * update * updaste * update * update * refactor summarize * update * update * update * update * update * updaste * update * update * update * update * updaste * update * update * update * update * update * updaste * updaste * update * update * update * update * update * update * update * update * update * update * update * Update daily-run-test.yml * Update daily-run-test.yml * update * update * update * update * update * Update daily-run-test.yml * update * update * update * update * update * update * update * update * update * update * update * Update daily-run-test.yml * Update daily-run-test.yml * update * update * Update daily-run-test.yml * update * update * update --------- Co-authored-by: zhulin1 <zhulin1@pjlab.org.cn>