xming521 (Owner) commented Jun 19, 2025

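  • Bump torch to 2.7.0 and vllm to 0.9.1; offline inference now uses the chat interface
  • Add test_model_args and vllm_args config options, allowing a custom test-set file
  • Add a config-file path option to the CLI that sets the WECLONE_CONFIG_PATH environment variable
  • Tune the max_new_tokens and enable_thinking parameters in the data-cleaning strategy to optimize inference
  • Adapt some features for Qwen3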

fix #158 fix #83 fix #77 fix #69

xming521 requested a review from Copilot June 19, 2025 08:54

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds support for vLLM and test-model configurations across the codebase, extends the CLI with a custom --config-path option, refactors the test suite to run the full pipeline against multiple config files, and bumps the project and config versions.

  • Introduce VllmArgs and TestModelArgs in config_models.py and wire them into WcConfig and create_config_by_arg_type
  • Update the offline_infer.vllm_infer signature and the ChatModel initialization in api_service.py
  • Refactor tests/test_full_pipeV2.py to parameterize tests over all configs, replacing the removed test_full_pipe.py

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| weclone/utils/config_models.py | Add VllmArgs and TestModelArgs; comment out deprecated enum members |
| weclone/utils/config.py | Handle "vllm" and "test_model" in create_config_by_arg_type |
| weclone/server/api_service.py | Pass a JSON dict to ChatModel via model_dump |
| weclone/prompts/clean_data.py | Fix Markdown list numbering |
| weclone/eval/test_model.py | Use TestModelArgs.test_data_path and completion_config |
| weclone/data/qa_generator.py | Replace QaPairV2 with QaPair |
| weclone/data/models.py | Rename QaPairV2 to QaPair |
| weclone/data/clean/strategies.py | Update type hints from QaPairV2 to QaPair |
| weclone/core/inference/offline_infer.py | Extend vllm_infer and inject VllmArgs |
| weclone/cli.py | Add --config-path option (removes @click.group()) |
| tests/test_full_pipeV2.py | Restructure and parameterize full-pipeline tests |
| tests/configs/*.jsonc | Bump versions; add test_model_args and vision_api sections |
| settings.template.jsonc, pyproject.toml | Bump versions and update dependencies |
| README.md | Add a note about model-size behavior |
Comments suppressed due to low confidence (7)

weclone/cli.py:61

  • The @click.group() decorator was removed, so no CLI commands will be registered. Re-add @click.group() (and apply the option) to ensure subcommands load correctly.
@click.option("--config-path", default=None, help="Path to the config file; sets the WECLONE_CONFIG_PATH environment variable")
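A minimal sketch of the fix this comment asks for, assuming click 8.x: the option is attached to the group, and the group callback exports the path before any subcommand runs. The show-config subcommand and the settings.jsonc fallback are hypothetical and exist only to make the behavior observable.

```python
import os

import click


@click.group()  # without this, cli is a plain function and cli.command below raises AttributeError
@click.option(
    "--config-path",
    default=None,
    help="Path to the config file; sets the WECLONE_CONFIG_PATH environment variable",
)
def cli(config_path):
    """Group callback: click runs it before the selected subcommand."""
    if config_path:
        os.environ["WECLONE_CONFIG_PATH"] = config_path


@cli.command("show-config")
def show_config():
    """Hypothetical subcommand: print the config path that would be loaded."""
    # "settings.jsonc" as the fallback is an assumption for this sketch.
    click.echo(os.environ.get("WECLONE_CONFIG_PATH", "settings.jsonc"))


if __name__ == "__main__":
    cli()
```

Group-level options are parsed before the subcommand name (for example `--config-path tests/configs/example.jsonc show-config`), so the environment variable is already set when the subcommand reads it.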

weclone/prompts/clean_data.py:15

  • The bullet starts with +3. which breaks standard markdown numbering. Change to 3. to restore correct formatting.
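3. **Style Representativeness**: Evaluate whether [Answer A] shows natural, distinctive features of human conversational style. Is it merely functional information transfer, or does it carry a personal flavor? Points to consider include, but are not limited to: whether it conveys a particular tone (e.g., friendly, humorous, impatient, formal, profane); whether it contains catchphrases, slang, or internet slang (e.g., "yyds", "绝绝子"), emoji, kaomoji, or unusual punctuation such as "!!!", "???", or "~"; particular abbreviations or phrases; or nonstandard but consistent expressions (e.g., dialect words, personal verbal tics). If it does, give a score of 5; if not, give a score below 5.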

README.md:45

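  • [nitpick] This note is unprofessional and may confuse readers. Consider revising or removing it to maintain a professional project tone.
> - 7B models very easily end up trained into a dimwit, 14B models can just about hold a conversation, and models of 32B and above work better.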

weclone/utils/config_models.py:126

  • Using Field(VisionApiConfig()) shares one default instance across all models. Switch back to default_factory=VisionApiConfig to ensure each MakeDatasetArgs gets its own VisionApiConfig.
    vision_api: VisionApiConfig = Field(VisionApiConfig())

weclone/utils/config_models.py:182

  • Providing a mutable default model instance can cause shared-state bugs. Use default_factory=VllmArgs instead of a direct instance.
    vllm_args: VllmArgs = Field(VllmArgs())

weclone/utils/config_models.py:183

  • Similarly, use default_factory=TestModelArgs to avoid sharing one instance of TestModelArgs across all WcConfig instances.
    test_model_args: TestModelArgs = Field(TestModelArgs())
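A sketch of the default_factory pattern these three comments recommend, using pydantic with trimmed-down stand-ins for the real models. Apart from test_data_path (mentioned in the PR), the field names and all default values are assumptions. default_factory builds a fresh sub-config for every WcConfig explicitly, instead of relying on how the library treats an instance passed as a default.

```python
from typing import Optional

from pydantic import BaseModel, Field


class VllmArgs(BaseModel):
    # Hypothetical field; the real VllmArgs fields are not shown in this PR view.
    gpu_memory_utilization: float = 0.9


class TestModelArgs(BaseModel):
    # test_data_path is referenced in the PR; the None default is an assumption.
    test_data_path: Optional[str] = None


class WcConfig(BaseModel):
    # default_factory calls the class once per WcConfig instance,
    # so each config owns its own VllmArgs / TestModelArgs object.
    vllm_args: VllmArgs = Field(default_factory=VllmArgs)
    test_model_args: TestModelArgs = Field(default_factory=TestModelArgs)
```

The same change would apply to vision_api: VisionApiConfig = Field(default_factory=VisionApiConfig) in MakeDatasetArgs.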

weclone/core/inference/offline_infer.py:120

  • json_schema is not defined in this scope, causing a NameError. Either import or compute the schema, or remove this field from extra_body.
    extra_body = {"guided_json": json_schema, "enable_thinking": False}
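One way to rule out the NameError is to pass the schema in explicitly and build extra_body from it. The schema below (an id plus an assumed 1 to 5 score) and the build_extra_body helper are hypothetical; only the extra_body key layout is copied from the flagged line.

```python
import json
from typing import Any

# Hypothetical schema; the schema intended for offline_infer.py:120 is not shown in the PR.
SCORE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "score": {"type": "integer", "minimum": 1, "maximum": 5},
    },
    "required": ["id", "score"],
}


def build_extra_body(json_schema: dict[str, Any], enable_thinking: bool = False) -> dict[str, Any]:
    """Build the request extras; json_schema is a parameter, so it is always bound here."""
    return {"guided_json": json_schema, "enable_thinking": enable_thinking}


if __name__ == "__main__":
    print(json.dumps(build_extra_body(SCORE_SCHEMA), indent=2))
```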

Comment on lines +61 to +62
# FULL = "full"
# FREEZE = "freeze"

Copilot AI Jun 19, 2025

Commented-out enum members clutter the code. If these values are deprecated, remove them or add a proper deprecation notice instead of leaving commented code.

Suggested change (remove these lines):
# FULL = "full"
# FREEZE = "freeze"
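If the values are being retired rather than simply deleted, one form a proper deprecation notice could take is Enum's _missing_ hook, which runs when a value lookup fails. The FinetuningType name and the LORA member are hypothetical; only the commented-out FULL/FREEZE lines appear in the PR.

```python
from enum import Enum


class FinetuningType(str, Enum):
    # Hypothetical remaining member; the real enum's members are not shown here.
    LORA = "lora"

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when FinetuningType(value) matches no member.
        if value in {"full", "freeze"}:
            raise ValueError(f"finetuning type {value!r} is no longer supported; use 'lora'")
        return None  # fall back to Enum's default "is not a valid" ValueError
```

Configs that still say "full" or "freeze" then fail with an explicit message instead of a generic invalid-value error, and there is no commented-out code left to maintain.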

xming521 merged commit dda773a into master on Jun 19, 2025
2 checks passed