Skip to content

Rewrite the entire settings-related functionality using pydantic, and add an image modality test script. #153

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jun 12, 2025

Conversation

xming521
Copy link
Owner

@xming521 xming521 commented Jun 11, 2025

  • Rewrite the entire settings-related functionality using pydantic
  • add an image modality test script
  • Unified dataset : chat-sft.
  • Pure text model fine-tuning data switched to ShareGPT format, defaulting to carrying chat history context
  • Upgrade dependencies to support qwen3

cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

@xming521 xming521 requested a review from Copilot June 12, 2025 02:15
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR migrates settings handling to Pydantic models, updates dataset generation to ShareGPT format (including image modality), and adds end-to-end test scripts for the new pipeline.

  • Switched configuration loading to weclone.utils.config_models and replaced dictionary access with model attributes.
  • Refactored code formatting (multi-line prints, # nosec annotations) and replaced assertions with explicit errors.
  • Added tests/test_full_pipeV2.py for image modality and updated related test and example config files.

Reviewed Changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated no comments.

Show a summary per file
File Description
weclone/data/qa_generatorV2.py Migrated to Pydantic config; updated attribute access
weclone/cli.py Changed to use CliArgs model; simplified config checks
tests/test_full_pipeV2.py New full-pipeline tests supporting image modality
tests/test_full_pipe.py Updated to mirror V2 tests (missing import issue)
settings.template.jsonc Bumped template version to 0.2.23; updated dataset name
README.md Modified markdown formatting and tips
pyproject.toml Bumped project version; added omegaconf dependency
Comments suppressed due to low confidence (4)

README.md:75

  • [nitpick] Using -> [!TIP] with arrows may not render as a Markdown admonition; consider using a standard blockquote or HTML comment structure.
-> [!TIP]

README.md:214

  • The bullet list syntax includes a leading + (+- [ ]), which breaks Markdown list rendering. Replace with a standard list item like - [ ] 支持多模态: 已支持图片.
+- [ ] 支持多模态:已支持图片

tests/test_full_pipe.py:98

  • The test uses cast(...) but there is no from typing import cast import at the top, causing a NameError at runtime.
config: WCMakeDatasetConfig = cast(WCMakeDatasetConfig, load_config("make_dataset"))

pyproject.toml:12

  • [nitpick] The omegaconf dependency was added but doesn't appear to be used in the codebase. Consider removing it if it’s not needed to reduce maintenance overhead.
"omegaconf",

cursor[bot]

This comment was marked as outdated.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Missing Test Directory Causes File Error

The test_cli_make_dataset function attempts to copy image files from tests/tests_data/images. This directory is not created or present in the test setup, unlike tests/tests_data/test_person which is used for other test data. This will cause a FileNotFoundError when os.listdir is called on the non-existent images directory.

tests/test_full_pipeV2.py#L102-L104

os.makedirs(os.path.join(config.media_dir, "images"), exist_ok=True)
for file in os.listdir(os.path.join(PROJECT_ROOT_DIR, "tests", "tests_data", "images")):
shutil.copy(os.path.join(PROJECT_ROOT_DIR, "tests", "tests_data", "images", file), os.path.join(config.media_dir, "images", file))

Fix in Cursor


Was this report helpful? Give feedback by reacting with 👍 or 👎

@xming521 xming521 merged commit a1afe78 into master Jun 12, 2025
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant