@BAIKEMARK BAIKEMARK commented Jun 12, 2025

Add ImageToText

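- Add a VisionApiConfig class for configuring the vision API
- Integrate image recognition into data processing, with support for parallel processing
- Refactor the data cleaning strategy to support both online and offline modes
- Streamline the data cleaning pipeline to improve extensibility and maintainability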
@BAIKEMARK BAIKEMARK requested a review from Copilot June 12, 2025 15:30

@Copilot Copilot AI left a comment

Pull Request Overview

This PR integrates a Vision API for image recognition and refactors the data cleaning strategy to support multi-modal datasets, including updates to configuration, processing, and template files.

  • Introduced VisionApiConfig and updated dataset configurations to conditionally switch data sources based on image recognition.
  • Added ImageToTextProcessor to process images via an external API with parallel execution, and refactored cleaning strategies.
  • Updated documentation and sample configuration files to reflect the new multi-modal processing options.
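
As a rough illustration of the first bullet, the sketch below shows how a vision API config with an enable flag could drive dataset selection. Only the names `VisionApiConfig`, `MakeDatasetArgs`, and `vision_api` come from this review; the individual fields, the `select_dataset` helper, and the dataset names are assumptions made for illustration and may not match the merged code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VisionApiConfig:
    """Hypothetical shape: these field names are assumptions, not the PR's schema."""

    enable: bool = False
    api_url: Optional[str] = None
    api_key: Optional[str] = None
    model_name: Optional[str] = None


@dataclass
class MakeDatasetArgs:
    # Only the vision_api field is mentioned in the review; other fields are omitted.
    vision_api: VisionApiConfig = field(default_factory=VisionApiConfig)


def select_dataset(args: MakeDatasetArgs, text_dataset: str, image_text_dataset: str) -> str:
    """Return the image-aware dataset name only when image recognition is enabled."""
    return image_text_dataset if args.vision_api.enable else text_dataset
```

With the default config, `select_dataset(MakeDatasetArgs(), "text-only", "with-images")` falls back to the text-only name, so existing setups would be unaffected until the flag is turned on.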

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| weclone/utils/config_models.py | Added VisionApiConfig and integrated vision_api in MakeDatasetArgs |
| weclone/utils/configV2.py | Passed vision_api config into WCTrainSftConfig |
| weclone/utils/config.py | Adjusted dataset selection based on vision_api enable flag |
| weclone/train/train_sft.py | Updated cleaning strategy usage and dynamic dataset name update |
| weclone/prompts/clean_data.py | Modified instructions for evaluating chat quality with style criteria |
| weclone/data/utils.py | Added ImageToTextProcessor for image-to-text conversion with retry logic |
| weclone/data/qa_generatorV2.py | Integrated image processing in parallel for QA generation |
| weclone/data/clean/strategies.py | Refactored cleaning strategies and consolidated online cleaning logic |
| settings.template.jsonc & examples/mllm.template.jsonc | Added vision_api configuration parameters |
| dataset/res_csv/sft/dataset_info.json | Added dataset info for the cleaned chat-sft dataset |
| README.md | Updated documentation to describe multi-modal training and data completion using vision_api |

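
The summary mentions image-to-text conversion with retry logic and parallel image processing. A minimal sketch of that pattern, assuming a thread pool and a caller-supplied `describe` function standing in for the external API call, might look like the following. The function names, retry count, and worker count are all hypothetical and are not taken from `ImageToTextProcessor` itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def describe_with_retry(
    describe: Callable[[str], str],
    image_path: str,
    max_retries: int = 3,
    delay_seconds: float = 0.0,
) -> Optional[str]:
    """Call `describe` up to `max_retries` times; return None if every attempt fails."""
    for attempt in range(max_retries):
        try:
            return describe(image_path)
        except Exception:
            # Wait before the next attempt, but not after the final one.
            if attempt < max_retries - 1:
                time.sleep(delay_seconds)
    return None


def describe_images(
    describe: Callable[[str], str],
    image_paths: List[str],
    max_workers: int = 4,
) -> List[Optional[str]]:
    """Process images concurrently; results keep the order of `image_paths`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda path: describe_with_retry(describe, path), image_paths))
```

Threads suit this workload because each task mostly waits on network I/O. Returning `None` for an image that never succeeds lets the caller decide whether to drop the record or keep it without a description.
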
Comments suppressed due to low confidence (2)

weclone/data/utils.py:63

  • The _encode_image_to_base64 method returns None on failure but is documented to return a string. Update the return type annotation to Optional[str] to accurately reflect possible outcomes.
return None
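
A minimal sketch of the annotation change suggested above, written as a standalone function because the actual method body is not shown in this review. The choice to catch `OSError` is an assumption.

```python
import base64
from typing import Optional


def encode_image_to_base64(image_path: str) -> Optional[str]:
    """Return the file's contents as a Base64 string, or None if it cannot be read."""
    try:
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode("utf-8")
    except OSError:
        # The None return is what makes Optional[str] the accurate annotation.
        return None
```

Callers then need an explicit `is None` check before using the result, which a type checker can enforce once the annotation is `Optional[str]`.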

weclone/data/clean/strategies.py:159

  • The class name 'OlineLLMCleaningStrategy' appears to contain a typo. Consider renaming it to 'OnlineLLMCleaningStrategy' for clarity and consistency.
class OlineLLMCleaningStrategy(CleaningStrategy):
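
If the rename is adopted, one possible way to avoid breaking existing imports is to keep the misspelled name as an alias. The sketch below assumes a `clean` method on the base class and uses placeholder cleaning logic; the real `CleaningStrategy` interface is not shown in this review.

```python
from abc import ABC, abstractmethod
from typing import List


class CleaningStrategy(ABC):
    """Stand-in for the project's base class; the `clean` method is an assumption."""

    @abstractmethod
    def clean(self, records: List[str]) -> List[str]:
        ...


class OnlineLLMCleaningStrategy(CleaningStrategy):
    def clean(self, records: List[str]) -> List[str]:
        # Placeholder behaviour: drop blank records. The real strategy would call an LLM.
        return [record for record in records if record.strip()]


# Backward-compatible alias so imports of the misspelled name keep working.
OlineLLMCleaningStrategy = OnlineLLMCleaningStrategy
```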

    f"{self.api_url}/chat/completions", headers=headers, json=payload, timeout=60
)
if response.status_code == 200:
    pass

Copilot AI Jun 12, 2025

The branch for status code 200 currently only has a 'pass' statement. Consider removing 'pass' and adding a comment to clarify that the response is valid and processing continues.

Suggested change
- pass
+ # Response is valid; processing continues below.
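
One caveat when applying this: in Python a comment does not count as a statement, so if the `if` branch contains nothing else, replacing `pass` with only a comment raises an IndentationError. An alternative is to invert the check into a guard clause, which removes the empty branch entirely. The sketch below assumes an OpenAI-compatible response body, which the `/chat/completions` endpoint suggests but the review does not confirm. `extract_reply` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def extract_reply(status_code: int, body: Dict[str, Any]) -> Optional[str]:
    """Return the first message's content, or None for a non-200 response."""
    if status_code != 200:
        # Guard clause: handle the failure path first so no empty branch is needed.
        return None
    # Assumed OpenAI-compatible layout: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

In the original code this would be called with `response.status_code` and `response.json()`.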

@xming521 xming521 merged commit bca3240 into xming521:master Jun 13, 2025
1 of 2 checks passed
2 participants