Refactoring data processing #23
- Introduced new `make_dataset_args` in `settings.json` to support dataset creation with a specified processing method.
- Updated the `load_config` function in `config.py` to handle the new `make_dataset` argument type, allowing for flexible configuration loading.
- Improved file handling in `config.py` by specifying UTF-8 encoding when reading `settings.json`.
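The configuration changes above can be sketched roughly as follows. This is a minimal illustration, not the code from `config.py`: the function signature, the `section_keys` mapping, and the error handling are assumptions. Only the names `load_config`, `make_dataset`, `make_dataset_args`, and `settings.json` come from the commit message.

```python
import json


def load_config(arg_type: str, settings_path: str = "settings.json") -> dict:
    """Return the settings section that matches ``arg_type`` (illustrative sketch)."""
    # An explicit UTF-8 encoding avoids depending on the platform default,
    # which matters when the settings file contains non-ASCII text.
    with open(settings_path, "r", encoding="utf-8") as f:
        settings = json.load(f)

    # Hypothetical mapping from argument type to the key in settings.json.
    section_keys = {"make_dataset": "make_dataset_args"}
    if arg_type not in section_keys:
        raise ValueError(f"unsupported arg_type: {arg_type!r}")
    return settings[section_keys[arg_type]]
```

A caller would then request the section by type, for example `load_config("make_dataset")`, and receive the contents of `make_dataset_args` as a dictionary.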
- Added `commentjson` to `requirements.txt` for improved JSON handling.
- Enhanced `settings.json` with a new `prefer_comma` option for chat preferences.
- Refactored `csv_to_json.py` to improve data processing logic and added comments for clarity.
- Updated `qa_generator.py` to include data validation and filtering based on blocked words.
- Modified `strategies.py` to clarify the time window parameter and renamed a strategy class for better understanding.
- Changed JSON loading in `config.py` to use `commentjson` for better compatibility with comments in JSON files.
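The validation and blocked-word filtering mentioned for `qa_generator.py` might look something like the sketch below. The function name `filter_blocked`, its parameters, and the exact validation rules are hypothetical; the commit message only states that data is validated and filtered against blocked words.

```python
from typing import Iterable, List


def filter_blocked(messages: Iterable[str], blocked_words: Iterable[str]) -> List[str]:
    """Keep only non-empty text messages that contain no blocked word (sketch)."""
    blocked = [w for w in blocked_words if w]  # ignore empty entries
    kept = []
    for msg in messages:
        # Basic validation: skip missing, non-string, or blank content.
        if not isinstance(msg, str) or not msg.strip():
            continue
        # Drop any message that contains a blocked word as a substring.
        if any(word in msg for word in blocked):
            continue
        kept.append(msg)
    return kept
```

Substring matching is the simplest choice here; a real implementation may need word-boundary or case-insensitive matching depending on the data.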
- Updated `settings.json` to include a new `conversation_strategy` and `time_window` for improved conversation handling.
- Refactored `qa_generator.py` to initialize conversation strategies based on the new configuration options.
- Modified `strategies.py` to improve type hints and clarify strategy implementations for conversation management.
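One way a time-window conversation strategy could work is sketched below. The class and function names (`Message`, `TimeWindowStrategy`, `split_conversations`) are hypothetical, and treating `time_window` as a number of minutes is an assumption; the commit message does not state the unit or the actual class names in `strategies.py`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Message:  # hypothetical message record
    sender: str
    text: str
    sent_at: datetime


class TimeWindowStrategy:  # hypothetical strategy name
    def __init__(self, time_window_minutes: int) -> None:
        # Assumption: the time window is expressed in minutes.
        self.window = timedelta(minutes=time_window_minutes)

    def same_conversation(self, prev: Message, curr: Message) -> bool:
        # Two consecutive messages belong together if the gap fits the window.
        return curr.sent_at - prev.sent_at <= self.window


def split_conversations(
    messages: List[Message], strategy: TimeWindowStrategy
) -> List[List[Message]]:
    """Group messages (assumed sorted by time) into conversations."""
    groups: List[List[Message]] = []
    for msg in messages:
        if groups and strategy.same_conversation(groups[-1][-1], msg):
            groups[-1].append(msg)
        else:
            groups.append([msg])  # gap too large: start a new conversation
    return groups
```

Keeping the grouping rule behind a `same_conversation` method lets other strategies be swapped in through configuration without changing the grouping loop.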
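…conversion script; refactor qa_generator.py
…or.py to support the new message processing strategies, optimize the data processing logic, and update the test cases for the new functionality.
…parameters, update qa_generator.py to support the new message processing logic, delete the old test file test_old_csv_to_json copy.py, and update the related test cases.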
Copilot reviewed 12 out of 14 changed files in this pull request and generated 1 comment.
Files not reviewed (2)
- .cursor/rules/weclone-rules.mdc: Language not supported
- settings.json: Language not supported
make_dataset/qa_generator.py
Outdated
import pandas as pd
import json

current_dir = os.path.dirname(p=os.path.abspath(__file__))
The use of the keyword argument `p=` in `os.path.dirname` is invalid; please change it to `os.path.dirname(os.path.abspath(__file__))` instead.
Suggested change:
- current_dir = os.path.dirname(p=os.path.abspath(__file__))
+ current_dir = os.path.dirname(os.path.abspath(__file__))
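As a small illustration of the suggested form, the helper below resolves the directory that contains a given file by passing the path positionally. The helper name `script_dir` is hypothetical and is not part of the pull request.

```python
import os


def script_dir(file_path: str) -> str:
    """Return the absolute directory containing ``file_path``."""
    # The path is passed positionally, as in the suggested change.
    return os.path.dirname(os.path.abspath(file_path))
```

Inside a module, `current_dir = script_dir(__file__)` would give the same result as the one-line suggestion.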
First refactoring of data processing