Feature/sync #1047
* feat: sync init
* fix: file name
* feat: sync cli
* feat: logs footer
d9d1f94 to edfd24f
* refactor: change backup mode to offline
* fix: #1040
* fix: test error
a92be2b to 9193137
9193137 to d51ee99
88f3ffe to 02be635
02be635 to a0560d5
Pull Request Overview
This PR introduces a backup feature for handling log storage and offline data management, while also unifying mode handling and adding a CLI sync command. Key changes include:
- Adding backup functionality with new offline/backup mode support in environment settings and operator callbacks.
- Updating mode-related logic and API signatures (e.g. _create_operator, mount_project) to correctly handle the backup/offline mode.
- Introducing a new CLI command for log synchronization.
Reviewed Changes
Copilot reviewed 37 out of 37 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| swanlab/log/backup/__init__.py | Introduces the backup module and exports backup-related APIs. |
| swanlab/env.py | Updates allowed mode values, including the new "offline" mode. |
| swanlab/data/utils.py | Modifies _create_operator to integrate login_info and backup usage. |
| swanlab/data/sdk.py | Adjusts mode handling and associated documentation for new mode. |
| swanlab/data/run/main.py | Adds swanlog_epoch and adapts finish/monitor procedures. |
| swanlab/data/callbacker/offline.py | New OfflineCallback added for offline backup processing. |
| swanlab/cli/commands/sync/__init__.py | Adds a sync command to enable local-to-cloud data synchronization. |
| swanlab/api/http.py | Introduces history_exp_count and refines mount_project behavior. |
| swanlab/api/upload/model.py | Refines media model initialization to support backup file paths. |
Comments suppressed due to low confidence (4)
swanlab/data/sdk.py:138
- The documentation now refers to the mode as 'backup' whereas internally the code checks for 'offline'. Consider unifying the naming (e.g. consistently using 'offline' or updating all references to 'backup') for clarity.
Allowed values are 'cloud', 'local', 'disabled', 'backup'.
swanlab/api/upload/model.py:128
- [nitpick] The comment contains informal language and emojis; it is recommended to rephrase it in a more professional tone for clarity in the production code.
# -------------------------- 🤡这里是一点小小的💩 --------------------------
swanlab/api/http.py:251
- [nitpick] The return statement that previously returned the project info has been removed. If this change is intentional, please update the method's documentation to reflect that mount_project no longer returns project information.
def mount_project(self, name: str, username: str = None, public: bool = None):
swanlab/data/callbacker/local.py:102
- [nitpick] The FIXME comment is informal and uses emojis. It is advisable to clarify the intention and document the behavior in a professional manner or remove the comment if no longer relevant.
# FIXME num 在 dashboard 中被要求传递但是没用上 🤡
@ShaohonChen For record validation, you can refer to this test: https://github.com/SwanHubX/SwanLab/pull/1047/files#diff-5d60817936c1a600474561f5e3fb5d322cdeed52a6090aa55f9673b99a45f3a3
Note that accelerate does not necessarily run its tests with pytest. To avoid some errors, it is recommended to turn off hardware monitoring:

    import swanlab

    # Offline run with hardware monitoring disabled
    run = swanlab.init(
        project=project_name,
        experiment_name=experiment_name,
        mode="offline",
        config=config,
        description=description,
        tags=tags,
        settings=swanlab.Settings(hardware_monitor=False),
    )
Awesome. thanks!
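What's changed
This PR implements the long-awaited sync feature, which saves a local backup file (folder) on every experiment run. The backup folder has exactly the same structure as the earlier local mode, plus a new backup.swanlab file that fully describes the entire course of the experiment.
Saving logs with the LevelDB protocol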
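The LevelDB protocol is an efficient, write-safe, immutable data structure, which suits experiment logging well. This update introduces that design for backing up experiment logs.
We save the relevant data to the backup.swanlab file. For write performance, write operations are performed in a separate thread.
Defining the log backup protocol (v0)
swanlab uploads data over a RESTful API. For that protocol we defined a set of Pydantic-based data models, which are used to back up the uploaded data.
Note that because the uploaded data is JSON, it may not be an ideal fit for the LevelDB protocol, which is based on byte streams. This is a direction for future optimization: we plan to build a more efficient storage format based on protobuf.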
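To make the two ideas above concrete (writes handled by a separate thread, and JSON payloads stored in a byte-oriented log), here is a minimal sketch using only the Python standard library. It is not SwanLab's actual on-disk format, and it is not the real LevelDB log format either (which uses fixed-size blocks and a different checksum). The `BackupWriter` class, the `read_records` helper, and the CRC32-plus-length framing are all assumptions made for illustration.

```python
import json
import queue
import struct
import threading
import zlib
from pathlib import Path

# Hypothetical frame header: 4-byte CRC32 and 4-byte payload length, little-endian.
_HEADER = struct.Struct("<II")


class BackupWriter:
    """Append JSON records to a file from a dedicated background thread."""

    def __init__(self, path):
        self._queue = queue.Queue()
        self._file = open(path, "ab")
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def write(self, record: dict) -> None:
        # Called from the training thread: serializes and enqueues, no disk I/O here.
        self._queue.put(json.dumps(record, ensure_ascii=False).encode("utf-8"))

    def _drain(self) -> None:
        # Runs in the background thread and performs the actual file writes.
        while True:
            payload = self._queue.get()
            if payload is None:  # sentinel pushed by close()
                break
            self._file.write(_HEADER.pack(zlib.crc32(payload), len(payload)) + payload)
            self._file.flush()

    def close(self) -> None:
        # Flush everything queued so far, then stop the thread and close the file.
        self._queue.put(None)
        self._thread.join()
        self._file.close()


def read_records(path):
    """Yield records in order, stopping at the first truncated or corrupted frame."""
    data = Path(path).read_bytes()
    offset = 0
    while offset + _HEADER.size <= len(data):
        crc, length = _HEADER.unpack_from(data, offset)
        start = offset + _HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break
        yield json.loads(payload.decode("utf-8"))
        offset = start + length
```

Because each frame carries its own length and checksum, a reader can recover every complete record even if the process died in the middle of a write. The JSON payload is still text wrapped in bytes, which is the mismatch the description mentions and a binary encoding such as protobuf would remove.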
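New feature: sync
This update adds a sync feature similar in design to the one in wandb. It lets users select a run folder and upload its experiment logs. It can be used in two ways:
Syncing from the command line
Syncing from code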
In addition, experiment backup can be turned off with swanlab.Settings(backup=False).
New mode: offline
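This mode is for cases where the training machine has no network access. In this mode swanlab does not attempt any network requests. Instead, it saves the experiment data using the log storage protocol introduced in this update. Users can then download the corresponding run folder and upload the experiment logs with the sync feature.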
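In this workflow the user has to pick which run folders to upload after copying them off the offline machine. The helper below is a small sketch of that selection step. The function name `find_syncable_runs` and the idea of one sub-folder per run under a log directory are assumptions. Only the `backup.swanlab` file name comes from the description above.

```python
from pathlib import Path

BACKUP_FILENAME = "backup.swanlab"  # file name taken from the PR description


def find_syncable_runs(log_dir):
    """Return the run folders under log_dir that contain a backup file, sorted by name."""
    root = Path(log_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_dir() and (p / BACKUP_FILENAME).is_file()
    )
```

Each returned path is a candidate to hand to the sync feature. Folders without a backup file, for example runs created with backup disabled, are skipped.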
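Refactored function logic
In this update we refactored some function logic to support development of the sync feature. The changes include:
- the num parameter in before_init_experiment, which represents the number of experiments
Future plans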
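The sync feature is one of the prerequisites for the resume feature and is essentially complete in this PR. Going forward, we will optimize the following points:
closes: #1040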
closes: #546
closes: #876
closes: #1043