

@SAKURA-CAT (Member) commented Jun 6, 2025

What's changed

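This PR implements the long-awaited sync feature. Every experiment run now saves a local backup file (folder). The backup folder has exactly the same structure as the previous local mode, plus a new backup.swanlab file that records the entire course of the experiment.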

Saving logs with the LevelDB protocol

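The LevelDB protocol is an efficient, write-safe, immutable data structure, which makes it well suited to recording experiment logs. This update adopts this design for backing up experiment logs.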
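
For readers unfamiliar with the format, the sketch below shows how a LevelDB-style log frames each record: the file is split into 32 KiB blocks, each record carries a 7-byte header (checksum, length, type), and a payload that crosses a block boundary is split into FIRST/MIDDLE/LAST fragments. This is a minimal illustration, not SwanLab's writer: `frame_records` is a hypothetical helper, and it assumes `zlib.crc32` as a stand-in for LevelDB's masked CRC32C, so its output is not byte-compatible with real LevelDB readers.

```python
import struct
import zlib

BLOCK_SIZE = 32 * 1024  # LevelDB log files are split into 32 KiB blocks
HEADER_SIZE = 7         # checksum (4 bytes) + length (2 bytes) + type (1 byte)

# Record types defined by the LevelDB log format
FULL, FIRST, MIDDLE, LAST = 1, 2, 3, 4


def frame_records(payload: bytes, block_offset: int = 0) -> bytes:
    """Split one payload into LevelDB-style physical records.

    Assumption: zlib.crc32 replaces LevelDB's masked CRC32C, so the
    layout matches the format but the checksums do not.
    """
    out = bytearray()
    remaining = payload
    first = True
    while True:
        leftover = BLOCK_SIZE - block_offset
        if leftover < HEADER_SIZE:
            # Not enough room for a header: zero-pad the rest of the block
            out += b"\x00" * leftover
            block_offset = 0
            leftover = BLOCK_SIZE
        avail = leftover - HEADER_SIZE
        chunk, remaining = remaining[:avail], remaining[avail:]
        last = not remaining
        if first and last:
            rtype = FULL
        elif first:
            rtype = FIRST
        elif last:
            rtype = LAST
        else:
            rtype = MIDDLE
        # The checksum covers the type byte followed by the fragment data
        crc = zlib.crc32(bytes([rtype]) + chunk) & 0xFFFFFFFF
        out += struct.pack("<IHB", crc, len(chunk), rtype) + chunk
        block_offset += HEADER_SIZE + len(chunk)
        first = False
        if last:
            return bytes(out)
```

Because every fragment is self-describing and checksummed, a reader can detect a torn write at the end of the file and skip it, which is what makes an append-only log of this kind safe for crash-prone training jobs.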

We save this data to the backup.swanlab file. For write performance, write operations are performed in a separate thread.
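
Offloading writes to a separate thread can be sketched with a FIFO queue: the training thread only enqueues bytes, and a worker thread appends them to the file in order. `BackgroundWriter` is a hypothetical class for illustration; SwanLab's actual threading code may be structured differently.

```python
import queue
import threading
from typing import BinaryIO, Optional


class BackgroundWriter:
    """Hypothetical writer that appends records on a dedicated thread."""

    def __init__(self, fh: BinaryIO):
        self._fh = fh
        self._queue: "queue.Queue[Optional[bytes]]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write(self, record: bytes) -> None:
        # Called from the training thread: only enqueues, never touches disk
        self._queue.put(record)

    def _run(self) -> None:
        while True:
            record = self._queue.get()
            if record is None:  # sentinel: all earlier records are already written
                break
            self._fh.write(record)
        self._fh.flush()

    def close(self) -> None:
        # Enqueue the sentinel and wait until the worker has drained the queue
        self._queue.put(None)
        self._thread.join()
```

Since the queue is FIFO and there is a single worker, records reach the file in the order they were logged, while the training loop never blocks on disk I/O.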

Defining the log backup protocol (v0)

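SwanLab uploads data through a RESTful API. Based on this protocol, we defined a set of Pydantic data models to back up the uploaded data.
Note that because the uploaded data is JSON, it is not a perfect fit for the LevelDB protocol, which is based on byte streams. This is a direction for future optimization: we plan to build a more efficient storage format on top of protobuf.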
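
To illustrate the JSON versus byte-stream mismatch, here is a hypothetical v0-style record model. It swaps Pydantic for the standard library's `dataclasses` so the sketch has no dependencies; the class name and its fields (`type`, `step`, `data`) are assumptions, not SwanLab's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class BackupRecord:
    """Hypothetical v0 backup entry (SwanLab's real models use Pydantic)."""
    type: str  # illustrative values such as "scalar" or "config"
    step: int
    data: Dict[str, Any] = field(default_factory=dict)

    def to_bytes(self) -> bytes:
        # JSON text must be encoded to bytes before it can enter a
        # byte-oriented log; this extra conversion is the mismatch above.
        return json.dumps(asdict(self), separators=(",", ":")).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "BackupRecord":
        # Reverse the conversion when reading the backup back for sync
        return cls(**json.loads(raw.decode("utf-8")))
```

A binary schema such as protobuf would serialize these fields directly to bytes, avoiding the text round trip and the repeated field names in every record.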

New feature: sync

This update adds a sync feature modeled on wandb's design, which lets users choose a run folder and upload its experiment logs. It can be used in two ways:

  1. Sync from the command line

    swanlab sync /the/path/to/run-xxxxxxxx_xxxxxx-xxxx
  2. Sync in code

    import swanlab
    from swanlab import sync

    # Log in first
    swanlab.login(api_key="xxxxxxxx")
    # Sync the logs
    sync("/the/path/to/run-xxxxxxxx_xxxxxx-xxxx")
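
Both forms start from a run folder. Before uploading, a sync implementation presumably needs to confirm that the folder contains the backup.swanlab file described above. A minimal sketch of that check, where `find_backup_file` is a hypothetical name and not part of the SwanLab API:

```python
from pathlib import Path


def find_backup_file(run_dir: str) -> Path:
    """Hypothetical pre-sync check: return <run_dir>/backup.swanlab."""
    path = Path(run_dir)
    if not path.is_dir():
        raise NotADirectoryError(f"run folder not found: {run_dir}")
    backup = path / "backup.swanlab"
    if not backup.is_file():
        # Without the backup log there is nothing to replay to the cloud
        raise FileNotFoundError(f"no backup.swanlab in {run_dir}")
    return backup
```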

⚠️ Note: due to current design limitations, every sync creates a new folder.

In addition, experiment backup can be turned off with swanlab.Settings(backup=False).

New mode: offline

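This mode is for training environments without network access. In this mode SwanLab makes no network requests; instead, it saves experiment data using the log storage protocol introduced in this update. Users can then download the corresponding run folder and upload the experiment logs with sync.

⚠️ Note: in this mode swanlab.Settings(backup=False) has no effect; backup is always enabled.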
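
The two backup rules above (offline always backs up; otherwise `Settings(backup=...)` decides) can be summarized as a small decision function. `backup_enabled` is a hypothetical helper that encodes only these two rules; it does not model `disabled` mode, whose backup behavior this PR does not describe.

```python
def backup_enabled(mode: str, backup_setting: bool = True) -> bool:
    """Hypothetical helper encoding the backup rules stated in this PR."""
    if mode == "offline":
        # Offline mode has no upload path, so the backup is the only record
        return True
    # In other modes (e.g. "cloud", "local") the user setting decides
    return backup_setting
```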

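Refactored function logic

In this update we refactored some function logic to support the development of sync. The changes include:

  1. Changed the callback logic for cloud and local modes, adding a backup decorator
  2. Cleaned up some code to improve reuse
  3. Changed callback parameters: removed the num parameter (the number of experiments) from before_init_experiment
  4. Refactored folder-creation logic: local, offline, and cloud modes can all create folders
  5. Fixed some test errors; hardware monitoring can no longer be enabled in pytest runs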
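
Item 1 mentions a backup decorator added to the cloud and local callbacks. Its general shape might look like the sketch below, which records each decorated callback's arguments before the callback runs. `with_backup`, `on_log`, and the in-memory `backup_log` list are hypothetical stand-ins; a real implementation would write to backup.swanlab instead of a list.

```python
import functools
from typing import Any, Callable, List, Tuple


def with_backup(log: List[Tuple[str, tuple, dict]]) -> Callable:
    """Hypothetical decorator: record a callback's arguments, then run it."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Back up first, so the record exists even if the callback fails
            log.append((func.__name__, args, kwargs))
            return func(*args, **kwargs)
        return wrapper
    return decorator


backup_log: List[Tuple[str, tuple, dict]] = []


@with_backup(backup_log)
def on_log(step: int, data: dict) -> str:
    # Stand-in for a cloud/local callback that would upload or render metrics
    return f"step {step}: {sorted(data)}"
```

Keeping backup in a decorator lets every mode share the same recording logic without touching each callback body, which fits the code-reuse goal in item 2.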

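Future plans

Sync is one of the prerequisites for the resume feature, and it is essentially complete in this PR. Next, we will improve the following:

  1. Each sync should complete within the same experiment
  2. A new upload process built on the protobuf protocol
  3. Implement the resume feature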

closes: #1040
closes: #546
closes: #876
closes: #1043

* feat: sync init

* fix: file name

* feat: sync cli

* feat: logs footer
@SAKURA-CAT SAKURA-CAT added 📚 documentation Improvements or additions to documentation 💪 enhancement New feature or request labels Jun 6, 2025
@SAKURA-CAT SAKURA-CAT self-assigned this Jun 6, 2025
* refactor: change backup mode to offline

* fix: #1040

* fix: test error
@ShaohonChen ShaohonChen self-requested a review June 6, 2025 13:13
@SAKURA-CAT SAKURA-CAT force-pushed the feature/backup branch 4 times, most recently from a92be2b to 9193137 June 7, 2025 06:55

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces a backup feature for handling log storage and offline data management, while also unifying mode handling and adding a CLI sync command. Key changes include:

  • Adding backup functionality with new offline/backup mode support in environment settings and operator callbacks.
  • Updating mode-related logic and API signatures (e.g. _create_operator, mount_project) to correctly handle the backup/offline mode.
  • Introducing a new CLI command for log synchronization.

Reviewed Changes

Copilot reviewed 37 out of 37 changed files in this pull request and generated no comments.

Per-file summary:

swanlab/log/backup/__init__.py: Introduces the backup module and exports backup-related APIs.
swanlab/env.py: Updates allowed mode values, including the new "offline" mode.
swanlab/data/utils.py: Modifies _create_operator to integrate login_info and backup usage.
swanlab/data/sdk.py: Adjusts mode handling and associated documentation for the new mode.
swanlab/data/run/main.py: Adds swanlog_epoch and adapts finish/monitor procedures.
swanlab/data/callbacker/offline.py: New OfflineCallback added for offline backup processing.
swanlab/cli/commands/sync/__init__.py: Adds a sync command to enable local-to-cloud data synchronization.
swanlab/api/http.py: Introduces history_exp_count and refines mount_project behavior.
swanlab/api/upload/model.py: Refines media model initialization to support backup file paths.
Comments suppressed due to low confidence (4)

swanlab/data/sdk.py:138

  • The documentation now refers to the mode as 'backup' whereas internally the code checks for 'offline'. Consider unifying the naming (e.g. consistently using 'offline' or updating all references to 'backup') for clarity.
Allowed values are 'cloud', 'local', 'disabled', 'backup'.

swanlab/api/upload/model.py:128

  • [nitpick] The comment contains informal language and emojis; it is recommended to rephrase it in a more professional tone for clarity in the production code.
# -------------------------- 🤡这里是一点小小的💩 --------------------------
(Roughly: "🤡 here is a little bit of 💩")

swanlab/api/http.py:251

  • [nitpick] The return statement that previously returned the project info has been removed. If this change is intentional, please update the method's documentation to reflect that mount_project no longer returns project information.
def mount_project(self, name: str, username: str = None, public: bool = None):

swanlab/data/callbacker/local.py:102

  • [nitpick] The FIXME comment is informal and uses emojis. It is advisable to clarify the intention and document the behavior in a professional manner or remove the comment if no longer relevant.
#  FIXME num 在 dashboard 中被要求传递但是没用上 🤡
(Roughly: "FIXME: num is required to be passed for the dashboard but is never used 🤡")

@SAKURA-CAT SAKURA-CAT changed the title Feature/backup Feature/sync Jun 7, 2025
@SAKURA-CAT (Member Author)

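@ShaohonChen For record verification, see this test: https://github.com/SwanHubX/SwanLab/pull/1047/files#diff-5d60817936c1a600474561f5e3fb5d322cdeed52a6090aa55f9673b99a45f3a3

Note that accelerate is not necessarily tested with pytest, so to avoid some errors we recommend disabling hardware monitoring: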

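    import swanlab

    # project_name, experiment_name, config, description and tags are assumed to be defined earlier in the test
    run = swanlab.init(
        project=project_name,
        experiment_name=experiment_name,
        mode="offline",
        config=config,
        description=description,
        tags=tags,
        settings=swanlab.Settings(hardware_monitor=False),  # disable hardware monitoring
    )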

@SAKURA-CAT SAKURA-CAT marked this pull request as ready for review June 8, 2025 09:28
@SAKURA-CAT SAKURA-CAT merged commit 05caca2 into main Jun 8, 2025
5 checks passed
@SAKURA-CAT SAKURA-CAT deleted the feature/backup branch June 8, 2025 09:28
@mjwen commented Jun 8, 2025

Awesome. thanks!
