feature: sync pro #1194

SAKURA-CAT · 2025-07-18T11:36:38Z

This PR completes the logic of the sync feature by wrapping the existing resume code into a Mounter class, which can be reused by both sync and init.

Description

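This PR allows sync to be combined with resume (continuing an interrupted run). Typical scenarios:

The user runs in offline mode and syncs logs through a NAS or other network storage; metrics can then be uploaded by running sync repeatedly.
The user's network drops during training; sync uploads the logs that were not uploaded yet.

Under the hood, both scenarios are treated as reuses of the resume feature, and the implementation follows the same design.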

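About testing

Besides the regular unit tests, pure-Python sync tests and Jupyter tests have been added under the test/sync directory; see the comments in those files for details.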

API

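From a product-design standpoint, sync still creates a new experiment by default, and resuming is treated as optional. To support this, we add an --id parameter that accepts the following values:

None: the default behavior, equivalent to new
new: create a new experiment for the sync, equivalent to resume=never
auto: sync into the experiment id recorded in the log file, equivalent to resume=allow
str: any other string is treated as an experiment id, equivalent to resume=must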
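The mapping from `--id` to a resume mode listed above can be sketched as a small pure function. This is a minimal illustration, not SwanLab's sync code: the name `resolve_sync_id`, its signature, and the blank-string check are assumptions made for the example.

```python
from typing import Optional, Tuple


def resolve_sync_id(id_arg: Optional[str], logged_id: str) -> Tuple[str, Optional[str]]:
    """Map an --id value to (resume_mode, experiment_id).

    Hypothetical helper mirroring the rules in the PR description;
    it is not SwanLab's actual implementation.
    """
    # Assumed validation rule: reject blank strings up front
    if id_arg is not None and not id_arg.strip():
        raise ValueError("--id must not be an empty string")
    # None and "new": always create a fresh experiment (resume=never)
    if id_arg is None or id_arg == "new":
        return "never", None
    # "auto": reuse the experiment id recorded in the log file (resume=allow)
    if id_arg == "auto":
        return "allow", logged_id
    # Any other string is an explicit experiment id (resume=must)
    return "must", id_arg
```

For example, `resolve_sync_id("auto", "abc123")` returns `("allow", "abc123")`, while `resolve_sync_id(None, "abc123")` returns `("never", None)`. Keeping the mapping in one function makes each branch easy to unit-test in isolation from the CLI.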

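Notes

Logs written by the new version are not compatible with logs from older versions: a new (or old) version of SwanLab cannot sync log files written by an old (or new) version.
Due to a current limitation of resume, when --id is auto and the experiment already exists, the experiment's run time is not synced (this matches the current resume behavior).

Log file compatibility table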

Log Version	SwanLab Version
0	0.6.2 ~ 0.6.7
1	0.6.8 ~ latest
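The compatibility table above can be encoded as data so that a version check falls out of a lookup. This is a sketch under assumptions: the helper names are hypothetical, version strings are assumed to be plain `X.Y.Z` with no pre-release tags, and "compatible" is taken to mean "same log version", as stated in the notes.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, ...]

# (log version, first SwanLab version, last SwanLab version or None for "latest")
LOG_VERSION_TABLE: List[Tuple[int, Version, Optional[Version]]] = [
    (0, (0, 6, 2), (0, 6, 7)),
    (1, (0, 6, 8), None),
]


def parse_version(text: str) -> Version:
    """Parse a plain "X.Y.Z" version string (pre-release tags are not handled)."""
    return tuple(int(part) for part in text.split("."))


def log_version_for(swanlab_version: str) -> int:
    """Return the log format version written by a given SwanLab version."""
    v = parse_version(swanlab_version)
    for log_version, first, last in LOG_VERSION_TABLE:
        if v >= first and (last is None or v <= last):
            return log_version
    raise ValueError(f"no log format is defined for SwanLab {swanlab_version}")


def can_sync(writer_version: str, syncer_version: str) -> bool:
    """A log can only be synced by a SwanLab version that uses the same log format."""
    return log_version_for(writer_version) == log_version_for(syncer_version)
```

Comparing integer tuples rather than raw strings keeps `0.6.10` correctly ordered after `0.6.8`.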

This compatibility table should be added to the official documentation so it is easy to look up.

closes: #1156

closes: #1136

Expanded the ValidationError exception docstring to include both backend token/api key validation failures and local log file integrity issues.

Replaced assertion with ValidationError for record checksum validation in DataStore. Updated DataPorter to handle ValidationError in parse method with a strict mode. Changed RunStore.run_colors type to Tuple. Added and refactored tests for DataStore validation, moving and expanding test coverage to test/unit/data/porter/test_datastore.py.
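Raising an exception instead of using `assert` matters because Python strips `assert` statements when run with `-O`, which would silently disable the integrity check. The sketch below shows checksum validation with a strict/non-strict parse. The PR does not say which checksum DataStore uses, so CRC32 via `zlib`, the `Record` dataclass, and the `parse` signature are all assumptions; `ValidationError` here is a local stand-in for SwanLab's exception.

```python
import zlib
from dataclasses import dataclass
from typing import Iterable, List


class ValidationError(Exception):
    """Local stand-in for SwanLab's ValidationError (log file integrity failure)."""


@dataclass
class Record:
    payload: bytes
    checksum: int  # assumed: CRC32 of payload, stored alongside each record


def verify(record: Record) -> bytes:
    # Raise rather than assert: asserts are removed under `python -O`
    if zlib.crc32(record.payload) != record.checksum:
        raise ValidationError("record checksum mismatch; the log file may be corrupted")
    return record.payload


def parse(records: Iterable[Record], strict: bool = True) -> List[bytes]:
    """Validate records; strict mode aborts on the first bad record, otherwise skips it."""
    good: List[bytes] = []
    for record in records:
        try:
            good.append(verify(record))
        except ValidationError:
            if strict:
                raise
            # Non-strict mode: drop the corrupted record and continue
    return good
```

Non-strict mode lets a partially corrupted offline log still upload its intact records, while strict mode surfaces the problem immediately.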

Relocated swanlab/data/formatter.py to swanlab/formatter.py and updated all import statements accordingly. Also moved the corresponding test file to match the new structure. This improves project organization by placing shared utilities at the root level.

Added --id and --resume options to the sync CLI command with input validation and error handling. Updated sync logic to support these options and refactored parameter names for clarity. Introduced unit tests to verify correct behavior and error cases for the new options.

Moved LogContent TypedDict from swanlab/log/type.py to swanlab/core_python/uploader/model.py for better modularity. Updated all relevant imports and usages to reference the new location, ensuring type consistency across modules.

Moved experiment mounting and sync logic in CloudPyCallback to use the new Mounter class. Added filter utility functions for metrics, columns, and epochs in swanlab.data.porter.utils, and updated DataPorter to use these for selective uploads. Added unit tests for the new filter utilities. Improved error message in DataStore for unsupported backup versions.
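Selective uploads are what make repeated syncs of the same offline log cheap: only data the cloud experiment does not have yet is sent. A minimal sketch of that filtering idea follows. The function names, the `(key, epoch, value)` tuple shape, and the "last uploaded epoch per key" bookkeeping are hypothetical and are not the signatures in `swanlab.data.porter.utils`.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical shape of one metric point: (key, epoch, value)
Metric = Tuple[str, int, float]


def filter_new_columns(keys: Iterable[str], existing: Set[str]) -> List[str]:
    """Keep only column keys the cloud experiment does not have yet (order preserved)."""
    return [key for key in keys if key not in existing]


def filter_new_epochs(metrics: Iterable[Metric], last_epoch: Dict[str, int]) -> List[Metric]:
    """Keep only points newer than the last epoch already uploaded for their key."""
    # Keys never uploaded default to -1, so a point at epoch 0 is still kept
    return [m for m in metrics if m[1] > last_epoch.get(m[0], -1)]
```

For example, with `metrics = [("loss", 1, 0.9), ("loss", 2, 0.7), ("acc", 1, 0.5)]` and `last_epoch = {"loss": 1}`, `filter_new_epochs` keeps `("loss", 2, 0.7)` and `("acc", 1, 0.5)`.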

Assigns 'auto' to the id variable if the --resume flag is set, ensuring correct behavior when resuming a sync operation.

Refactored DataPorter to include experiment id and colors, updated parse logic to skip invalid records, and ensured experiment state is updated after synchronization. Mounter no longer handles cleanup and now sets run_colors only if not already set. The sync entrypoint now uses Mounter to set up run_store from parsed data. Updated proto models to require certain fields. Also renamed a test file for clarity and improved the UseMockRunState utility for more flexible test setup.

Replaced references to self.run_store with self._run_store in DataPorter to ensure correct attribute usage. Updated Mounter to handle None config values by defaulting to an empty dict when reverting config.
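The two Mounter behaviors described above (treating a missing config as empty, and setting run colors only when nothing has set them yet) can be illustrated with a small sketch. `RunStore` here is a minimal stand-in with assumed fields, and `mount_remote_state` is a hypothetical function name; neither reflects the real Mounter API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class RunStore:
    """Minimal stand-in for the run state the Mounter fills in (assumed fields)."""
    config: Dict[str, Any] = field(default_factory=dict)
    run_colors: Optional[Tuple[str, str]] = None


def mount_remote_state(
    store: RunStore,
    remote_config: Optional[Dict[str, Any]],
    remote_colors: Tuple[str, str],
) -> None:
    # A resumed experiment may have no stored config; treat None as an empty dict
    store.config = dict(remote_config) if remote_config is not None else {}
    # Keep colors that were already set (e.g. from the parsed log); only fill a gap
    if store.run_colors is None:
        store.run_colors = remote_colors
```

Copying the remote config with `dict(...)` keeps later edits to the run's config from mutating the mapping it was restored from.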

Added Jupyter notebooks and scripts under test/sync to test the synchronization feature, including run and sync workflows. Updated swanlab.sync.__init__.py to remove unused success state update after synchronization.

SAKURA-CAT · 2025-07-20T11:39:57Z

@Zeyi-Lin Ready

Some unit tests are not written yet, but end-to-end testing is already possible.

Copilot

Pull Request Overview

This PR enhances the sync functionality by introducing resumable training capabilities and refactoring the mount logic into a reusable Mounter component. The key changes enable sync to work with existing experiments and improve code reusability between sync and init operations.

Adds support for resuming existing experiments during sync operations with --resume and --id CLI options
Extracts mounting logic from the cloud callback into a new reusable Mounter class
Updates log file format version and improves data validation with better error handling

Reviewed Changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 6 comments.

File	Description
swanlab/data/porter/mounter.py	New Mounter class that handles project/experiment mounting logic previously embedded in cloud callback
swanlab/cli/commands/sync/__init__.py	Enhanced sync CLI with resume functionality and parameter validation
swanlab/sync/__init__.py	Updated sync function to use new Mounter class and support experiment ID parameters
swanlab/data/porter/__init__.py	Refactored DataPorter to use filtering utilities and improved synchronization logic
swanlab/data/porter/datastore.py	Updated log file version and improved validation error handling
swanlab/toolkit/model.py	New LogContent TypedDict definition
test/unit/data/porter/	Comprehensive test coverage for new porter functionality

Removed execution outputs and metadata from run.ipynb and sync.ipynb for cleaner version control. Added a README.md to the Jupyter sync test directory. Also added a missing return type annotation to filter_epoch in porter/utils.py.

Added comments to clarify the sequence of parameter generation and experiment mounting in the Mounter class. This improves code readability and maintainability.

Refactored the sync command to remove the --resume option and replace it with an --id option that accepts 'auto' for resuming runs. Updated parameter validation and login handling, and improved experiment ID and resume mode logic in the sync implementation. Adjusted tests and scripts to use the new --id 'auto' pattern and updated error messages for clarity.

Removed the return value from DataPorter.synchronize and updated its usage in swanlab.sync. The method now directly updates the client state based on the footer, improving clarity and reducing unnecessary return value propagation.

Updated run_colors assignment in swanlab.sync.__init__ to use only the first two elements of exp.colors. Removed unused or obsolete CLI sync tests from test_cli_sync.py.

Moved run store setup logic from swanlab/sync/__init__.py to a new utility function set_run_store in swanlab/sync/sync_utils.py for better modularity and reuse. Added comprehensive unit tests for set_run_store. Removed obsolete CLI sync test.
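Pulling run-store setup into a standalone function is what makes it unit-testable without a live sync. The sketch below shows the idea, including keeping only the first two entries of the experiment's colors because `run_colors` is a 2-tuple. `ParsedExperiment`, `RunStore`, their fields, and `fill_run_store` are hypothetical illustrations, not the actual `set_run_store` signature in `swanlab/sync/sync_utils.py`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ParsedExperiment:
    """Assumed shape of the experiment header recovered from a log file."""
    id: str
    name: str
    colors: Sequence[str]


@dataclass
class RunStore:
    run_id: Optional[str] = None
    run_name: Optional[str] = None
    run_colors: Optional[Tuple[str, str]] = None


def fill_run_store(store: RunStore, exp: ParsedExperiment) -> RunStore:
    """Copy parsed experiment fields into the run store (illustrative only)."""
    if len(exp.colors) < 2:
        raise ValueError("expected at least two colors in the experiment header")
    store.run_id = exp.id
    store.run_name = exp.name
    # run_colors is a 2-tuple, so keep only the first two entries
    store.run_colors = (exp.colors[0], exp.colors[1])
    return store
```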

SAKURA-CAT self-assigned this Jul 18, 2025

SAKURA-CAT added the 💪 enhancement New feature or request label Jul 18, 2025

SAKURA-CAT force-pushed the feat/sync-pro branch from 9e7712f to 1b6046e on July 19, 2025 11:52

SAKURA-CAT added 9 commits July 20, 2025 18:29

Update ValidationError docstring for clarity

2c74071

Refactor LogContent type and update imports

19d02d8

fix: circle import

70d6660

Set id to 'auto' when resume option is used

f0c0436

SAKURA-CAT force-pushed the feat/sync-pro branch from 8c10ede to d6f94d9 on July 20, 2025 10:32

SAKURA-CAT added 2 commits July 20, 2025 19:30

Fix attribute access and config handling in porter modules

28108a6

Add sync feature tests and minor sync logic update

f1828e1

SAKURA-CAT marked this pull request as ready for review July 20, 2025 11:39

SAKURA-CAT requested review from Zeyi-Lin and Copilot July 20, 2025 11:40

Copilot AI reviewed Jul 20, 2025

SAKURA-CAT added 4 commits July 20, 2025 19:46

Clean up Jupyter sync tests and add README

26444aa

Add comments for experiment mounting steps

72c036b

Refactor DataPorter.synchronize return value handling

43f3fe7

SAKURA-CAT mentioned this pull request Jul 20, 2025

[BUG] Problem syncing an offline logdir with swanlab sync #1136

Closed

SAKURA-CAT added 2 commits July 21, 2025 12:40

Fix run_colors assignment and remove unused sync CLI tests

b387b22

Zeyi-Lin approved these changes Jul 25, 2025

SAKURA-CAT merged commit 40f04ec into main Jul 25, 2025
5 checks passed

SAKURA-CAT deleted the feat/sync-pro branch July 25, 2025 15:59

SAKURA-CAT mentioned this pull request Jul 25, 2025

[ADVICE] swanlab sync feature #1139

Closed
