Collaborator

@0x404 0x404 commented Jun 28, 2025

What does this PR do?

This PR adds the doc changes missing from #2125:

  • Synchronize checkpoint content and verl.model_merger with the latest code
  • Add a section to the quick start documentation explaining how to merge checkpoints

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching
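
The title convention above can be sketched as a small validator. This is only an illustration of the stated format, not the check the project's CI actually runs: the regular expression, the function name `check_pr_title`, and the constant names are all hypothetical, and only the module and type lists are taken from the checklist.

```python
import re

# Module and type names copied from the checklist above.
ALLOWED_MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
ALLOWED_TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Hypothetical pattern (an assumption, not the CI's real rule):
# optional [BREAKING], then [modules] type: description
TITLE_RE = re.compile(
    r"^(?P<breaking>\[BREAKING\])?\s*"
    r"\[(?P<modules>[^\]]+)\]\s+"
    r"(?P<type>\w+):\s+"
    r"(?P<description>\S.*)$"
)


def check_pr_title(title: str) -> list[str]:
    """Return a list of problems; an empty list means the title looks valid."""
    match = TITLE_RE.match(title)
    if match is None:
        return ["title does not match [{modules}] {type}: {description}"]

    problems = []
    # Multiple modules are comma-separated, e.g. "[megatron, fsdp, doc]".
    for module in (m.strip() for m in match.group("modules").split(",")):
        if module not in ALLOWED_MODULES:
            problems.append(f"unknown module: {module!r}")
    if match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type: {match.group('type')!r}")
    return problems


if __name__ == "__main__":
    for title in (
        "[ckpt, doc] chore: enhance checkpoint related docs",
        "[BREAKING][fsdp, megatron] feat: dynamic batching",
        "enhance checkpoint related docs",
    ):
        print(title, "->", check_pr_title(title) or "ok")
```

Returning a list of problems rather than a boolean lets a caller report every issue with a title at once.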

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

0x404 added 2 commits June 28, 2025 13:42
- Added a reference link to the checkpoint page for better navigation.
- Included the `fsdp_config.json` file in the checkpoint structure for clarity.
- Updated the `merge` command usage to include the `--use_cpu_initialization` option and clarified the local directory argument.
- Revised links to the Hugging Face and ModelScope documentation for accuracy.
@0x404
Collaborator Author

0x404 commented Jun 28, 2025

cc @ETOgaosion ~

@0x404 0x404 changed the title [doc] chore: enhance checkpoint related docs [ckpt, doc] chore: enhance checkpoint related docs Jun 29, 2025
@ETOgaosion
Collaborator

ETOgaosion commented Jun 30, 2025

Hi @0x404, as I mentioned in #2125, could you keep the model_merger in the scripts folder in your PR #2251, renamed to legacy_model_merger, and explain this in the doc please? This would provide backward compatibility.

@0x404
Collaborator Author

0x404 commented Jun 30, 2025

> Hi @0x404, as I mentioned in #2125, could you keep the model_merger in the scripts folder in your PR #2251, renamed to legacy_model_merger, and explain this in the doc please? This would provide backward compatibility.

Sounds good! I'll add the legacy_model_merger back to the scripts folder and update the documentation to explain the deprecation plan. We can keep it for a while before removing it completely.

@0x404 0x404 changed the title [ckpt, doc] chore: enhance checkpoint related docs [ckpt, doc] chore: add backward compatibility for model merger and sync docs Jun 30, 2025
@0x404
Collaborator Author

0x404 commented Jun 30, 2025

@ETOgaosion could you re-run the CI? Two checks failed, but they seem unrelated to this PR.

@ETOgaosion ETOgaosion merged commit 024a8b8 into volcengine:main Jun 30, 2025
37 of 39 checks passed
oseyosey pushed a commit to oseyosey/verl that referenced this pull request Jul 28, 2025
…nc docs (volcengine#2251)

### What does this PR do?

This PR adds the doc changes missing from
volcengine#2125:
- Synchronize checkpoint content and verl.model_merger with the latest
code
- Add a section to the quick start documentation explaining how to
merge checkpoints

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
    `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
    `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
    `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like
    `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature,
    etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
Juniper1021 pushed a commit to Juniper1021/verl that referenced this pull request Aug 7, 2025
…nc docs (volcengine#2251)

whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
…nc docs (volcengine#2251)
