[Project] deepseek R1 infrastructure #708

@eric-haibin-lin

Description

The verl team would like to invite the community to help build infrastructure that efficiently scales up to models of DeepSeek scale, and to develop recipes that reproduce DeepSeek R1 results for the broader open-source and research community. You're encouraged to join the Slack channel for discussions on sub-topics.

Evals

Tasks: https://github.com/volcengine/verl/pull/777/files

  • add an evaluation script to reproduce DeepSeek R1 results on several benchmarks from the DS-R1 checkpoint
    • GPQA Diamond (English)
    • LiveCodeBench (code)
    • SWE-bench Verified (code)
    • CNMO 2024 (math)

Notes:

  • refer to examples/data_preprocess for data preprocessing examples (a minimal sketch follows this list)
  • refer to trainer/main_generation.py for generation (evaluation code is still missing)
  • if possible, upload the preprocessed datasets to Hugging Face
  • start with the DeepSeek distilled models for quick verification before scaling up to 671B
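
For reference, below is a minimal preprocessing sketch in the style of examples/data_preprocess; the schema (data_source, prompt, ability, reward_model, extra_info) mirrors the existing gsm8k example. The GPQA dataset id, config name, and column names are assumptions — verify them against the dataset card before use.

```python
# Minimal sketch: preprocess GPQA Diamond into verl's parquet format.
# Assumptions: dataset id "Idavidrein/gpqa", config "gpqa_diamond", and the
# "Question" / "Correct Answer" column names -- verify before running.
import os
import datasets

def make_map_fn(split):
    def process_fn(example, idx):
        return {
            "data_source": "gpqa_diamond",
            "prompt": [{"role": "user", "content": example["Question"]}],
            "ability": "science",
            "reward_model": {"style": "rule", "ground_truth": example["Correct Answer"]},
            "extra_info": {"split": split, "index": idx},
        }
    return process_fn

if __name__ == "__main__":
    ds = datasets.load_dataset("Idavidrein/gpqa", "gpqa_diamond")["train"]
    ds = ds.map(make_map_fn("test"), with_indices=True, remove_columns=ds.column_names)
    local_dir = os.path.expanduser("~/data/gpqa_diamond")
    os.makedirs(local_dir, exist_ok=True)
    ds.to_parquet(os.path.join(local_dir, "test.parquet"))
```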

Training engine

Tasks:

  • verify GPTModel with mcore v0.11; there is an experimental integration of the critic and actor models with the GPTModel class
    • verify the checkpoint manager with GPTModel
    • investigate the convergence issue of sequence packing when micro bsz > 1 @GAIR-NLP (see the packing sketch after this list)
  • mcore ds-v3 perf tuning
    • run ds-v3 with mcore across a range of sequence lengths and GPU counts; tune n-d parallelism and memory configs
  • mcore ds-v3 convergence verification: to ensure the correctness of the mcore ds-v3 implementation, run a smaller model with the same architecture as ds-v3
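
To make the packing investigation concrete, here is a minimal sketch (plain PyTorch, not verl's actual packing code) of how variable-length sequences are packed into one row with cu_seqlens for varlen attention kernels, plus the loss-normalization pitfall that is one plausible source of the convergence gap.

```python
# Sketch of sequence packing for one micro-batch; names are illustrative.
import torch

def pack_sequences(seqs):
    """Concatenate variable-length token tensors into a single row and build
    cu_seqlens so varlen attention kernels do not attend across boundaries."""
    packed = torch.cat(seqs)                        # (total_tokens,)
    lens = torch.tensor([len(s) for s in seqs])
    cu_seqlens = torch.zeros(len(seqs) + 1, dtype=torch.int32)
    cu_seqlens[1:] = torch.cumsum(lens, dim=0)      # e.g. [0, 5, 12, 20]
    return packed, cu_seqlens

# With micro bsz > 1 there are several packed rows per step; per-row losses
# must be combined weighted by valid-token count rather than averaged per row,
# or gradients drift from the unpacked baseline.
```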

Notes:

  • To make the perf tuning more faithful, it's better to develop a benchmark script that includes the training loss terms (CE loss, and CE plus entropy loss); a sketch of these terms follows below
  • The GAIR-NLP team is looking into GPTModel sequence packing support when micro_bsz > 1
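
As a starting point for such a script, here is a hedged sketch of the two loss terms it could log alongside throughput; the function is illustrative, not an existing verl API.

```python
# Sketch: token-level cross-entropy and mean policy entropy for a benchmark log.
import torch
import torch.nn.functional as F

def ce_and_entropy(logits, labels, ignore_index=-100):
    # logits: (batch, seqlen, vocab); labels: (batch, seqlen)
    ce = F.cross_entropy(logits.flatten(0, 1), labels.flatten(),
                         ignore_index=ignore_index)
    logp = F.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(-1)          # (batch, seqlen)
    mask = labels.ne(ignore_index)
    entropy = (entropy * mask).sum() / mask.sum()   # mean over valid tokens
    return ce, entropy                              # log ce and ce + entropy
```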

Data & recipe

Tasks:
Math and scientific sources:

  • further improve the dataset used for math RL, possibly building on DAPO-17k and other open-source datasets (a curation sketch follows below)

Code:

  • build a baseline code recipe in the verl main repo, using small models such as Llama or Qwen-7B
  • curate a dataset for code RL training (start with existing open-source ones)
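
As one possible starting point for the curation step, a minimal sketch that dedups prompts and drops samples without a verifiable ground-truth answer; the "prompt" and "answer" column names are assumptions about the source dataset, not a fixed verl schema.

```python
# Curation sketch: dedup prompts, keep only rule-verifiable samples.
# Column names ("prompt", "answer") are assumptions about the source dataset.
import datasets

def curate(ds: datasets.Dataset) -> datasets.Dataset:
    seen = set()

    def keep(example):
        key = example["prompt"].strip()
        if not example["answer"] or key in seen:    # drop unverifiable / dup
            return False
        seen.add(key)
        return True

    return ds.filter(keep)                          # single-process on purpose
```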

Notes:

Inference Engine

Tasks:

  • verify multi-node TP inference (a smoke-test sketch follows below)
  • support multi-node EP/PP inference
  • sharding manager support with mcore v0.11 + the latest versions of the inference engines

Related TODOs: #825
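
For the multi-node TP item, a minimal smoke test using vLLM's offline API could look like the sketch below; it assumes a Ray cluster already spans the nodes, and the checkpoint path and parallel sizes are placeholders.

```python
# Multi-node TP smoke test sketch (vLLM offline API); assumes a running Ray
# cluster across the nodes. Model path and sizes are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",        # placeholder checkpoint
    tensor_parallel_size=16,                # > GPUs per node => spans 2 nodes
    distributed_executor_backend="ray",     # multi-node inference runs on Ray
    trust_remote_code=True,
)
out = llm.generate(["1 + 1 ="], SamplingParams(temperature=0.0, max_tokens=8))
print(out[0].outputs[0].text)
```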
