merrymercy
Contributor

@merrymercy merrymercy commented Nov 18, 2024

Turn on --enable-overlap-schedule by default. The overlap scheduler now works for most use cases; the exceptions are the todo items below. Because those cases are not commonly used, we temporarily disable the overlap scheduler for them. They will be supported in follow-up PRs.

Todo items:

  • Tune the performance of the overlap scheduler for the Triton backend and turn it on by default.
  • Support more penalties (e.g., frequency, repetition).
  • Support mixed-style chunked prefill.
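
As a rough illustration of the gating described above, the following sketch turns the overlap scheduler on by default and falls back to the non-overlapped path for the three todo cases. Every name here (`HypotheticalServerConfig`, its fields, and `overlap_schedule_enabled`) is invented for this example and is an assumption, not SGLang's actual code or options.

```python
from dataclasses import dataclass


@dataclass
class HypotheticalServerConfig:
    # Illustrative fields only; these are NOT SGLang's real option names.
    attention_backend: str = "non-triton"  # placeholder for any backend other than Triton
    uses_frequency_or_repetition_penalty: bool = False
    mixed_chunked_prefill: bool = False


def overlap_schedule_enabled(cfg: HypotheticalServerConfig) -> bool:
    """Return True (the new default) unless one of the todo cases applies."""
    if cfg.attention_backend == "triton":
        # Todo item 1: performance on the Triton backend is not tuned yet.
        return False
    if cfg.uses_frequency_or_repetition_penalty:
        # Todo item 2: frequency/repetition penalties are not supported yet.
        return False
    if cfg.mixed_chunked_prefill:
        # Todo item 3: mixed-style chunked prefill is not supported yet.
        return False
    return True
```

With this shape, a default configuration gets the overlap scheduler, while a configuration such as `HypotheticalServerConfig(attention_backend="triton")` does not.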

@merrymercy merrymercy force-pushed the pr-enable-overlap branch 15 times, most recently from e9bc924 to 69e39c4 November 19, 2024 11:15
@merrymercy merrymercy marked this pull request as ready for review November 19, 2024 12:01
@zhyncs
Member

zhyncs commented Nov 19, 2024

ref https://github.com/sgl-project/sglang/actions/runs/11913207295
new gsm8k nightly eval for pr-enable-overlap

@zhyncs zhyncs mentioned this pull request Nov 19, 2024
3 tasks
@zhyncs
Member

zhyncs commented Nov 19, 2024

I've verified this locally on an NVIDIA H100 instance, and I think it's OK.

[
  {
    "timestamp": "2024-11-19T07:39:22.417674",
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "metrics": {
      "en": 0.856,
      "en:std": 0.35108973211986705,
      "group_latin": 0.856,
      "group_latin:std": 0.35108973211986705,
      "score:std": 0.35108973211986705,
      "score": 0.856
    },
    "score": 0.856
  },
  {
    "timestamp": "2024-11-19T07:40:40.889926",
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "metrics": {
      "en": 0.604,
      "en:std": 0.48906441293555597,
      "group_latin": 0.604,
      "group_latin:std": 0.48906441293555597,
      "score:std": 0.48906441293555597,
      "score": 0.604
    },
    "score": 0.604
  },
  {
    "timestamp": "2024-11-19T07:42:47.873303",
    "model": "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    "metrics": {
      "en": 0.884,
      "en:std": 0.32022492095400695,
      "group_latin": 0.884,
      "group_latin:std": 0.32022492095400695,
      "score:std": 0.32022492095400695,
      "score": 0.884
    },
    "score": 0.884
  },
  {
    "timestamp": "2024-11-19T07:45:20.274593",
    "model": "google/gemma-2-27b-it",
    "metrics": {
      "en": 0.92,
      "en:std": 0.2712931993250107,
      "group_latin": 0.92,
      "group_latin:std": 0.2712931993250107,
      "score:std": 0.2712931993250107,
      "score": 0.92
    },
    "score": 0.92
  },
  {
    "timestamp": "2024-11-19T07:51:36.181534",
    "model": "meta-llama/Llama-3.1-70B-Instruct",
    "metrics": {
      "en": 0.972,
      "en:std": 0.16497272501841023,
      "group_latin": 0.972,
      "group_latin:std": 0.16497272501841023,
      "score:std": 0.16497272501841023,
      "score": 0.972
    },
    "score": 0.972
  },
  {
    "timestamp": "2024-11-19T07:55:24.578251",
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "metrics": {
      "en": 0.644,
      "en:std": 0.47881520443695186,
      "group_latin": 0.644,
      "group_latin:std": 0.47881520443695186,
      "score:std": 0.47881520443695186,
      "score": 0.644
    },
    "score": 0.644
  },
  {
    "timestamp": "2024-11-19T07:59:39.163623",
    "model": "Qwen/Qwen2-57B-A14B-Instruct",
    "metrics": {
      "en": 0.904,
      "en:std": 0.2945912422323515,
      "group_latin": 0.904,
      "group_latin:std": 0.2945912422323515,
      "score:std": 0.2945912422323515,
      "score": 0.904
    },
    "score": 0.904
  },
  {
    "timestamp": "2024-11-19T08:01:04.618531",
    "model": "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    "metrics": {
      "en": 0.864,
      "en:std": 0.34278856457005674,
      "group_latin": 0.864,
      "group_latin:std": 0.34278856457005674,
      "score:std": 0.34278856457005674,
      "score": 0.864
    },
    "score": 0.864
  },
  {
    "timestamp": "2024-11-19T08:02:00.608734",
    "model": "neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8",
    "metrics": {
      "en": 0.864,
      "en:std": 0.3427885645700568,
      "group_latin": 0.864,
      "group_latin:std": 0.3427885645700568,
      "score:std": 0.3427885645700568,
      "score": 0.864
    },
    "score": 0.864
  },
  {
    "timestamp": "2024-11-19T08:03:03.806902",
    "model": "neuralmagic/Mistral-7B-Instruct-v0.3-FP8",
    "metrics": {
      "en": 0.54,
      "en:std": 0.4983974317750845,
      "group_latin": 0.54,
      "group_latin:std": 0.4983974317750845,
      "score:std": 0.4983974317750845,
      "score": 0.54
    },
    "score": 0.54
  },
  {
    "timestamp": "2024-11-19T08:05:09.343131",
    "model": "neuralmagic/DeepSeek-Coder-V2-Lite-Instruct-FP8",
    "metrics": {
      "en": 0.856,
      "en:std": 0.35108973211986705,
      "group_latin": 0.856,
      "group_latin:std": 0.35108973211986705,
      "score:std": 0.35108973211986705,
      "score": 0.856
    },
    "score": 0.856
  },
  {
    "timestamp": "2024-11-19T08:05:59.852623",
    "model": "neuralmagic/gemma-2-2b-it-FP8",
    "metrics": {
      "en": 0.608,
      "en:std": 0.48819668167655544,
      "group_latin": 0.608,
      "group_latin:std": 0.48819668167655544,
      "score:std": 0.48819668167655544,
      "score": 0.608
    },
    "score": 0.608
  },
  {
    "timestamp": "2024-11-19T08:09:31.971401",
    "model": "neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8",
    "metrics": {
      "en": 0.964,
      "en:std": 0.18629009635512025,
      "group_latin": 0.964,
      "group_latin:std": 0.18629009635512025,
      "score:std": 0.18629009635512025,
      "score": 0.964
    },
    "score": 0.964
  },
  {
    "timestamp": "2024-11-19T08:12:03.998822",
    "model": "neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8",
    "metrics": {
      "en": 0.636,
      "en:std": 0.481148625686492,
      "group_latin": 0.636,
      "group_latin:std": 0.481148625686492,
      "score:std": 0.481148625686492,
      "score": 0.636
    },
    "score": 0.636
  },
  {
    "timestamp": "2024-11-19T08:15:08.440710",
    "model": "neuralmagic/Qwen2-72B-Instruct-FP8",
    "metrics": {
      "en": 0.948,
      "en:std": 0.22202702538204672,
      "group_latin": 0.948,
      "group_latin:std": 0.22202702538204672,
      "score:std": 0.22202702538204672,
      "score": 0.948
    },
    "score": 0.948
  },
  {
    "timestamp": "2024-11-19T08:17:36.408135",
    "model": "neuralmagic/Qwen2-57B-A14B-Instruct-FP8",
    "metrics": {
      "en": 0.836,
      "en:std": 0.3702755730533679,
      "group_latin": 0.836,
      "group_latin:std": 0.3702755730533679,
      "score:std": 0.3702755730533679,
      "score": 0.836
    },
    "score": 0.836
  },
  {
    "timestamp": "2024-11-19T08:19:22.844955",
    "model": "neuralmagic/DeepSeek-Coder-V2-Lite-Instruct-FP8",
    "metrics": {
      "en": 0.848,
      "en:std": 0.35902089075707005,
      "group_latin": 0.848,
      "group_latin:std": 0.35902089075707005,
      "score:std": 0.35902089075707005,
      "score": 0.848
    },
    "score": 0.848
  },
  {
    "timestamp": "2024-11-19T08:20:32.181362",
    "model": "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",
    "metrics": {
      "en": 0.844,
      "en:std": 0.3628553430776512,
      "group_latin": 0.844,
      "group_latin:std": 0.3628553430776512,
      "score:std": 0.3628553430776512,
      "score": 0.844
    },
    "score": 0.844
  },
  {
    "timestamp": "2024-11-19T08:21:37.190522",
    "model": "hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4",
    "metrics": {
      "en": 0.848,
      "en:std": 0.35902089075707005,
      "group_latin": 0.848,
      "group_latin:std": 0.35902089075707005,
      "score:std": 0.35902089075707005,
      "score": 0.848
    },
    "score": 0.848
  }
]
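
A note on reading these numbers: each `score:std` value appears to be the standard deviation of a 0/1 correctness indicator, i.e. sqrt(p * (1 - p)) for accuracy p, so it reflects per-sample spread rather than run-to-run noise. The sketch below checks that relationship on two entries copied from the results above; the helper names `bernoulli_std` and `matches` are introduced here for illustration only.

```python
import math

# (score, score:std) pairs copied from the results above.
reported = {
    "meta-llama/Llama-3.1-8B-Instruct": (0.856, 0.35108973211986705),
    "google/gemma-2-27b-it": (0.92, 0.2712931993250107),
}


def bernoulli_std(p: float) -> float:
    """Standard deviation of a 0/1 outcome that is 1 with probability p."""
    return math.sqrt(p * (1.0 - p))


def matches(p: float, std: float, tol: float = 1e-9) -> bool:
    """Check whether a reported std agrees with sqrt(p * (1 - p))."""
    return math.isclose(bernoulli_std(p), std, abs_tol=tol)


for model, (score, std) in reported.items():
    print(model, matches(score, std))
```

Because the sample count is not stated in this thread, the sketch does not attempt to derive a standard error or confidence interval.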

@merrymercy merrymercy merged commit 7d671e4 into main Nov 20, 2024
13 checks passed
@merrymercy merrymercy deleted the pr-enable-overlap branch November 20, 2024 06:08
@merrymercy merrymercy mentioned this pull request Nov 24, 2024
37 tasks
@zhaochenyang20 zhaochenyang20 mentioned this pull request Mar 3, 2025
22 tasks
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025