Conversation

ariG23498 (Collaborator)
We fine-tuned unsloth/Llama-3.2-3B with Layer Skip SFT on the https://huggingface.co/datasets/WillHeld/top_v2 dataset. You can find the fine-tuned model here.

Benchmark Results:

Running the throughput benchmark script gives the following:

[------ Generation Speeds -------]
                     |  generation
16 threads: ----------------------
      no layer skip  |    522.1   
      layer skip 1   |    568.9   
      layer skip 2   |    353.0   
      layer skip 3   |    188.3   
      layer skip 4   |    170.6   
      layer skip 5   |    186.7   
      layer skip 6   |    203.2   
      layer skip 7   |    219.0   
      layer skip 8   |    235.3   
      layer skip 9   |    251.1   
      layer skip 10  |    260.7   
      layer skip 11  |    276.1   
      layer skip 12  |    291.9   
      layer skip 13  |    307.8   
      layer skip 14  |    323.1   
      layer skip 15  |    338.9   

Times are in milliseconds (ms).

With layer 4 as the early exit we get around a 67% reduction in generation latency (522.1 ms → 170.6 ms).
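For reference, the reported figure can be reproduced directly from the table above. This is a minimal sketch (helper name `latency_reduction` is ours, not from the benchmark script; the numbers are copied from the benchmark output):

```python
# Latencies in milliseconds, copied from the benchmark table above.
latencies_ms = {
    "no layer skip": 522.1,
    "layer skip 4": 170.6,  # fastest early-exit configuration in the sweep
}


def latency_reduction(baseline_ms: float, variant_ms: float) -> float:
    """Percentage reduction in generation latency versus the baseline."""
    return (1 - variant_ms / baseline_ms) * 100


reduction = latency_reduction(
    latencies_ms["no layer skip"], latencies_ms["layer skip 4"]
)
print(f"{reduction:.1f}% reduction")  # ~67.3% at layer 4
```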

We would love for the trl team to give us feedback on the technique. Any help or guidance would be appreciated.

CC: @mostafaelhoushi

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@mostafaelhoushi (Contributor) left a comment

Thanks Aritra! I just added some minor comments.
@qgallouedec (Member) left a comment

Nice, thanks for adding in the research projects!
@ariG23498 (Collaborator, Author)

Hi @qgallouedec, are we missing something? I am not sure whether the CI errors are caused by these changes; should I look into them further?

@qgallouedec qgallouedec changed the title [Research] Layer Skip SFT 🐇 [Research] Layer Skip SFT Mar 24, 2025
@qgallouedec (Member)

Merging, sorry for the delay.

@qgallouedec qgallouedec merged commit bfe2075 into huggingface:main Mar 24, 2025
8 of 13 checks passed
toslali-ibm pushed a commit to toslali-ibm/trl that referenced this pull request Mar 25, 2025
Co-authored-by: Mostafa Elhoushi <m.elhoushi@ieee.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
kashif pushed a commit to kashif/trl that referenced this pull request Mar 28, 2025
Co-authored-by: Mostafa Elhoushi <m.elhoushi@ieee.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
yxliu-TAMU pushed a commit to mincheolseong/ECEN743-GRPO-Project-Proposal that referenced this pull request Apr 20, 2025
Co-authored-by: Mostafa Elhoushi <m.elhoushi@ieee.org>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>