@SahilJain314 SahilJain314 commented Mar 23, 2025

  • Improved offloading: the IPC weights are cast to bf16 first, then the full model is offloaded.
  • Mixed precision (MP) now uses FSDP mixed precision together with autocast.
  • Improved startup speed: vLLM initializes with dummy weights and is refit afterwards.
  • Changed the default configs from "technically runs" to "will actually improve the model".
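
A minimal sketch of the first two items, not the code from this PR. It assumes a small stand-in model, assumes that "offloading" means moving weights to CPU, and picks the dtypes for illustration; `bf16_state_dict_on_cpu` and `nbytes` are hypothetical helper names. It runs on CPU without a process group, so the FSDP policy is only constructed, not applied.

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import MixedPrecision

# Hypothetical stand-in for the policy model; the PR's real model is not shown here.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# FSDP-side mixed precision policy (dtypes are assumptions):
# bf16 parameters for compute, fp32 for gradient reduction and buffers.
mp_policy = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.float32,
    buffer_dtype=torch.float32,
)
# In a distributed run this would be passed as FSDP(model, mixed_precision=mp_policy).

# Autocast-side mixed precision: parameters stay fp32, matmuls run in bf16.
x = torch.randn(8, 16)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)


def bf16_state_dict_on_cpu(module):
    """Cast floating-point weights to bf16, then move every tensor to CPU."""
    return {
        name: (t.to(torch.bfloat16) if t.is_floating_point() else t).cpu()
        for name, t in module.state_dict().items()
    }


def nbytes(tensors):
    """Total storage in bytes for an iterable of tensors."""
    return sum(t.numel() * t.element_size() for t in tensors)


offloaded = bf16_state_dict_on_cpu(model)
fp32_bytes = nbytes(model.state_dict().values())
bf16_bytes = nbytes(offloaded.values())
```

Casting before offloading means the offloaded copy takes half the bytes of the fp32 weights, which is the memory saving the first item points at.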

#13

Signed-off-by: Sahil Jain <sahilj@nvidia.com>
parthchadha previously approved these changes Mar 23, 2025
@SahilJain314 SahilJain314 changed the title from "Multiprocessing memory improvements and better default configs (converge-able)" to "Mixed Prec memory improvements and better default configs (converge-able)" Mar 23, 2025
trying a different approach

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
@parthchadha parthchadha changed the title from "Mixed Prec memory improvements and better default configs (converge-able)" to "fix: Mixed Prec memory improvements and better default configs (converge-able)" Mar 24, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
@SahilJain314 SahilJain314 merged commit bd7e4b0 into main Mar 25, 2025
13 checks passed
@SahilJain314 SahilJain314 deleted the sahilj/mp_fix branch March 25, 2025 07:08
KiddoZhu pushed a commit that referenced this pull request May 6, 2025
…rge-able) (#32)

Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
3 participants