Conversation

@aoshen524 (Contributor) commented on Mar 9, 2025

Motivation

#3414 reports that LoRA supports only a limited set of models compared to the coverage in test_generation_models.py. This PR introduces tensor parallelism and weight slicing for LoRA, along with additional improvements to testing and functionality.

Modifications

  • Implemented tensor parallelism support in LoRA, allowing efficient distribution of computations across multiple devices.
  • Introduced LoRA weight slicing and refactored the memory pool to support distributed inference, optimizing memory usage and performance (see the sketch after this list).
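
For context, here is a minimal sketch of how LoRA weight slicing under tensor parallelism typically works; the function name, signature, and shape conventions (A as (r, in_dim), B as (out_dim, r)) are illustrative assumptions, not the PR's actual implementation:

```python
# Minimal sketch (illustrative, not the PR's actual code) of slicing
# LoRA weights to match a tensor-parallel base layer.
# Assumes lora_a has shape (r, in_dim) and lora_b has shape (out_dim, r).
import torch

def slice_lora_weights(lora_a: torch.Tensor,
                       lora_b: torch.Tensor,
                       tp_rank: int,
                       tp_size: int,
                       column_parallel: bool):
    """Return the (lora_a, lora_b) shard owned by `tp_rank`."""
    if column_parallel:
        # Column-parallel base layer: the weight is split along the
        # output dim, so slice B's rows to match; A is small (rank r)
        # and is replicated on every rank.
        shard = lora_b.shape[0] // tp_size
        lora_b = lora_b[tp_rank * shard:(tp_rank + 1) * shard, :]
    else:
        # Row-parallel base layer: the weight is split along the input
        # dim, so slice A's columns to match the sharded input; B is
        # replicated, and partial outputs are summed in the same
        # all-reduce the base layer already performs.
        shard = lora_a.shape[1] // tp_size
        lora_a = lora_a[:, tp_rank * shard:(tp_rank + 1) * shard]
    return lora_a, lora_b
```

Slicing B for column-parallel layers and A for row-parallel layers keeps the LoRA delta partitioned the same way as the base weight, so no communication is needed beyond the all-reduce the base layer already performs.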

Checklist:

  • Remove the tensor.contiguous() calls used on GPU

@Fridge003 changed the title from "Feature/lora" to "[Feature] Support Tensor Parallelism and Weight Slicing for Lora" on Mar 9, 2025
merrymercy and others added 22 commits March 11, 2025 04:05
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>