Conversation

@aoshen524 (Contributor) commented on Mar 9, 2025

Motivation

#3414 reports that LoRA supports only a limited set of models compared to the coverage in test_generation_models.py. This PR introduces tensor parallelism and weight slicing for LoRA, along with additional improvements to testing and functionality.

Modifications

  • Implemented tensor parallelism support in LoRA, allowing efficient distribution of computations across multiple devices.
  • Introduced LoRA weight slicing and refactored the memory pool to support distributed inference, optimizing memory usage and performance (see the sketch after this list).
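
For context, here is a minimal sketch of how LoRA weight slicing under tensor parallelism typically works; the function name, signature, and shape conventions (A as (r, in_dim), B as (out_dim, r)) are illustrative assumptions, not the PR's actual implementation:

```python
# Minimal sketch (illustrative, not the PR's actual code) of slicing
# LoRA weights to match a tensor-parallel base layer.
# Assumes lora_a has shape (r, in_dim) and lora_b has shape (out_dim, r).
import torch

def slice_lora_weights(lora_a: torch.Tensor,
                       lora_b: torch.Tensor,
                       tp_rank: int,
                       tp_size: int,
                       column_parallel: bool):
    """Return the (lora_a, lora_b) shard owned by `tp_rank`."""
    if column_parallel:
        # Column-parallel base layer: the weight is split along the
        # output dim, so slice B's rows to match; A is small (rank r)
        # and is replicated on every rank.
        shard = lora_b.shape[0] // tp_size
        lora_b = lora_b[tp_rank * shard:(tp_rank + 1) * shard, :]
    else:
        # Row-parallel base layer: the weight is split along the input
        # dim, so slice A's columns to match the sharded input; B is
        # replicated, and partial outputs are summed in the same
        # all-reduce the base layer already performs.
        shard = lora_a.shape[1] // tp_size
        lora_a = lora_a[:, tp_rank * shard:(tp_rank + 1) * shard]
    return lora_a, lora_b
```

Slicing B for column-parallel layers and A for row-parallel layers keeps the LoRA delta partitioned the same way as the base weight, so no communication is needed beyond the all-reduce the base layer already performs.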

Checklist:

  • Remove the tensor.contiguous() calls used on GPU

@Fridge003 changed the title from "Feature/lora" to "[Feature] Support Tensor Parallelism and Weight Slicing for Lora" on Mar 9, 2025
merrymercy and others added 22 commits March 11, 2025 04:05
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>