ivanium
Owner

@ivanium ivanium commented Aug 1, 2024

This PR implements the SP decode kernel:

  • Initialize flashinfer wrapper with actual seq_lens.
  • Add kernel support that replicates the Q tensors of the decoding batch across SP workers, gathers the output o, s tensors at the end, and merges their states.
  • Fix a bug in prefill communication that could lead to deadlock due to an incorrect send/recv order.
  • Incorporate KV cache store logic. Need out_cache_loc support here.
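
For reviewers less familiar with the state merge in the second bullet, here is a minimal pure-Python sketch of the idea, not the kernel code in this PR. It assumes `s` is the log-sum-exp of the attention scores over each worker's KV shard and `o` is the shard's normalized output; `partial_attention` and `merge_states` are hypothetical helper names used only for illustration, and the usual 1/sqrt(d) score scaling is omitted for brevity.

```python
import math

def partial_attention(q, keys, values):
    """Attend one query over one KV shard; return (o, s), s = logsumexp of scores."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scores]
    denom = sum(weights)
    dim = len(values[0])
    o = [sum(w * v[d] for w, v in zip(weights, values)) / denom for d in range(dim)]
    return o, m + math.log(denom)

def merge_states(states):
    """Merge per-shard (o, s) pairs into the output over the full sequence."""
    s_max = max(s for _, s in states)
    # Each shard is weighted by exp(s_i), rescaled by exp(-s_max) for stability.
    scales = [math.exp(s - s_max) for _, s in states]
    total = sum(scales)
    dim = len(states[0][0])
    o = [sum(c * o_i[d] for c, (o_i, _) in zip(scales, states)) / total
         for d in range(dim)]
    return o, s_max + math.log(total)

# Illustrative data (made up): one query, four KV entries split over two "workers".
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

o_full, s_full = partial_attention(q, keys, values)
shard_states = [
    partial_attention(q, keys[:2], values[:2]),  # worker 0's KV shard
    partial_attention(q, keys[2:], values[2:]),  # worker 1's KV shard
]
o_merged, s_merged = merge_states(shard_states)
```

Since every worker sees the same replicated Q, merging the gathered `(o, s)` pairs this way should reproduce the result of attending over the whole KV sequence on one device, up to floating-point error.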

@ivanium ivanium requested a review from ZYHowell August 1, 2024 23:30
@ZYHowell ZYHowell merged commit 1695aed into main Aug 8, 2024
2 checks passed