Welcome aboard!!
We’re building a vibrant open-source community and your ideas and code are what make it thrive.
To help you get started, we will continually update this page with a collection of bite-sized “quick start” tasks—dive in and let’s create something amazing together!
Good First Issues:
Documentation: good documentation is what makes LMCache easy to use and adoptable!
- Doc for Infinistore: https://docs.lmcache.ai/kv_cache/storage_backends/infinistore.html
- Doc for Remote Backend with FSConnector (link does not exist yet)
- Improved Docs for GDS mounting and kernel module installation (https://docs.lmcache.ai/kv_cache/storage_backends/gds.html); the issue [question] about GDS backend #1160 points in a good direction as well
- Cache Controller Related Documentation ([Help Wanted] [Doc] Update controller-related documentation #1218)
- Compatibility Matrix between torch versions, vLLM versions, and LMCache versions, preferably here: https://docs.lmcache.ai/getting_started/installation.html
- Updated comprehensive description of all LMCache configuration parameters (https://docs.lmcache.ai/api_reference/configurations.html)
- Docs for the layerwise pipelining and prefetching (please place into the section called "KV Cache Optimizations")
- Docs for tagging feature of KV Cache offloading and sharing ([DOCS] Add doc of tags feature to KV Cache offloading and sharing section #1322)
  - Note: tags were changed to general-purpose LMCache request configs inside of kv_transfer_params ([refact] use request_configs replace tags #1377)
- Docs for setting store cache to false (https://github.com/LMCache/LMCache/tree/dev/examples/cache_interface)
- Docs for Cache Engine Internal API Server
  - how to configure: https://github.com/LMCache/LMCache/pull/1387/files
  - thread stack trace observability (Support show thread info within cache engine internal api server #1358)
  - dynamic LMCache log level adjustment (Support Get or set log level via internal api server #1359)
- Docs for Eviction Algorithms (a minimal LRU sketch follows this list)
  - LRU
  - S3-FIFO ([Feature] Add S3FIFO cache policy #1341)
- Docs for compatibility with different distributed executors (Ray) and with parallelism strategies:
  - Workaround for issue #1346 (#1356)
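To give contributors a concrete picture of what the eviction-algorithm docs would explain, here is a minimal LRU sketch built on Python's OrderedDict. The class and method names are illustrative assumptions, not LMCache's actual eviction API.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU sketch: the least recently used key is evicted first.
    Illustrative only; not LMCache's real cache-policy implementation."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store: OrderedDict[str, bytes] = OrderedDict()

    def get(self, key: str) -> bytes | None:
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key: str, value: bytes) -> None:
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
```

S3-FIFO (#1341) replaces this recency-ordered structure with small/main FIFO queues plus a ghost queue, which is exactly the kind of contrast the docs could spell out.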
Bugs:
- Decoder memory leak in Prefill Decode Disaggregation
  - [random] 1P1D, the decoder will get stuck #1258
  - prefiller non-stopping printing "Failed to allocate memory object, retrying..." and request stucked #1337
  - [Bug][xPyD][lmcache0.3.3+vllm0.10.0] "failed to allocate memory for tensor" during benchmark with lmcache xPyD version #1339
- Prefiller memory leak in Prefill Decode Disaggregation (prefiller non-stopping printing "Failed to allocate memory object, retrying..." and request stucked #1337)
- Graceful handling when pinned CPU RAM is full (Crash when RAM offload is full #1330)
- Memory leak with the LMCache server: "LMCache WARNING: Ref count of MemoryObj -1 is negative: -2. Double free occurred somewhere. Setting ref count back to 0 as a hack but please find the bug. (memory_management.py:324:lmcache.v1.memory_management)" #1365 (see the ref-count sketch after this list)
- KV Cache Persistence in SSD (KV-Cache Persistence Issue with SSD Offloading #1175)
- Small layerwise bugs to be ironed out
  - [bug] Layerwise mode retrive error #1150
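The warning quoted in #1365 describes a ref-count invariant being violated: a MemoryObj's count should never drop below zero, and a negative value means a double free. Here is a minimal sketch of that invariant, with hypothetical names rather than LMCache's actual memory_management code:

```python
class MemoryObj:
    """Illustrative ref-counted buffer; names are hypothetical, not LMCache's API.
    A negative ref count, as in the #1365 warning, means a double free occurred."""

    def __init__(self) -> None:
        self.ref_count = 1  # the creator holds the first reference

    def ref_up(self) -> None:
        assert self.ref_count > 0, "cannot revive an already-freed object"
        self.ref_count += 1

    def ref_down(self) -> None:
        assert self.ref_count > 0, f"double free: ref count is {self.ref_count}"
        self.ref_count -= 1
        if self.ref_count == 0:
            self._free()

    def _free(self) -> None:
        pass  # return the underlying buffer to the allocator
```

Hunting down the code path that performs the extra decrement is the actual bug-fixing task.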
Testing + CI/CD
See #933 for how to run the unit tests (a pytest sketch of a test's shape follows the list below)
- Unit Tests for Layerwise KV Transfer
- Unit Tests for Prefill Decode Disaggregation
  - [BUG] Python packages missing for unit tests #696
- Unit Tests for CacheBlend V1
- Unit Tests for Cache Controller
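To sketch the shape such tests could take, here is a hedged pytest example written against the hypothetical LRUCache from the eviction sketch above; real tests would import the actual LMCache classes and follow the setup described in #933.

```python
# Run with `pytest` from the repo root; pytest discovers test_* functions.
# Hypothetical import path; a real test would use LMCache's actual modules:
# from lmcache.v1.cache_policy import LRUCache

def test_lru_evicts_least_recently_used():
    cache = LRUCache(capacity=2)     # the LRUCache sketch from the eviction section
    cache.put("a", b"1")
    cache.put("b", b"2")
    cache.get("a")                   # "a" becomes most recently used
    cache.put("c", b"3")             # over capacity: "b" should be evicted
    assert cache.get("b") is None
    assert cache.get("a") == b"1"
    assert cache.get("c") == b"3"
```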
Performance / Profiling / Workloads
If LMCache is not performant for you, profiling (https://docs.vllm.ai/en/v0.9.1/contributing/profiling.html) together with a detailed description of your deployment and workload would be great!
- Mooncake store ([Performance] Mooncake Store did not achieve the expected performance on a single node 4090! #1331)
- Slower inference with large system prompt #1316
- Amazing work using profiling to find that moving the slot mapping to the GPU takes a long time! ([Performance] Unexpected high latency in slot_mapping = slot_mapping.cuda() #1303; see the profiler sketch after this list)
- Discussion of how models with different prefill speeds, KV cache sizes per token, and prefix caching buffer sizes perform differently (Does the number of kvcache caches supported by GPU memory affect the performance of lmcache? #1241)
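As a starting point for that kind of investigation, here is a minimal torch.profiler sketch that times a host-to-device copy like the slot_mapping.cuda() call from #1303. The tensor shape is made up; substitute the shapes from your actual workload.

```python
import torch
from torch.profiler import ProfilerActivity, profile

# Made-up size for illustration; use your workload's real slot-mapping shape.
slot_mapping = torch.randint(0, 1 << 20, (8192,), dtype=torch.int64)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    slot_mapping_gpu = slot_mapping.cuda()  # host-to-device copy under test
    torch.cuda.synchronize()                # ensure the async copy has finished

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```

Attaching a table like this (plus model, GPU, and workload details) to a performance issue makes it much easier to act on.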
Features / Improvements
- Layerwise + PD Compatibility ([Bug] - When layerwise is enabled only for decoder (PD Aggregation), start_load_kv() in class LMCacheConnectorV1Impl does not work properly. #1174)
- Compatibility with vLLM ray distributed executor (Initialization of LMCache Connector within non-GPU vLLM Engine #1167 and LMCache seems to be using the wrong CUDA devices with Ray+PP #1346)
- HPU Support ([Hardware] Enable Intel Gaudi (HPU) support #1066) [WIP]
- Support for EAGLE / MEDUSA (VLLM_V1 EAGLE3 breaks lmcache input array broadcasting #967 , [bug] eagle speculative decoding not working with vllm #1153)
- CacheBlend support for Models beyond Llama
  - [Feature request] CacheBlend support for DeepSeek models #1082 (DeepSeek / MLA)
  - CacheBlend for Qwen3 #1121 (Qwen 3)
  - online blend (not just offline): Blend v1 for online serving and benchmark #1136
  - online blend and more models (vllm model for vllm-instance not found #1405)
- More eviction policies :) ([Enhancement] Support for more storage backend's cache policies? #1306)
- Valkey backend RDMA ([Usage] Does LMCache support the RDMA feature of valkey? #1222)
RFCs (Discussions on the future directions of LMCache and how to make progress)
- Non-Python (C++/Rust) Backends! ([RFC]: Non-Python backends #1362)
- Async Start Load KV in non layerwise case ([RFC] support async start_load_kv in non_layerwise case #1033)
- Abstraction layer for more serving engines beyond vLLM and SGLang ([RFC] Refactor to adapt multiply Inference engine #1171)
- Add support for LMStudio and Ollama (Add support for lmstudio and ollama #923)
- StreamingLLM support ([RFC] StreamingLLM support #1047)
- TTFT-aware routing ([Feature] Support the lookup of the cached prefix info of every instance #1050)
- Flexibility for package installation (Need more flexibility for package installation #1062 and [Fix]: Pin torch version again #1274): LMCache's versioning of dependencies such as torch needs to become more flexible; see also the CUDA versioning discussion in [Doc][Bug] LMCache's compatibility with CUDA 12.9 #1155
- Optimizing the Redis Backend ([RFC]: Optimizing Redis Backend #1239)
- Web UI for the cache controller ([FeatureWanted][WebUI] Extend Controller with webui to act as a frontend of LMCache #1302)
- Dynamically configure/observe LMCache through a controller / API server
  - API server: Lmcache internal api server for metrics export #1318
  - dynamic configuration: [Controller] Dynamic update config through controller #1265
- Completely modularize the design of LMCache backends (Make backend easy to be extended and dynamic imported by refactor to be modular #1373; see the interface sketch after this list)
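For the backend-modularization direction in #1373, here is a hedged sketch of what a pluggable storage-backend interface plus a dynamic registry could look like. The method and registry names are assumptions for illustration, not LMCache's current API.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Illustrative plug-in interface; names are hypothetical, not LMCache's API."""

    @abstractmethod
    def contains(self, key: str) -> bool:
        """Return True if a KV chunk is stored under `key`."""

    @abstractmethod
    def put(self, key: str, kv_chunk: bytes) -> None:
        """Persist a serialized KV chunk under `key`."""

    @abstractmethod
    def get(self, key: str) -> bytes | None:
        """Fetch a serialized KV chunk, or None on a miss."""


# A registry like this would let new backends be added (or imported dynamically)
# without touching core code, e.g. BACKENDS["redis"] = RedisBackend.
BACKENDS: dict[str, type[StorageBackend]] = {}
```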
Older Issues
- [Onboard][CI] Add unit tests for fs_connector #620
- [Onboard][Bug] Prometheus metrics not reported in vLLM v1 #621
- [Benchmark] Benchmark Scripts for LMCache under different workloads #654
- [Misc] Fix problem in the docker example script #682
- [Bug] kv cache caculator wrong results #688
- [Core] GPU connector performance improvement #693
- Support username and password auth for Redis backends #667
More tasks are coming.