-
Notifications
You must be signed in to change notification settings - Fork 2.8k
[Feature] Radix Tree in C++ #7369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Radix Tree in C++ #7369
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @DarkSharpness, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new, high-performance C++ Radix Tree implementation designed to serve as a core component for advanced memory caching, particularly for future hierarchical caching strategies. It includes the necessary Python bindings to expose this functionality, which can be activated on an experimental basis, and updates the build configuration to support the new C++ codebase.
Highlights
- Core C++ Radix Tree Implementation: I've introduced a new C++-based Radix Tree (
RadixTree
) with foundational support for hierarchical caching concepts, including device (GPU) and host (CPU) memory management, eviction policies, and reference counting. This implementation lays the groundwork for future hierarchical caching features. - Python Bindings: I've provided Python bindings for the C++ Radix Tree (
HiRadixCacheCpp
) usingpybind11
. This allows Python code to interact with the high-performance C++ implementation. Currently, the Python interface primarily supports non-hierarchical caching, with full hierarchical integration planned for subsequent pull requests. - Experimental Feature Flag: The new C++ Radix Tree has been integrated into the scheduler as an experimental feature. Users can enable it by setting the environment variable
SGLANG_EXPERIMENTAL_HIRADIX_CACHE=1
. This allows for controlled rollout and testing. - Build System Integration: I've updated the CMake build system to compile the new C++ Radix Tree sources and generate the necessary Python extension module. This ensures seamless integration of the new C++ components into the project's build process.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a C++ Radix Tree implementation with Python bindings, primarily for non-hierarchical caching. The C++ code is well-structured and uses modern features. The Python interface is clear.
The most critical missing piece is unit testing, which is essential for such a complex data structure. Documentation updates and benchmark results are also noted as pending in the PR description.
A few minor points regarding parameter naming consistency and assertions have been raised. The NotImplementedError
for hierarchical cache support in the C++ backend is aligned with the PR's stated scope.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Great work! |
Thank you for your question! Based on my testing, the C++ implementation typically delivers 2-3× speedups over the Python version. In extreme cases like #6907, this should significantly reduce Note that:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- move this to JIT https://docs.pytorch.org/tutorials/advanced/cpp_extension.html#jit-compiling-extensions. Only jit/import when it is really used. Open a new folder
python/sglang/srt/mem_cache/cpp_radix_tree
- Add an e2e test case
test/srt/test_cpp_radix_tree.py
. Follow thissglang/test/srt/test_page_size.py
Line 16 in 8364608
class TestPageSize(CustomTestCase): - Test e2e accuracy
- Test cache hit rate.
3a4a52f
to
b3a509b
Compare
All fixed. cc @merrymercy @xiezhq-hermann |
Motivation
This PR is still part of #7194.
Modifications
Added the core radix tree implementation for future hierarchical caching.
Implemented Python bindings for the radix tree, currently supporting only non-hierarchical caching. Hierarchical cache integration will be addressed in subsequent PRs.
The experimental C++ radix tree can be enabled by setting the environment variable
SGLANG_EXPERIMENTAL_HIRADIX_CACHE=1
. Default behavior remains unchanged.Checklist