Skip to content

Conversation

DarkSharpness
Copy link
Collaborator

Motivation

This PR is still part of #7194.

Modifications

  • Upstream Hierarchical Radix Tree in C++:
    Added the core radix tree implementation for future hierarchical caching.
  • Python Interface (Non-Hierarchical Only):
    Implemented Python bindings for the radix tree, currently supporting only non-hierarchical caching. Hierarchical cache integration will be addressed in subsequent PRs.
  • Experimental Feature Flag:
    The experimental C++ radix tree can be enabled by setting the environment variable SGLANG_EXPERIMENTAL_HIRADIX_CACHE=1. Default behavior remains unchanged.

Checklist

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @DarkSharpness, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, high-performance C++ Radix Tree implementation designed to serve as a core component for advanced memory caching, particularly for future hierarchical caching strategies. It includes the necessary Python bindings to expose this functionality, which can be activated on an experimental basis, and updates the build configuration to support the new C++ codebase.

Highlights

  • Core C++ Radix Tree Implementation: I've introduced a new C++-based Radix Tree (RadixTree) with foundational support for hierarchical caching concepts, including device (GPU) and host (CPU) memory management, eviction policies, and reference counting. This implementation lays the groundwork for future hierarchical caching features.
  • Python Bindings: I've provided Python bindings for the C++ Radix Tree (HiRadixCacheCpp) using pybind11. This allows Python code to interact with the high-performance C++ implementation. Currently, the Python interface primarily supports non-hierarchical caching, with full hierarchical integration planned for subsequent pull requests.
  • Experimental Feature Flag: The new C++ Radix Tree has been integrated into the scheduler as an experimental feature. Users can enable it by setting the environment variable SGLANG_EXPERIMENTAL_HIRADIX_CACHE=1. This allows for controlled rollout and testing.
  • Build System Integration: I've updated the CMake build system to compile the new C++ Radix Tree sources and generate the necessary Python extension module. This ensures seamless integration of the new C++ components into the project's build process.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a C++ Radix Tree implementation with Python bindings, primarily for non-hierarchical caching. The C++ code is well-structured and uses modern features. The Python interface is clear.

The most critical missing piece is unit testing, which is essential for such a complex data structure. Documentation updates and benchmark results are also noted as pending in the PR description.

A few minor points regarding parameter naming consistency and assertions have been raised. The NotImplementedError for hierarchical cache support in the C++ backend is aligned with the PR's stated scope.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@KevinZeng08
Copy link

Great work!
I would like to ask how much the performance of the C++ version of RadixTree can be expected to improve?

@DarkSharpness
Copy link
Collaborator Author

Great work! I would like to ask how much the performance of the C++ version of RadixTree can be expected to improve?

Thank you for your question! Based on my testing, the C++ implementation typically delivers 2-3× speedups over the Python version. In extreme cases like #6907, this should significantly reduce evict operation overhead.

Note that:

  1. This PR is part of [Refactor] Rewrite Hierarchical Prefix Cache in C++ #7194, which is not meant to replace the original radix cache in Python. Its ultimate goal is to accelerate some advanced prefix-matching related scheduling when hierarchical cache is enabled.
  2. The C++ implementation is actually 10 times faster than Python Radix Cache when tested alone. But in production, some other operations like torch.cat will amortize the speedup.
  3. Current Python implementation works fine for most workloads.

Copy link
Contributor

@merrymercy merrymercy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. move this to JIT https://docs.pytorch.org/tutorials/advanced/cpp_extension.html#jit-compiling-extensions. Only jit/import when it is really used. Open a new folder python/sglang/srt/mem_cache/cpp_radix_tree
  2. Add an e2e test case test/srt/test_cpp_radix_tree.py. Follow this
    class TestPageSize(CustomTestCase):
    • Test e2e accuracy
    • Test cache hit rate.

@DarkSharpness
Copy link
Collaborator Author

All fixed. cc @merrymercy @xiezhq-hermann

@merrymercy merrymercy merged commit e273aa6 into sgl-project:main Aug 3, 2025
57 of 62 checks passed
lifuhuang pushed a commit that referenced this pull request Aug 3, 2025
htiennv pushed a commit to htiennv/sglang that referenced this pull request Aug 5, 2025
ShangmingCai pushed a commit that referenced this pull request Aug 5, 2025
ShangmingCai pushed a commit that referenced this pull request Aug 5, 2025
@DarkSharpness DarkSharpness deleted the dev_radix_cpp_impl branch August 10, 2025 22:39
narutolhy pushed a commit to narutolhy/sglang that referenced this pull request Aug 17, 2025
narutolhy pushed a commit to narutolhy/sglang that referenced this pull request Aug 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants