Hello everyone! 👋 I'm excited to share what we have planned for Q3 2025 for Ray. I will try to keep this updated as features get merged and rolled out.
Goal: Deliver foundational reliability, performance, and DX improvements across Ray Core, Data, Train, LLM, Serve, RL, Observability, Technical Content, and KubeRay.
Ray Core
Reliability & Fault Tolerance
- Improve system stability under node and network failures, including making RPCs tolerant to transient errors
- Add robust support for preemptible instances
Scheduling & Performance
- Introduce label-based scheduling for finer-grained resource control ([Core] Ray Label Selector API Implementation Tracker #51564); see the sketch after this list
- Implement GPU objects with RDMA transfer support for high-performance GPU data handling ([Core] verl gpu tensor rdma integration #54943)
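To make the label selector item concrete, here is a minimal sketch of how the proposed API could look based on #51564; the `label_selector` argument and the `gpu-family` label key are assumptions and may change before the feature ships:

```python
import ray

ray.init()

# Hypothetical usage of the label selector proposed in #51564: the
# `label_selector` argument and the "gpu-family" label key are
# assumptions and may change before the feature ships.
@ray.remote(num_gpus=1, label_selector={"gpu-family": "h100"})
def train_shard(shard_id: int) -> int:
    # Scheduled only on nodes whose labels match the selector above.
    return shard_id

print(ray.get([train_shard.remote(i) for i in range(4)]))
```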
Developer Experience
- Introduce ActorMesh for simplified interaction with groups of actors ([WIP] [Prototype] ActorMesh API #54760); see the sketch after this list
- Improve static typing across the codebase to enhance developer productivity
- Address outstanding technical debt in core worker components
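The ActorMesh prototype is still in flux, so rather than guess at its final API, here is the hand-rolled pattern it aims to simplify, written with today's stable actor API:

```python
import ray

ray.init()

@ray.remote
class Worker:
    def __init__(self, rank: int):
        self.rank = rank

    def step(self, x: int) -> int:
        return x + self.rank

# Today, broadcasting a call to a group of actors means managing the
# handles and futures by hand; ActorMesh aims to wrap this pattern.
workers = [Worker.remote(rank) for rank in range(4)]
print(ray.get([w.step.remote(10) for w in workers]))  # [10, 11, 12, 13]
```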
Ecosystem Integrations
- Provide official support for reinforcement learning libraries like veRL, OpenRLHF, and ROLL ([RFC] Improving Ray for Post-Training / RL for LLM Projects #54021)
Ray Data
Reliability
- Ensure workloads complete successfully despite cluster failures
Performance
- Enhance training ingest pipelines with advanced sampling and caching support
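For context, a baseline training-ingest pipeline with today's Ray Data API looks like the sketch below; the roadmap's sampling and caching work targets pipelines of this shape (the S3 path is a placeholder):

```python
import ray

# Baseline ingest pipeline with today's API; "s3://bucket/train/" is a
# placeholder path and the map_batches body stands in for preprocessing.
ds = (
    ray.data.read_parquet("s3://bucket/train/")
    .random_shuffle()  # global shuffle between epochs
    .map_batches(lambda batch: batch, batch_size=1024)
)

# Stream batches into the training loop without materializing the dataset.
for batch in ds.iter_torch_batches(batch_size=256):
    ...  # feed the batch to a training step
```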
Connectors
- Improve Apache Iceberg integration; see the sketch after this list
- Expand data catalog support, starting with Databricks Unity Catalog
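As a reference point for the Iceberg work, reading a table with the existing connector looks roughly like this (requires pyiceberg); the catalog settings and table identifier are placeholders for your own setup:

```python
import ray

# Read an Iceberg table with the existing connector; the catalog
# settings and table identifier below are placeholders.
ds = ray.data.read_iceberg(
    table_identifier="demo_db.events",
    catalog_kwargs={"name": "my_catalog", "type": "glue"},
)
print(ds.schema())
```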
Usability
- Schema UDFs
- Enhanced internal query planning
Ray Train
API
- Finalize Train v2 API
Performance
- Implement asynchronous checkpointing
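A minimal sketch of the Train v2-style API with checkpoint reporting, which is the step asynchronous checkpointing would move off the critical path; the training body and model state are placeholders:

```python
import os
import tempfile

import ray.train
from ray.train import Checkpoint, ScalingConfig
from ray.train.torch import TorchTrainer

def train_func(config):
    for epoch in range(config["epochs"]):
        # ... one epoch of training (placeholder) ...
        with tempfile.TemporaryDirectory() as tmp:
            # Write model state, then report it. Asynchronous checkpointing
            # targets moving this upload off the training hot path.
            with open(os.path.join(tmp, "model.pt"), "wb") as f:
                f.write(b"placeholder model state")
            ray.train.report(
                {"epoch": epoch},
                checkpoint=Checkpoint.from_directory(tmp),
            )

trainer = TorchTrainer(
    train_func,
    train_loop_config={"epochs": 2},
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
)
result = trainer.fit()
```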
LLM
Goal: Run large models (e.g., DeepSeek) at scale via vLLM on Ray Serve:
- Prefill disaggregation
- Large scale DP
- Custom request routing
- Elastic expert parallelism
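As a starting point, a minimal sketch of serving a model through Ray Serve's vLLM-backed LLM APIs is shown below; the model id is a placeholder, and the DP/EP and prefill-disaggregation settings above are not shown since they are still landing:

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# Minimal LLM serving sketch; "my-model" is a placeholder model id, and
# the roadmap's DP/EP and prefill-disaggregation knobs are not shown.
llm_config = LLMConfig(
    model_loading_config={"model_id": "my-model"},
    deployment_config={
        "autoscaling_config": {"min_replicas": 1, "max_replicas": 2},
    },
    engine_kwargs={"tensor_parallel_size": 1},
)

# Build and run an OpenAI-compatible app backed by vLLM replicas.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```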
Performance & Efficiency
- Implement prefill disaggregation to optimize performance for large-context models
- Develop an intelligent, KV cache-aware router with a pluggable architecture
- Implement Data Parallel (DP) Attention within Ray Serve
Operations
- Publish updated performance benchmarks
Ecosystem
- Support SkyRL for reinforcement learning from human feedback (RLHF) workloads
Ray Serve
Serving Flexibility
- Custom auto-scaling and routing patterns; see the sketch after this list
- Async inference support
- MCP server patterns
- Integrate label-based scheduling
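For the custom auto-scaling item, per-deployment autoscaling is already configurable today, as in the sketch below; custom routing patterns would plug into deployments like this one (the handler body is a placeholder):

```python
from ray import serve

# Per-deployment autoscaling as it works today; the handler body is a
# placeholder, and custom routing would plug into deployments like this.
@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 8,
        "target_ongoing_requests": 4,  # scale on in-flight requests
    },
    max_ongoing_requests=8,
)
class Summarizer:
    async def __call__(self, request) -> str:
        body = await request.json()
        return body.get("text", "")[:100]

serve.run(Summarizer.bind())
```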
Observability
- Enhanced tracing support
RLlib
- Ray RL V2 stack GA; see the sketch after this list
- Algorithm composability enhancements
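A small sketch of opting into the V2 (new API) stack on PPO; in recent releases the new stack is the default, so the flags are shown only for explicitness:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Opt into the new (V2) API stack on PPO; in recent releases this stack
# is the default, so the flags are shown only for explicitness.
config = (
    PPOConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("CartPole-v1")
    .env_runners(num_env_runners=2)
)
algo = config.build()
result = algo.train()
```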
Observability
API Release
- Public launch of unified event export API
Optimization
- Refactor internals to leverage new export API
Technical Content
- New technical templates
- More examples & deep‑dives
KubeRay
Upgrades
- Productionize the incremental upgrade feature for seamless cluster updates
Hardware Support
- Streamline support for diverse accelerators, including multiple GPU types, Dynamic Resource Allocation (DRA), and MIG
Autoscaling
- Continue to improve the functionality and reliability of Autoscaler V2
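For reference, a minimal RayCluster manifest sketch with in-tree autoscaling enabled; opting into Autoscaler V2 through the head-pod environment variable reflects the current alpha docs and may change as V2 is productionized (image versions are placeholders):

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: demo
spec:
  enableInTreeAutoscaling: true
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.48.0  # placeholder version
            env:
              # Opt into Autoscaler V2 (alpha); this toggle may change
              # as V2 is productionized.
              - name: RAY_enable_autoscaler_v2
                value: "1"
  workerGroupSpecs:
    - groupName: workers
      replicas: 0
      minReplicas: 0
      maxReplicas: 8
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.48.0
```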
We love hearing from the community! If there is a feature you'd like to see in Ray in the future, let us know by filing a feature request or commenting here. Thank you for supporting Ray!