
[Roadmap] DMatrix refactoring #5143

@RAMitchell

This issue is a roadmap and checklist for the ongoing work to refactor DMatrix (see RFC #4354).

The first steps are to use a common interface to external data, unifying the way DMatrix objects are constructed and simplifying the process of adding new external data sources.

Once the above is done, all DMatrix construction will go through adapters, and missing-value handling and thread usage will be consistent across constructors.
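To illustrate the adapter idea, here is a minimal Python sketch, not the actual XGBoost C++ design. The names `DenseAdapter`, `CSRAdapter`, `rows()`, `is_missing` and `build_csr` are hypothetical and chosen only for this example. Each adapter exposes its source as an iterator of rows, and a single construction routine applies the same missing-value rule regardless of where the data came from.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Row = List[Tuple[int, float]]  # one row as (column index, value) pairs


class DenseAdapter:
    """Hypothetical adapter over a row-major sequence of sequences."""

    def __init__(self, data: Sequence[Sequence[float]]) -> None:
        self._data = data

    def rows(self) -> Iterator[Row]:
        for row in self._data:
            yield list(enumerate(row))


class CSRAdapter:
    """Hypothetical adapter over CSR arrays (indptr, indices, data)."""

    def __init__(self, indptr: Sequence[int], indices: Sequence[int],
                 data: Sequence[float]) -> None:
        self._indptr, self._indices, self._data = indptr, indices, data

    def rows(self) -> Iterator[Row]:
        for i in range(len(self._indptr) - 1):
            lo, hi = self._indptr[i], self._indptr[i + 1]
            yield list(zip(self._indices[lo:hi], self._data[lo:hi]))


def is_missing(value: float, missing: float) -> bool:
    # NaN always counts as missing; otherwise compare with the sentinel value.
    return math.isnan(value) or value == missing


def build_csr(adapter, missing: float = float("nan")):
    """Single construction path shared by every adapter."""
    indptr: List[int] = [0]
    indices: List[int] = []
    data: List[float] = []
    for row in adapter.rows():
        for col, value in row:
            if not is_missing(value, missing):
                indices.append(col)
                data.append(value)
        indptr.append(len(data))  # close the row, even if it ended up empty
    return indptr, indices, data
```

With this shape, supporting a new data source means writing one more class with a `rows()` method. The missing-value logic lives in one place, so a dense input and a CSR input holding the same values produce the same matrix.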

Then I plan to start reducing the number of classes associated with DMatrix.

The final goal is to save memory by constructing histogram matrices for the hist and gpu_hist algorithms directly from external data using adapters. The interface still needs some discussion: for example, if a user wants to build the histogram DMatrix directly, they could pass an enum to the constructor indicating the DMatrix type.

  • Develop interface for instantiating histogram DMatrix directly
  • Build constructor for EllPack matrix directly using adapters
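As a starting point for the interface discussion, the enum option could look roughly like the Python sketch below. Everything here is an assumption made for illustration: `DMatrixType`, `create_dmatrix`, `column_cuts` and the returned dictionary layout are hypothetical, the input is assumed to be dense, non-empty and free of missing values, and the cut selection is a simple evenly spaced rank rule rather than the quantile sketch used by hist and gpu_hist.

```python
import bisect
from enum import Enum
from typing import List, Sequence


class DMatrixType(Enum):  # hypothetical name
    SIMPLE = "simple"
    HISTOGRAM = "histogram"


def column_cuts(values: Sequence[float], max_bin: int) -> List[float]:
    """Pick at most max_bin upper bin boundaries from the distinct values."""
    distinct = sorted(set(values))
    if len(distinct) <= max_bin:
        return distinct
    n = len(distinct)
    # Evenly spaced ranks; the last cut is always the column maximum.
    return [distinct[(k + 1) * n // max_bin - 1] for k in range(max_bin)]


def create_dmatrix(rows: Sequence[Sequence[float]],
                   kind: DMatrixType = DMatrixType.SIMPLE,
                   max_bin: int = 256) -> dict:
    """Build either a plain or a quantised matrix from the same input."""
    if kind is DMatrixType.SIMPLE:
        return {"kind": kind, "values": [list(r) for r in rows]}
    n_cols = len(rows[0])
    cuts = [column_cuts([r[j] for r in rows], max_bin) for j in range(n_cols)]
    # Store only the bin index of each value; no float copy of the data is kept.
    bins = [[bisect.bisect_left(cuts[j], r[j]) for j in range(n_cols)]
            for r in rows]
    return {"kind": kind, "cuts": cuts, "bins": bins}
```

For example, `create_dmatrix(data, DMatrixType.HISTOGRAM, max_bin=2)` returns per-column cut points and small integer bin indices instead of the original floating-point values, which is where the memory saving would come from.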

Lastly:

  • Enable weighted sketching for DeviceQuantileDMatrix.
