Request for Multithreading Support in CatBoost Pool Dataset Construction #2542

@RunxingZhong

Hello,

I've noticed that CatBoost's Pool class appears to use only a single thread when constructing a dataset. Would it be possible to support multithreading in this step?

The reason for this request is that, with large datasets, building the Pool takes longer than training the model itself. This seems inefficient and is probably not the intended behavior.
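
For anyone who wants to check this on their own data, here is a minimal sketch of how the two timings could be compared. The helper names `timed` and `compare_pool_and_fit`, the choice of `CatBoostRegressor`, and the iteration count are assumptions made for illustration, not part of the original report.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_pool_and_fit(X, y, iterations=100):
    """Return (pool_build_seconds, fit_seconds) for in-memory data X, y.

    Hypothetical helper: the model class and iteration count are
    illustrative choices, not the setup from the report above.
    """
    # Imported here so the timing helper works without catboost installed.
    from catboost import CatBoostRegressor, Pool

    # Time the dataset construction step on its own.
    pool, build_s = timed(Pool, X, label=y)

    # Time training on the already-built Pool.
    model = CatBoostRegressor(iterations=iterations, verbose=False)
    _, fit_s = timed(model.fit, pool)
    return build_s, fit_s
```

Calling `compare_pool_and_fit(X, y)` with a large feature matrix returns both durations in seconds, so the Pool construction time can be compared directly with the training time.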

Implementing multithreading could significantly reduce the dataset construction time, leading to a more streamlined and efficient workflow.

Thank you for considering this enhancement.

Best regards
