-
-
Notifications
You must be signed in to change notification settings - Fork 8.8k
Support dmatrix construction from cupy array #5206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
No. But would be nice the have it somewhere in the path. I remember last time someone had a super old numba with cudf and caused a failure. |
@trivialfis I got this working without any imports. It might be possible to do the same with cudf, particularly if they fix the incomplete |
@RAMitchell, could you please raise an issue on the cuDF repo documenting in which ways it is incomplete? |
@jakirkham it was my misunderstanding. I thought the mask attribute was supposed to be a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see inlined comments, I'm not sure if meta info setters are correct.
return interface | ||
|
||
|
||
def _extract_interface_from_cudf(df): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice that you can remove these code.
@@ -614,16 +578,38 @@ def _init_from_dt(self, data, nthread): | |||
nthread)) | |||
self.handle = handle | |||
|
|||
def _init_from_columnar(self, df, missing): | |||
def _init_from_array_interface_columns(self, df, missing, nthread): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using columnar
might be easier for others to google. Not important.
python-package/xgboost/core.py
Outdated
self._init_from_columnar(data, missing) | ||
elif hasattr(data, "__cuda_array_interface__"): | ||
self._init_from_array_interface(data, missing, nthread) | ||
elif isinstance(data, CUDF_DataFrame): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have seen usage similar to type(data) == 'cudf.DataFrame'
(Using a string representation of type) before in dask or dask cudf. I was thinking about adopting it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds great if possible.
python-package/xgboost/core.py
Outdated
@@ -862,7 +848,7 @@ def set_group(self, group): | |||
group : array like | |||
Group size of each group | |||
""" | |||
if _use_columnar_initializer(group): | |||
if hasattr(group, "__cuda_array_interface__"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does a cudf dataframe with single column have the array interface? Also does a cudf series have array interface? Can you be sure that the cudf test is using correct initializer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does a cudf dataframe with single column have the array interface?
If it does, then we need to make sure the cudf DataFrame in training data is not initialized with cupy
initializer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cudf with one column does not have array interface. I will modify it to accept this for metainfo.
src/c_api/c_api.cu
Outdated
API_END(); | ||
} | ||
|
||
XGB_DLL int XGDMatrixCreateFromArrayInterface(char const* c_json_strs, | ||
bst_float missing, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong indent.
No worries. Just want to make sure when we are discovering issues they get raised so we can address them 🙂 |
793f535
to
64d349b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Codecov Report
@@ Coverage Diff @@
## master #5206 +/- ##
==========================================
+ Coverage 72.19% 77.38% +5.18%
==========================================
Files 11 11
Lines 2316 2321 +5
==========================================
+ Hits 1672 1796 +124
+ Misses 644 525 -119
Continue to review full report at Codecov.
|
A few things left to do:
@trivialfis I removed the check for array interface version as I found I could get version 0 by installing cupy from conda or 2 by installing from pip, but not 1. Does this break anything for cudf?