Skip to content

Conversation

RAMitchell
Copy link
Member

@RAMitchell RAMitchell commented Jan 15, 2020

A few things left to do:

  • Correctly construct labels/info
  • Add an integration test for training
  • Mention cupy in documentation

@trivialfis I removed the check for array interface version as I found I could get version 0 by installing cupy from conda or 2 by installing from pip, but not 1. Does this break anything for cudf?

@trivialfis
Copy link
Member

trivialfis commented Jan 15, 2020

No. But would be nice the have it somewhere in the path. I remember last time someone had a super old numba with cudf and caused a failure.

@RAMitchell RAMitchell marked this pull request as ready for review January 19, 2020 22:39
@RAMitchell
Copy link
Member Author

@trivialfis I got this working without any imports. It might be possible to do the same with cudf, particularly if they fix the incomplete __cuda_array_interface__ attribute.

@jakirkham
Copy link
Contributor

It might be possible to do the same with cudf, particularly if they fix the incomplete __cuda_array_interface__ attribute.

@RAMitchell, could you please raise an issue on the cuDF repo documenting in which ways it is incomplete?

@RAMitchell
Copy link
Member Author

@jakirkham it was my misunderstanding. I thought the mask attribute was supposed to be a __cuda_array_interface__ , looking closer at the standard it should be an object that has the attribute __cuda_array_interface__.

Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see inlined comments, I'm not sure if meta info setters are correct.

return interface


def _extract_interface_from_cudf(df):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that you can remove these code.

@@ -614,16 +578,38 @@ def _init_from_dt(self, data, nthread):
nthread))
self.handle = handle

def _init_from_columnar(self, df, missing):
def _init_from_array_interface_columns(self, df, missing, nthread):
Copy link
Member

@trivialfis trivialfis Jan 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using columnar might be easier for others to google. Not important.

self._init_from_columnar(data, missing)
elif hasattr(data, "__cuda_array_interface__"):
self._init_from_array_interface(data, missing, nthread)
elif isinstance(data, CUDF_DataFrame):
Copy link
Member

@trivialfis trivialfis Jan 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen usage similar to type(data) == 'cudf.DataFrame' (Using a string representation of type) before in dask or dask cudf. I was thinking about adopting it here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds great if possible.

@@ -862,7 +848,7 @@ def set_group(self, group):
group : array like
Group size of each group
"""
if _use_columnar_initializer(group):
if hasattr(group, "__cuda_array_interface__"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does a cudf dataframe with single column have the array interface? Also does a cudf series have array interface? Can you be sure that the cudf test is using correct initializer?

Copy link
Member

@trivialfis trivialfis Jan 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does a cudf dataframe with single column have the array interface?

If it does, then we need to make sure the cudf DataFrame in training data is not initialized with cupy initializer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cudf with one column does not have array interface. I will modify it to accept this for metainfo.

API_END();
}

XGB_DLL int XGDMatrixCreateFromArrayInterface(char const* c_json_strs,
bst_float missing,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indent.

@jakirkham
Copy link
Contributor

it was my misunderstanding. I thought the mask attribute was supposed to be a __cuda_array_interface__ , looking closer at the standard it should be an object that has the attribute __cuda_array_interface__.

No worries. Just want to make sure when we are discovering issues they get raised so we can address them 🙂

@RAMitchell RAMitchell force-pushed the cupy branch 2 times, most recently from 793f535 to 64d349b Compare January 21, 2020 02:55
Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@RAMitchell RAMitchell merged commit 9c56480 into dmlc:master Jan 22, 2020
@codecov-io
Copy link

Codecov Report

Merging #5206 into master will increase coverage by 5.18%.
The diff coverage is 25%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5206      +/-   ##
==========================================
+ Coverage   72.19%   77.38%   +5.18%     
==========================================
  Files          11       11              
  Lines        2316     2321       +5     
==========================================
+ Hits         1672     1796     +124     
+ Misses        644      525     -119
Impacted Files Coverage Δ
python-package/xgboost/compat.py 48.78% <ø> (-4.07%) ⬇️
python-package/xgboost/core.py 79.07% <25%> (-0.77%) ⬇️
python-package/xgboost/dask.py 61.87% <0%> (+43.81%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0184f2e...0324663. Read the comment docs.

@RAMitchell RAMitchell mentioned this pull request Feb 17, 2020
10 tasks
@lock lock bot locked as resolved and limited conversation to collaborators Apr 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants