NaN values and Scikit-Learn RFECV #5401

I couldn't find an issue associated with [this XGBoost forum topic](https://discuss.xgboost.ai/t/nan-values-and-scikit-learn-rfecv/1432), so I assume none was created. I can confirm that the problem persists with the latest nightly builds of XGBoost (`a38e7bd19c461e0bed7bd96ec72d56132157d4af`) and scikit-learn (`018c6dc57d21c89c7d1278c686c7d5d62f32ee48`).

I agree with Mike Creeth's statement in the previously mentioned forum post:

> I believe this is because RFECV does some checking based on the tags that it gets from the estimator. It uses the tag ‘allow_nan’ to determine whether or not to check X for NaN values. It seems that currently XGBoost simply inherits the default “allow_nan” tag value from the scikit-learn estimator class, which is False. As XGB does in fact handle null values in X, I believe this behavior is incorrect.

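```
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from xgboost import XGBClassifier

# Illustrative data only: any X with at least one missing value will do.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
X[0, 0] = np.nan

estimator = XGBClassifier()
selector = RFECV(estimator, cv=3)
selector = selector.fit(X, y)
```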

When `X` contains one or more `np.nan` values, the snippet above raises the following error:

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-35-6d04ec0892c9> in <module>
     18 
     19 selector = RFECV(model, cv=3)#, scoring=neg_rmse)
---> 20 selector = selector.fit(X_train.values, y_train.values)

/local/burghbvander/miniconda3/envs/fastai/lib/python3.7/site-packages/sklearn/feature_selection/_rfe.py in fit(self, X, y, groups)
    498             X, y, accept_sparse="csr", ensure_min_features=2,
    499             force_all_finite=not tags.get('allow_nan', True),
--> 500             multi_output=True
    501         )
    502 

/local/burghbvander/miniconda3/envs/fastai/lib/python3.7/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, **check_params)
    404             out = X
    405         else:
--> 406             X, y = check_X_y(X, y, **check_params)
    407             out = X, y
    408 

/local/burghbvander/miniconda3/envs/fastai/lib/python3.7/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
    724                     ensure_min_samples=ensure_min_samples,
    725                     ensure_min_features=ensure_min_features,
--> 726                     estimator=estimator)
    727     if multi_output:
    728         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

/local/burghbvander/miniconda3/envs/fastai/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    571         if force_all_finite:
    572             _assert_all_finite(array,
--> 573                                allow_nan=force_all_finite == 'allow-nan')
    574 
    575     if ensure_min_samples > 0:

/local/burghbvander/miniconda3/envs/fastai/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
     60                     msg_err.format
     61                     (type_err,
---> 62                      msg_dtype if msg_dtype is not None else X.dtype)
     63             )
     64     # for object dtype data, we only check for NaNs (GH-13254)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
```

The `allow_nan` tag should probably be set to `True` in `XGBClassifier`.
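To make the mechanism concrete, here is a minimal pure-Python sketch of how the tag drives the check. It is not scikit-learn's or XGBoost's actual code: `ToyEstimator`, `NaNTolerantEstimator`, and `validate` are hypothetical stand-ins, and the toy check looks only for NaN, whereas the real validation also rejects infinities. The one line taken from the traceback is the `force_all_finite = not tags.get('allow_nan', True)` expression.

```python
import math

# Mirrors the default described in the quoted forum post (assumption: a plain dict).
DEFAULT_TAGS = {"allow_nan": False}


class ToyEstimator:
    """Hypothetical stand-in for an estimator that inherits the default tags."""

    def _get_tags(self):
        return dict(DEFAULT_TAGS)


class NaNTolerantEstimator(ToyEstimator):
    """Hypothetical stand-in for an estimator that declares NaN support."""

    def _get_tags(self):
        tags = super()._get_tags()
        tags["allow_nan"] = True
        return tags


def validate(estimator, X):
    """Reject NaN unless the estimator's tags say it can handle them."""
    tags = estimator._get_tags()
    # Same expression as in the _rfe.py frame of the traceback.
    force_all_finite = not tags.get("allow_nan", True)
    if force_all_finite and any(math.isnan(v) for row in X for v in row):
        raise ValueError("Input contains NaN")
    return X


X = [[1.0, float("nan")], [0.5, 2.0]]

validate(NaNTolerantEstimator(), X)  # passes: the NaN check is skipped

try:
    validate(ToyEstimator(), X)
except ValueError as exc:
    print(f"rejected: {exc}")
```

In this toy model, flipping the single tag is enough to let NaN-containing input through validation, which is the behavior one would expect from an estimator that handles missing values natively.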
