
Training fails when using external memory version for large datasets with instance weights #5866

@prvnsmpth

Description


I am attempting to train an XGBoost model on a large dataset that I cannot load completely into memory, so I decided to use XGBoost's external memory training feature, like so:

import xgboost as xgb

# feature_names is defined earlier in the training script
dtrain = xgb.DMatrix("data/train.libsvm#train.cache", feature_names=feature_names)

Now I also need to specify instance weights, so I tried providing them in a separate train.libsvm.weight file:

$ head -5 data/train.libsvm.weight
5.28486226776928e-7
5.28486226776928e-7
5.28486226776928e-7
5.28486226776928e-7
5.28486226776928e-7
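
For reference, this sidecar file is plain text with one weight per line, in the same row order as train.libsvm. A minimal sketch of how such a file could be generated (the NumPy usage and the constant weight value here are illustrative, not the exact script used):

import numpy as np

# One weight per row of data/train.libsvm, written one value per line.
weights = np.full(11252555, 5.28486226776928e-7)
np.savetxt("data/train.libsvm.weight", weights)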

However, training fails with the following error:

[22:15:43] 11252555x174 matrix with 44859377 entries loaded from data/train.libsvm#train.cache
[22:15:46] 11252555 weights are loaded from data/train.libsvm.weight
Traceback (most recent call last):
  File "train.py", line 104, in <module>
    train(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4])
  File "train.py", line 66, in train
    model = xgb.train(params, dtrain, num_rounds, watchlist)
  File "/home/praveen/auto-test-web/auto-test-web/src/ml/venv/lib/python3.8/site-packages/xgboost/training.py", line 208, in train
    return _train_internal(params, dtrain,
  File "/home/praveen/auto-test-web/auto-test-web/src/ml/venv/lib/python3.8/site-packages/xgboost/training.py", line 75, in _train_internal
    bst.update(dtrain, i, obj)
  File "/home/praveen/auto-test-web/auto-test-web/src/ml/venv/lib/python3.8/site-packages/xgboost/core.py", line 1367, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "/home/praveen/auto-test-web/auto-test-web/src/ml/venv/lib/python3.8/site-packages/xgboost/core.py", line 190, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [22:15:46] /workspace/src/tree/updater_gpu_hist.cu:952: Exception in gpu_hist: [22:15:46] /workspace/src/common/hist_util.cu:287: Check failed: weights.size() == page.offset.Size() - 1 (11252555 vs. 921785

So from the error message, it appears the weights file is recognized and weights for all 11252555 instances are loaded. However, because external memory training processes the data in pages, only 921785 rows are loaded at a time, and the requirement that the weight vector be the same length as the current batch of data no longer holds.
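
To spell out the mismatch: the check compares the length of the full weight vector against the number of rows in a single external-memory page. A rough sketch using the numbers from the log above:

# Simplified restatement of the failing check in src/common/hist_util.cu
total_weights = 11252555  # weights loaded from data/train.libsvm.weight
rows_in_page = 921785     # rows in the current external-memory page
assert total_weights == rows_in_page  # fails, raising the XGBoostError above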

I have also tried specifying the weights directly in the LibSVM input file, by replacing each label entry with label:weight, but I get the exact same error.
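
For clarity, by that I mean rows of the form label:weight followed by the usual feature:value pairs, e.g. (the feature indices and values here are made up):

1:5.28486226776928e-7 3:0.4 17:1.0 52:2.5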

I'm using XGBoost version 1.1.0.
