Description
Merry Christmas, all.
I am seeing different behaviour between an old version and
the newest version of XGBoost. Here is some information.
- 1.3.1 (pip install xgboost), the newest one
- 1.1.0 (pip install -Iv xgboost==1.1.0), the old one that I used
Ubuntu 18.04.4 LTS
CUDA 11.1.74
QUADRO RTX 8000 48GB
The data file (7.49 GB) can be downloaded from Mega:
https://mega.nz/file/irQXCSbA#rfl53ewh88k_pBsVLw_fA5hC6mz91Vi9yRUrMqEMYno
or from Google Drive:
https://drive.google.com/file/d/14RCv2COi6rw7JWDkYp9825z7zSHY1ogQ/view?usp=sharing
The issue with the old version was that I could not dump the trained model,
even though the training itself progressed very nicely
(https://discuss.xgboost.ai/t/serialisation-of-large-models/1897/2).
Since I am aware there has been a lot of effort to improve the dump process,
I decided to try the newest version. With the newest version, however, I do not
see the same training behaviour as with the old one. Simply put, it takes far
longer for the rounds to proceed, and I only ever stopped the training with a
keyboard interrupt.
With v1.3.1, not a single round completed within 30 minutes.
With v1.1.0, after waiting 30 minutes, the terminal output reads:
[0] convergence-merror:0.10068
[1] convergence-merror:0.10052
[2] convergence-merror:0.09998
[3] convergence-merror:0.09930
[4] convergence-merror:0.09852
[5] convergence-merror:0.09773
[6] convergence-merror:0.09690
[7] convergence-merror:0.09608
[8] convergence-merror:0.09524
[9] convergence-merror:0.09439
[10] convergence-merror:0.09355
[11] convergence-merror:0.09268
[12] convergence-merror:0.09182
[13] convergence-merror:0.09097
[14] convergence-merror:0.09013
[15] convergence-merror:0.08930
[16] convergence-merror:0.08855
[17] convergence-merror:0.08781
[18] convergence-merror:0.08708
[19] convergence-merror:0.08636
[20] convergence-merror:0.08557
[21] convergence-merror:0.08477
Note that removing the newest version and reinstalling the old one recovers
the previous good training behaviour.
Since the exact code I use matters, I have also put the script at the end of this report.
When running it, you give one command-line argument to tell it how many rounds to train;
the path to the data file mentioned above is hard-coded in the script. A usage sketch follows.
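For example, assuming the script is saved as xgb_train.py (the file name here is just my placeholder) and I want 300 rounds:
python3 xgb_train.py 300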
I would appreciate any advice.
Besides, I also want to study why this is happening from my side.
Could anyone let me know how to download the "full code" of v1.1.0
which I could install with ( pip install -Iv xgboost==1.1.0 )?
I want to study the routines and learn by myself about the implementation.
Since I am testing in Ubuntu, I am supposed to study and build the full code of
the old version in Ubuntu. I am not very good at Github and I do not know how to
download the full code to build of an old release.
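My current guess, which I have not verified, is to check out the release tag and build with CMake as described in the XGBoost documentation, roughly:
git clone --recursive https://github.com/dmlc/xgboost.git
cd xgboost
git checkout v1.1.0                       # release tag matching the 1.1.0 wheel
git submodule update --init --recursive   # make submodules match the tag
mkdir build && cd build
cmake .. -DUSE_CUDA=ON                    # enable gpu_hist / gpu_predictor
make -j4
cd ../python-package
python3 setup.py install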
Thanks a lot in advance.
Comment: I copied the code from Sublime Text, so some of the spacing may look
irregular here; it should hopefully look better in a text editor.
#-----------------------------------------------------------------------------------------
# modules
#-----------------------------------------------------------------------------------------
from sklearn.metrics import accuracy_score   # convenient accuracy measure
import xgboost as xgb                         # XGBoost software package
import numpy as np                            # cast predictions/labels for scoring
import h5py                                   # deal with .mat data files
import sys                                    # deal with command arguments
import gc; gc.enable()                        # garbage memory collection
#-------------------------------------------------------------------------------
# functions
#-------------------------------------------------------------------------------
def serialise_model(__pne__, __model__):
    __model__.save_model(__pne__); gc.collect()

def deserialise_model(__pne__):
    __model__ = xgb.Booster(); xgb.Booster.load_model(__model__, __pne__)
    return __model__
def load_mat(__pne__):  # load .mat data and return xgboost.DMatrix
    print('load_mat(%s): Commence.' % __pne__, flush = 1)
    Mat = h5py.File(__pne__, 'r')
    X__ = (Mat.get('X')[()]).transpose()
    y__ = (Mat.get('y')[()]).transpose().flatten(); del Mat
    Xy_ = xgb.DMatrix(data = X__, label = y__)
    __shape__ = X__.shape
    print('load_mat(): DMatrix ready. X__.shape = ', end = '', flush = 1)
    print(__shape__, flush = 1)
    del X__; gc.collect()
    return Xy_, y__, __shape__
def set_model_params(__max_depth__, __num_class__):
    xgb_params = {
        'process_type': 'default',           # in {default, update}; update continues an existing tree.
        'tree_method': 'gpu_hist',
        'booster': 'gbtree',
        'grow_policy': 'lossguide',          # lossguide: split at nodes with the highest loss change.
        'num_parallel_tree': 1,              # number of trees in a boosted random forest.
        'min_split_loss': 0,                 # 0: split whenever it improves.
        'learning_rate': 1.0,                # 1: no decay; full weight on previous trees.
        'max_depth': __max_depth__,          # in [0, Inf]; 0 accepted if (hist | lossguide). Mind the memory!
        'max_leaves': 0,
        'reg_lambda': 0.0,                   # L2 regularisation
        'reg_alpha': 0.0,                    # L1 regularisation
        'num_class': __num_class__,
        'objective': 'multi:softmax',
        'eval_metric': 'merror',
        'predictor': 'gpu_predictor',
        'verbosity': 1,
        'validate_parameters': 0,
        'single_precision_histogram': 0,     # gpu_hist can fail with single precision
        'deterministic_histogram': 0         # default 1: rounding loses accuracy
    }
    return xgb_params
#-----------------------------------------------------------------------------------------
# data
#-----------------------------------------------------------------------------------------
data_pne = '/workspace/temp/data.mat'
my_model_path = '/workspace/temp/'
num_classes = 3
my_max_depth = 21 # 21
print('from: %s' % (data_pne))
print('to: %s' % (my_model_path))
#-----------------------------------------------------------------------------------------
# user parameters
#-----------------------------------------------------------------------------------------
if len(sys.argv) != 2:
    print('Give number of rounds')
    sys.exit()
dump_model = 1
my_print_freq = 1
xgb_params = set_model_params(my_max_depth, num_classes)
#-----------------------------------------------------------------------------------------
# ----------------------------------------------------- using xgboost-core
#-----------------------------------------------------------------------------------------
Xy, y, shape = load_mat(data_pne)
progress = dict()
my_num_rounds = int(sys.argv[1])
my_model_pne = '%s/%03d.json' % (my_model_path, my_num_rounds)
try:
    M0 = deserialise_model(my_model_pne)
    print('Existing model detected.\n')
except:
    M0 = None; print('Model will be newly trained.\n')
M0 = xgb.train(xgb_params, Xy, my_num_rounds, [(Xy, 'convergence')],
               evals_result = progress,
               verbose_eval = my_print_freq,
               xgb_model = M0)
## report result
y0 = np.ubyte(M0.predict(Xy))
ACC = accuracy_score(y0, np.ubyte(y))
print('\nAccuracy = %6.2f%%' % (ACC * 100.0) )
## secure memory as much as possible
del Xy; del y; gc.collect()
## saving the trained model
if dump_model == 1:
    print('Dumping model M0: ', end = '', flush = 1)
    try:
        serialise_model(my_model_pne, M0)
        print('done.')
    except:
        print('serialisation failure.')