
Why are the results different between custom cross-entropy loss function and official cross-entropy loss function like objective:binary-logistic? #5621


Description

@GuoYL36

Hi, I want to use a custom objective function, so I implemented the cross-entropy loss following https://github.com/dmlc/xgboost/blob/master/demo/guide-python/custom_objective.py. Then I trained with the official cross-entropy objective, binary:logistic, for comparison, and the results are different. Here is the test example:

# python3.6 + sklearn0.22
#===========datasets.make_hastie_10_2===================================================================
import numpy as np
import xgboost as xgb
from sklearn import datasets
from sklearn.metrics import roc_auc_score

def loglikelihood(labels, preds):
    # Treat preds as raw margins z: with p = sigmoid(z), the binary
    # cross-entropy loss has grad = dL/dz = p - y and hess = p * (1 - p).
    preds = 1.0 / (1.0 + np.exp(-preds))
    grad = preds - labels
    hess = preds * (1.0 - preds)
    return grad, hess

X, y = datasets.make_hastie_10_2(n_samples=12000, random_state=1)
X = X.astype(np.float32)

# map labels from {-1, 1} to {0, 1}
labels, y = np.unique(y, return_inverse=True)

X_train, X_valid = X[:2000], X[2000:]
y_train, y_valid = y[:2000], y[2000:]
tree_nums = 100
xgb_params = {'learning_rate': 0.01, "n_estimators": tree_nums,
              "max_depth": 5,'min_child_weight': 1, 'seed': 10,
              'subsample': 0.9, 'colsample_bytree': 0.9, 'gamma': 0, 'reg_alpha': 0, 'reg_lambda': 1,
              "tree_method": "gpu_hist", "n_jobs": 8}

xgb_clf0 = xgb.XGBClassifier(**xgb_params, objective="binary:logistic")

xgb_clf1 = xgb.XGBClassifier(**xgb_params, objective=loglikelihood)

xgb_clf0.fit(X_train, y_train) 
xgb_clf1.fit(X_train, y_train)

def getAuc(clf):
    # AUC is computed from the predicted probability of the positive class
    y_pred_prob = clf.predict_proba(X_valid)
    auc_score = roc_auc_score(y_valid, y_pred_prob[:, 1])
    return auc_score
auc_scores_0 = getAuc(xgb_clf0)

auc_scores_1 = getAuc(xgb_clf1)
print("binary:logistic => auc score: ",auc_scores_0)   # 0.89597577539
print("loglikelood => auc score: ",auc_scores_1)  # 0.834210247555
#===============================================================================    
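To see where the two runs diverge, it helps to check what values preds actually holds when the sklearn wrapper calls the custom objective: raw margins are unbounded, while already-transformed probabilities stay in (0, 1). Here is a minimal diagnostic sketch (probe is a hypothetical helper, reusing X_train/y_train from above):

def probe(labels, preds):
    # Hypothetical diagnostic objective: report the range of preds to see
    # whether they are raw margins (unbounded) or probabilities in (0, 1).
    print("preds range: [%.4f, %.4f]" % (preds.min(), preds.max()))
    p = 1.0 / (1.0 + np.exp(-preds))
    return p - labels, p * (1.0 - p)

xgb.XGBClassifier(n_estimators=2, objective=probe).fit(X_train, y_train)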

When I rewrite the function as follows:

def loglikelihood(labels, preds):
    # preds = 1.0 / (1.0 + np.exp(-preds))   # sigmoid removed
    grad = preds - labels
    hess = preds * (1.0 - preds)
    return grad, hess

I find the results are the same, so I want to know whether the example in https://github.com/dmlc/xgboost/blob/master/demo/guide-python/custom_objective.py is right.
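For reference, the linked demo uses the native xgb.train API rather than the sklearn wrapper. There the custom objective has the signature (preds, dtrain), preds are raw margins, and the sigmoid is therefore applied inside the objective, as in the demo's logregobj. A minimal sketch of that setting on the same data (the hyperparameter values here are illustrative, not the demo's):

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

def logregobj(preds, dtrain):
    # Native-API signature: labels come from the DMatrix, and preds are
    # raw margins, so the sigmoid is applied before forming grad/hess.
    labels = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))
    return p - labels, p * (1.0 - p)

params = {'max_depth': 5, 'eta': 0.01, 'seed': 10}
bst = xgb.train(params, dtrain, num_boost_round=100, obj=logregobj)
margins = bst.predict(dvalid, output_margin=True)
# AUC is rank-based, so raw margins can be scored directly.
print("native-API auc:", roc_auc_score(y_valid, margins))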
