Closed
Description
Hi Team,
While training machine learning models with XGBoost on a Spark cluster, the SparkContext always shuts down after any training task fails with an exception. Every time this happens, we need to restart the cluster to bring it back to a normal state.
After looking for the root cause, we found the code that causes the SparkContext to close. I am not sure why the SparkContext has to shut down on any task failure. This causes the other model training jobs to end, which should not be necessary.
This code was rolled out in versions 0.82 and 0.9. Is it possible to fix this, or is there a reason for this change in the new versions?