What will you get with this integration?¶
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. The integration with Neptune lets you log multiple training artifacts with no further customization.
The integration is implemented as an XGBoost callback and provides the following capabilities:
Log metrics (train and eval) after each boosting iteration.
Log model (Booster) to Neptune after the last boosting iteration.
Log feature importance to Neptune as an image after the last boosting iteration.
Log visualized trees to Neptune as images after the last boosting iteration.
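The split between "after each boosting iteration" and "after the last boosting iteration" can be illustrated with a minimal sketch of a callback. This is not the integration's actual code: it uses a stand-in for the environment object that XGBoost's legacy callback API passes to callbacks, and `make_logging_callback`, `log_metric`, and `on_training_end` are hypothetical names chosen for illustration.

```python
from collections import namedtuple

# Stand-in for the environment object passed to callbacks by XGBoost's legacy
# callback API; only the fields used in this sketch are modelled.
CallbackEnv = namedtuple(
    "CallbackEnv", ["iteration", "end_iteration", "evaluation_result_list"]
)


def make_logging_callback(log_metric, on_training_end):
    """Build a callback that logs metrics every iteration and fires once at the end.

    `log_metric` and `on_training_end` are hypothetical hooks supplied by the caller.
    """
    def callback(env):
        # For plain training, each entry is a (name, value) pair such as
        # ("train-rmse", 0.42), where the prefix comes from the watchlist.
        for name, value in env.evaluation_result_list:
            log_metric(name, env.iteration, value)
        # The last boosting iteration has index end_iteration - 1.
        if env.iteration + 1 == env.end_iteration:
            on_training_end()
    return callback


# Simulate three boosting iterations without training a real model.
logged = []
finished = []
callback = make_logging_callback(
    log_metric=lambda name, step, value: logged.append((name, step, value)),
    on_training_end=lambda: finished.append(True),
)
fake_scores = [(0.9, 1.0), (0.6, 0.8), (0.4, 0.7)]
for i, (train_rmse, eval_rmse) in enumerate(fake_scores):
    env = CallbackEnv(
        iteration=i,
        end_iteration=len(fake_scores),
        evaluation_result_list=[("train-rmse", train_rmse), ("eval-rmse", eval_rmse)],
    )
    callback(env)
```

In this simulation every iteration contributes one record per watchlist entry, while the end-of-training hook runs exactly once. That mirrors how the model, feature importance, and tree images are logged only after the final iteration.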
This integration is tested with
Where to start?¶
To get started with this integration, follow the Quickstart below.
If you want to try things out and focus only on the code you can either:
Open the Colab notebook with the quickstart code and run it as the anonymous user “neptuner” (zero setup, it just works),
View quickstart code as a plain Python script on GitHub.
This quickstart will show you how to log XGBoost experiments to Neptune using the XGBoost-Neptune integration.
The integration is implemented as an XGBoost callback and made available in the
As a result you will have an experiment logged to Neptune with metrics, model, feature importances and (optionally, requires graphviz) visualized trees. Have a look at this example experiment.
Before you start¶
Python 3.x and following libraries installed:
Make sure you have created an experiment before you start XGBoost training. Use the create_experiment() method to do this.
Here is how to use the Neptune-XGBoost integration:
import neptune
...
# here you import `neptune_callback` that does the magic (the open source magic :)
from neptunecontrib.monitoring.xgboost import neptune_callback
...
# Use neptune callback
neptune.create_experiment(name='xgb', tags=['train'], params=params)
xgb.train(params, dtrain, num_round, watchlist, callbacks=[neptune_callback()])  # neptune_callback is here
Metrics are logged for train and eval (or whatever you defined in the watchlist) after each boosting iteration.
The model (Booster) is logged to Neptune after the last boosting iteration. If you run cross-validation, you get a model for each fold.
Logged feature importance¶
The feature importance chart is logged to Neptune as an image after the last boosting iteration. If you run cross-validation, you get a feature importance chart for each fold’s model.
Logged visualized trees (requires graphviz)¶
Log the first 6 trees at the end of training (trees with indices 0, 1, 2, 3, 4, 5):
xgb.train(params, dtrain, num_round, watchlist, callbacks=[neptune_callback(log_tree=[0,1,2,3,4,5])])
Selected trees are logged to Neptune as images after the last boosting iteration. If you run cross-validation, you get a tree visualization for each fold’s model, independently.
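Writing the index list by hand gets tedious for more than a few trees. A small helper can build it, as in the sketch below. `first_tree_indices` is a hypothetical name, not part of the integration, and the cap assumes the model contains at least one tree per boosting round.

```python
def first_tree_indices(n_trees, num_round):
    """Return the indices of the first `n_trees` trees, capped at `num_round`.

    Assumption: the trained model has at least `num_round` trees, so every
    returned index refers to an existing tree.
    """
    if n_trees < 0:
        raise ValueError("n_trees must be non-negative")
    return list(range(min(n_trees, num_round)))


# Same list as the hand-written [0, 1, 2, 3, 4, 5] when training for 20 rounds.
log_tree = first_tree_indices(6, num_round=20)
```

The resulting list can be passed as the `log_tree` argument in place of the literal list.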
If you are using a Windows machine with Python 3.8 and xgboost-1.2.1, you may encounter a tkinter error when logging feature importance. This problem does not occur on a Windows machine with Python 3.8 and xgboost-1.2.0. It also does not occur on a Windows machine with Python 3.6 or Python 3.7.
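If you want to detect the affected combination before training, a check along the following lines may help. It is a sketch that encodes only the combination described above. `may_hit_tkinter_error` is a hypothetical helper, not part of the integration.

```python
def may_hit_tkinter_error(platform, python_version, xgboost_version):
    """Return True for the combination reported above:
    Windows, Python 3.8, and xgboost 1.2.1.

    `platform` is a string such as sys.platform, `python_version` is a tuple
    such as sys.version_info, and `xgboost_version` is a version string.
    """
    return (
        platform.startswith("win")
        and tuple(python_version[:2]) == (3, 8)
        and xgboost_version == "1.2.1"
    )


# Example inputs; on a real machine, pass sys.platform, sys.version_info,
# and the installed xgboost version string instead.
affected = may_hit_tkinter_error("win32", (3, 8, 5), "1.2.1")
not_affected = may_hit_tkinter_error("win32", (3, 8, 5), "1.2.0")
```

On Python 3.8 and later, the installed version string can be obtained with `importlib.metadata.version("xgboost")`.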