Neptune-XGBoost Integration

What will you get with this integration?

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. The integration with Neptune lets you log multiple training artifacts with no further customization.

The integration is implemented as an XGBoost callback and provides the following capabilities:

  * logging metrics for train and eval sets (or whatever you defined in the watchlist) after each boosting iteration,

  * logging the trained model (Booster) after the last boosting iteration,

  * logging the feature importance chart,

  * logging visualized trees (optional, requires graphviz).

Note

This integration is tested with xgboost==1.2.0 and neptune-client==0.4.124.

Where to start?

To get started with this integration, follow the Quickstart below.

If you want to try things out and focus only on the code, you can either:

  1. Open the Colab notebook with the quickstart code and run it as the anonymous user “neptuner” (zero setup, it just works), or

  2. View the quickstart code as a plain Python script on GitHub.

Quickstart

This quickstart shows you how to log XGBoost experiments to Neptune using the Neptune-XGBoost integration. The integration is implemented as an XGBoost callback and is available in the neptune-contrib library.

As a result, you will have an experiment logged to Neptune with metrics, the trained model, feature importance, and (optionally, requires graphviz) visualized trees. Have a look at this example experiment.

Before you start

Make sure you have Python 3.x and the following libraries installed: neptune-client, neptune-contrib, and xgboost.

Example

Make sure you have created an experiment before you start XGBoost training. Use the create_experiment() method to do this.
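The experiment is created inside a Neptune project, so the client also needs to be initialized first. Below is a minimal sketch of that initialization; the project name and the anonymous API token are placeholder assumptions for illustration:

import neptune

# Point the client at a project before creating an experiment (placeholder values).
neptune.init(project_qualified_name='shared/XGBoost-integration',
             api_token='ANONYMOUS')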

Here is how to use the Neptune-XGBoost integration:

import neptune
import xgboost as xgb  # needed for xgb.train below
...
# here you import `neptune_callback` that does the magic (the open source magic :)
from neptunecontrib.monitoring.xgboost import neptune_callback

...

# Use neptune callback
neptune.create_experiment(name='xgb', tags=['train'], params=params)
xgb.train(params, dtrain, num_round, watchlist,
          callbacks=[neptune_callback()])  # neptune_callback is here

Logged metrics

Metrics are logged for train and eval (or whatever sets you defined in the watchlist) after each boosting iteration.
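In practice, the watchlist is just a list of (DMatrix, name) pairs, and the names you pick there become the metric prefixes logged to Neptune. A minimal sketch, where dtrain, dtest, and the parameter values are assumptions for illustration:

params = {'objective': 'binary:logistic',
          'eval_metric': ['logloss', 'auc'],   # evaluated on every set in the watchlist
          'max_depth': 5}
watchlist = [(dtrain, 'train'), (dtest, 'eval')]  # logged as 'train-logloss', 'eval-auc', ...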

Logged model

The model (Booster) is logged to Neptune after the last boosting iteration. If you run cross-validation, you get a model for each fold.
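Cross-validation works with the same callback, which is how you get a model per fold. A minimal sketch, assuming params, dtrain, and num_round are defined as in the quickstart:

# Run 5-fold cross-validation with the Neptune callback attached.
xgb.cv(params, dtrain, num_boost_round=num_round, nfold=5,
       callbacks=[neptune_callback()])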

Logged feature importance

This is a very useful chart, as it shows which features the model relies on. It is logged to Neptune as an image after the last boosting iteration. If you run cross-validation, you get a feature importance chart for each fold’s model.
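The callback produces this chart for you; for reference, an equivalent chart can also be drawn locally with XGBoost's own plotting helper (requires matplotlib). Here booster stands for the object returned by xgb.train and is an assumption for illustration:

import matplotlib.pyplot as plt

xgb.plot_importance(booster, max_num_features=10)  # top 10 features by F score
plt.show()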

Logged visualized trees (requires graphviz)

Note

You need to install graphviz and the graphviz Python interface for the log_tree feature to work. Check Graphviz and the Graphviz Python interface for installation info.

Log the first 6 trees at the end of training (trees with indices 0, 1, 2, 3, 4, 5):

xgb.train(params, dtrain, num_round, watchlist,
          callbacks=[neptune_callback(log_tree=[0,1,2,3,4,5])])

Selected trees are logged to Neptune as an image after the last boosting iteration. If you run cross-validation, you get a tree visualization for each fold’s model, independently.
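For local inspection, individual trees can also be rendered with XGBoost's graphviz helper (the same graphviz requirement applies); booster is again assumed to be the trained Booster and the output filename is arbitrary:

graph = xgb.to_graphviz(booster, num_trees=0)  # tree with index 0
graph.render('tree_0')                         # writes the dot source and a rendered PDF via graphviz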

Explore Results

You just learned how to start logging XGBoost experiments to Neptune. Check this example experiment or view the quickstart code as a plain Python script on GitHub.

Common problems

If you are using a Windows machine with Python 3.8 and xgboost==1.2.1, you may encounter a tkinter error when logging feature importance. This problem does not occur on Windows with Python 3.8 and xgboost==1.2.0, nor with Python 3.6 or Python 3.7.
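If you hit this, one general workaround for tkinter-related plotting errors (a suggestion, not specific to this integration) is to force matplotlib onto a non-interactive backend before training starts:

import matplotlib
matplotlib.use('Agg')  # headless backend, does not require tkinter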

How to ask for help?

Please visit the Getting help page. Everything regarding support is there.