Neptune-XGBoost Integration

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. The integration with Neptune lets you log multiple training artifacts with no further customization.

The integration is implemented as an XGBoost callback that automatically logs evaluation metrics for every dataset in the watchlist after each boosting iteration, the trained model (Booster), the feature importance chart, and visualizations of selected trees.

Tip

Try the integration right away with this Google Colab.

Requirements

This integration makes use of the XGBoost library and is part of neptune-contrib.

Make sure you have all dependencies installed. You can use the bash command below:

pip install 'neptune-contrib[monitoring]>=0.18.4'

Basic example

Make sure you have initialized Neptune and created an experiment before you start XGBoost training. Use the neptune.init() and neptune.create_experiment() methods to do this, as in the minimal sketch below.
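
A minimal setup sketch. The project name 'my_workspace/my_project' is a placeholder for your own project, and it is assumed that your API token is available in the NEPTUNE_API_TOKEN environment variable:

import neptune

# Select the project to log to (replace the placeholder with your own project)
neptune.init('my_workspace/my_project')

# XGBoost parameters you plan to train with (any parameter dict works here)
params = {'max_depth': 5, 'eta': 0.5, 'objective': 'reg:linear'}

# Create the experiment that the XGBoost callback will log to
neptune.create_experiment(name='xgb', tags=['train'], params=params)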

Here is how to use the Neptune-XGBoost integration:

import neptune
...
# here you import `neptune_callback` that does the magic (the open source magic :)
from neptunecontrib.monitoring.xgboost import neptune_callback

...

# Use neptune callback
neptune.create_experiment(name='xgb', tags=['train'], params=params)
xgb.train(params, dtrain, num_round, watchlist,
          callbacks=[neptune_callback()])  # neptune_callback is here
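
The same callback also works when you run cross-validation with xgboost.cv(): as described in the sections below, the model, feature importance chart, and tree visualizations are then logged for each fold. A minimal sketch, assuming params, dtrain, and num_round are defined as in the full script at the bottom of this page:

# Cross-validation with the same callback; artifacts are logged per fold
xgb.cv(params, dtrain, num_boost_round=num_round, nfold=5,
       callbacks=[neptune_callback(log_tree=[0, 1])])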

Example results

Logged metrics

Evaluation metrics are logged for train and eval (or whatever names you defined in the watchlist) after each boosting iteration.

Logged model

The model (Booster) is logged to Neptune after the last boosting iteration. If you run cross-validation, you get a model for each fold.

Logged feature importance

The feature importance chart is logged to Neptune as an image after the last boosting iteration. If you run cross-validation, you get a feature importance chart for each fold's model.

Logged visualized trees

Selected trees are logged to Neptune as images after the last boosting iteration. If you run cross-validation, you get tree visualizations for each fold's model.

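What the callback logs can be controlled through its arguments. The sketch below uses the argument names as understood for neptune-contrib 0.18.x (log_model, log_importance, max_num_features, log_tree); treat them as assumptions and check help(neptune_callback) if your version differs:

# Configure what gets logged (argument names assumed from neptune-contrib 0.18.x;
# verify with help(neptune_callback) in your environment)
callback = neptune_callback(
    log_model=True,        # upload the trained Booster after the last iteration
    log_importance=True,   # log the feature importance chart as an image
    max_num_features=10,   # limit the importance chart to the top 10 features
    log_tree=[0, 1],       # visualize trees 0 and 1 as images
)

xgb.train(params, dtrain, num_round, watchlist, callbacks=[callback])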

Resources

Notebooks with examples

Full script

import neptune
import pandas as pd
import xgboost as xgb
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

# here you import `neptune_callback` that does the magic (the open source magic :)
from neptunecontrib.monitoring.xgboost import neptune_callback

# Set project
# For this demonstration, we use the public user 'neptuner', whose API token is 'ANONYMOUS'.
# Thanks to this you can run this code as is and see results in Neptune :)
neptune.init('shared/XGBoost-integration',
             api_token='ANONYMOUS')

# Data
boston = load_boston()
data = pd.DataFrame(boston.data)
data.columns = boston.feature_names
data['PRICE'] = boston.target
X, y = data.iloc[:,:-1], data.iloc[:,-1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=102030)

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Params
params = {'max_depth': 5,
          'eta': 0.5,
          'gamma': 0.1,
          'silent': 1,
          'subsample': 1,
          'lambda': 1,
          'alpha': 0.35,
          'objective': 'reg:linear',
          'eval_metric': ['mae', 'rmse']}
watchlist = [(dtest, 'eval'), (dtrain, 'train')]
num_round = 20

# Train model
neptune.create_experiment(name='xgb', tags=['train'], params=params)
xgb.train(params, dtrain, num_round, watchlist,
          callbacks=[neptune_callback(log_tree=[0,1,2])])
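
The callback can also be used with the scikit-learn API of XGBoost. In xgboost versions contemporary with this integration, fit() accepted eval_set, eval_metric, and callbacks arguments; the sketch below relies on that and should be checked against the documentation of your xgboost version:

# Same callback with the scikit-learn wrapper (fit() accepted `callbacks`
# in xgboost versions from the time of neptune-contrib 0.18.x)
reg = xgb.XGBRegressor(max_depth=5, learning_rate=0.5, n_estimators=20)

neptune.create_experiment(name='xgb-sklearn', tags=['train'], params=reg.get_params())
reg.fit(X_train, y_train,
        eval_set=[(X_test, y_test)], eval_metric='rmse',
        callbacks=[neptune_callback(log_tree=[0])])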