
API reference: XGBoost integration

You can use the Neptune integration with XGBoost to capture model training metadata with NeptuneCallback.


NeptuneCallback

Neptune callback for logging metadata during XGBoost model training.

Prerequisites

This callback requires xgboost>=1.3.0.

The callback logs the following:

  • Metrics
  • All parameters
  • Learning rate
  • The pickled model
  • Visualizations (feature importances and trees)
  • best_score and best_iteration, if early stopping is activated

The callback works with the xgboost.train() and xgboost.cv() functions, and with model.fit() from the scikit-learn API.

Metrics are logged for every dataset in the evals list and for every metric specified.

Example: With evals = [(dtrain, "train"), (dval, "valid")] and "eval_metric": ["mae", "rmse"], four metrics are created:

  1. "train/mae"
  2. "train/rmse"
  3. "valid/mae"
  4. "valid/rmse"
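The naming pattern above can be sketched in plain Python. This is only an illustration of how the dataset names and metric names combine; the helper expected_metric_names is hypothetical and is not part of the Neptune or XGBoost API.

```python
from itertools import product


def expected_metric_names(eval_names, eval_metrics):
    """Return one "<dataset>/<metric>" name per dataset and metric pair.

    Hypothetical helper for illustration only; the callback builds
    these names internally.
    """
    return [
        f"{dataset}/{metric}"
        for dataset, metric in product(eval_names, eval_metrics)
    ]


# Dataset names come from the second element of each evals tuple,
# metric names from the "eval_metric" entry of the training parameters.
names = expected_metric_names(["train", "valid"], ["mae", "rmse"])
print(names)
```

With two datasets and two metrics, the sketch yields the same four names as the list above.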

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| run | Run or Handler | - | An existing run reference, as returned by neptune.init_run(), or a namespace handler. |
| base_namespace | str, optional | "training" | Namespace under which all metadata logged by the Neptune callback will be stored. |
| log_model | bool | True | Whether to log the model as a pickled file at the end of training. |
| log_importance | bool | True | Whether to log feature importance charts at the end of training. |
| max_num_features | int | 10 | Maximum number of top features to show on the importance charts. Applies when log_importance is set to True. If None or <1, all features are displayed. For details, see xgboost.plot_importance(). |
| log_tree | list of int | None | Indexes of target trees to log as charts. Requires the Graphviz library to be installed. For details, see xgboost.to_graphviz(). |
| tree_figsize | int | 30 | Controls the size of the visualized tree image. Increase this value if you work with large trees. Applies when log_tree is not None. |

Examples

Create a Neptune run:

import neptune

run = neptune.init_run()

If Neptune can't find your project name or API token

As a best practice, you should save your Neptune API token and project name as environment variables:

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"
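Before creating a run, a script can check that both variables are actually set. The following is a minimal sketch that uses only the standard library; the helper missing_neptune_env is hypothetical and is not part of the Neptune API.

```python
import os

# The two environment variables named above.
REQUIRED_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_env(environ=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration only. Pass a dict to check
    something other than the current process environment.
    """
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_neptune_env()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

If the list is empty, both variables are present in the environment.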

Alternatively, you can pass the information when using a function that takes api_token and project as arguments:

run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8",  # your token here
    project="ml-team/classification",  # your full project name here
)

Notes:

  • The api_token and project arguments also work for init_model(), init_model_version(), init_project(), and integrations that create Neptune runs under the hood, such as NeptuneLogger or NeptuneCallback.
  • API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.
  • Project name: You can copy the path from the project details (Edit project details).

If you haven't registered, you can log anonymously to a public project:

run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/quickstarts",
)

Make sure not to publish sensitive data through your code!

Create a Neptune callback and pass it to xgb.train():

import xgboost as xgb
from neptune.integrations.xgboost import NeptuneCallback

neptune_callback = NeptuneCallback(run=run)

xgb.train(..., callbacks=[neptune_callback])

When creating the callback, you can specify what you want to log and where:

neptune_callback = NeptuneCallback(
    run=run,
    base_namespace="experiment",
    log_model=False,
    log_tree=[0, 1, 2, 3],
)

See also

neptune-xgboost repo on GitHub