API reference: XGBoost integration
You can use the Neptune integration with XGBoost to capture model training metadata with NeptuneCallback.
NeptuneCallback
Neptune callback for logging metadata during XGBoost model training.
Prerequisites
This callback requires xgboost>=1.3.0.
The callback logs the following:
- Metrics
- All parameters
- Learning rate
- The pickled model
- Visualizations (feature importances and trees)
- If early stopping is activated, best_score and best_iteration are also logged.
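To make the early-stopping entries concrete, here is a minimal sketch of what best_iteration and best_score refer to for a metric where lower is better, such as mae or rmse. The helper best_round and the metric history are hypothetical and only illustrate the idea; XGBoost computes these values itself during early stopping.

```python
from typing import List, Tuple


def best_round(history: List[float]) -> Tuple[int, float]:
    """Return (best_iteration, best_score) for a lower-is-better metric.

    Hypothetical helper for illustration only. best_iteration is the
    0-based index of the boosting round with the lowest score; on ties,
    the earliest round wins.
    """
    if not history:
        raise ValueError("history must contain at least one score")
    best_iteration = min(range(len(history)), key=history.__getitem__)
    return best_iteration, history[best_iteration]


# Made-up validation RMSE per boosting round.
valid_rmse = [0.92, 0.81, 0.77, 0.78, 0.79]
print(best_round(valid_rmse))  # (2, 0.77)
```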
The callback works with the xgboost.train() and xgboost.cv() functions, and with model.fit() from the scikit-learn API.
Metrics are logged for every dataset in the evals list and for every metric specified in eval_metric.
Example: With evals = [(dtrain, "train"), (dval, "valid")] and "eval_metric": ["mae", "rmse"], four metrics are created:
- "train/mae"
- "train/rmse"
- "valid/mae"
- "valid/rmse"
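One way to picture this naming scheme: XGBoost passes each training callback an evals_log dictionary keyed first by the dataset names from evals and then by metric name, so one series per (dataset, metric) pair falls out naturally. The sketch below flattens such a dictionary into the names listed above. flatten_metrics and the scores are hypothetical, this is not the integration's actual code, and it assumes plain float scores as produced by xgboost.train(). Per the parameters table, the real series are stored under base_namespace.

```python
from typing import Dict, List

# Shape of evals_log as XGBoost passes it to TrainingCallback.after_iteration()
# during xgboost.train(): {dataset_name: {metric_name: [score per round, ...]}}
EvalsLog = Dict[str, Dict[str, List[float]]]


def flatten_metrics(evals_log: EvalsLog) -> Dict[str, float]:
    """Map each "dataset/metric" pair to its latest score (hypothetical helper)."""
    latest = {}
    for dataset, metrics in evals_log.items():
        for metric, scores in metrics.items():
            latest[f"{dataset}/{metric}"] = scores[-1]
    return latest


# Made-up scores after two rounds with
# evals=[(dtrain, "train"), (dval, "valid")] and "eval_metric": ["mae", "rmse"].
evals_log = {
    "train": {"mae": [0.50, 0.42], "rmse": [0.70, 0.61]},
    "valid": {"mae": [0.55, 0.48], "rmse": [0.74, 0.66]},
}
print(list(flatten_metrics(evals_log)))
# ['train/mae', 'train/rmse', 'valid/mae', 'valid/rmse']
```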
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| run | Run or Handler | - | An existing run reference, as returned by neptune.init_run(), or a namespace handler. |
| base_namespace | str, optional | "training" | Namespace under which all metadata logged by the Neptune callback is stored. |
| log_model | bool | True | Whether to log the model as a pickled file at the end of training. |
| log_importance | bool | True | Whether to log feature importance charts at the end of training. |
| max_num_features | int | 10 | Maximum number of top features to show on the importance charts. Applies only when log_importance is set to True. If None or less than 1, all features are displayed. For details, see |
| log_tree | list of int | None | Indexes of the trees to log as charts. Requires the Graphviz library to be installed. For details, see |
| tree_figsize | int | 30 | Size of the visualized tree image. Increase it if you work with large trees. Applies only when log_tree is not None. |
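The max_num_features rule can be read as a top-k selection over feature importance scores. The sketch below spells that rule out in plain Python; top_features and the scores are hypothetical, and this is not the callback's implementation, only an illustration of which features the table says end up on the chart.

```python
from typing import Dict, List, Optional, Tuple


def top_features(
    importance: Dict[str, float], max_num_features: Optional[int] = 10
) -> List[Tuple[str, float]]:
    """Select the features an importance chart would show (hypothetical helper).

    Follows the rule in the table: keep the max_num_features highest-scoring
    features, or all of them when max_num_features is None or less than 1.
    """
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    if max_num_features is None or max_num_features < 1:
        return ranked
    return ranked[:max_num_features]


# Made-up importance scores, e.g. as returned by Booster.get_score(importance_type="weight").
scores = {"f0": 12.0, "f1": 30.0, "f2": 7.0, "f3": 21.0}
print(top_features(scores, max_num_features=2))  # [('f1', 30.0), ('f3', 21.0)]
print(len(top_features(scores, max_num_features=None)))  # 4
```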
Examples
Create a Neptune run:
If Neptune can't find your project name or API token, save them as environment variables, which is the recommended practice.
Alternatively, you can pass the information when using a function that takes api_token and project as arguments:
```python
import neptune

run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8",  # user menu (bottom-left corner) > Get my API token
    project="ml-team/classification",  # copy the path from the project details ( → Details & privacy)
)
```
If you haven't registered, you can log anonymously to a public project:
Make sure not to publish sensitive data through your code!
Create a Neptune callback and pass it to xgb.train():
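```python
import xgboost as xgb
from neptune.integrations.xgboost import NeptuneCallback

neptune_callback = NeptuneCallback(run=run)
# Pass your usual arguments (params, dtrain, evals, ...) alongside the callback
xgb.train(..., callbacks=[neptune_callback])
```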
When creating the callback, you can specify what you want to log and where:
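```python
neptune_callback = NeptuneCallback(
    run=run,
    base_namespace="experiment",  # store metadata under "experiment" instead of "training"
    log_model=False,  # don't upload the pickled model
    log_tree=[0, 1, 2, 3],  # log the trees with indexes 0-3 as charts (requires Graphviz)
)
```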
See also
neptune-xgboost repo on GitHub