# API reference: LightGBM integration
You can use `NeptuneCallback` to capture model training metadata and log the model summary after training.
## NeptuneCallback
Neptune callback for logging metadata during LightGBM model training.

The callback logs parameters, evaluation results, and information about the `train_set`:

- feature names
- number of data points (`num_rows`)
- number of features (`num_features`)
Evaluation results are logged separately for each of the `valid_sets`. For example, with `"metric": "logloss"` and `valid_names=["train", "valid"]`, two logs are created: `train/logloss` and `valid/logloss`.
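For illustration, here is a minimal sketch of such a training call. The toy data and the `binary_logloss` metric are illustrative; with this metric, the callback creates `train/binary_logloss` and `valid/binary_logloss` series:

```python
import lightgbm as lgb
import numpy as np

import neptune
from neptune.integrations.lightgbm import NeptuneCallback

# Toy data, only to keep the sketch self-contained
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)
train_set = lgb.Dataset(X[:80], label=y[:80])
valid_set = lgb.Dataset(X[80:], label=y[80:], reference=train_set)

run = neptune.init_run()  # assumes your API token and project are configured

gbm = lgb.train(
    {"objective": "binary", "metric": "binary_logloss"},
    train_set,
    valid_sets=[train_set, valid_set],
    valid_names=["train", "valid"],
    callbacks=[NeptuneCallback(run=run)],
)
```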
The callback works with the `lgbm.train()` and `lgbm.cv()` functions, and with the scikit-learn API `model.fit()`.
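The scikit-learn route looks similar. A sketch, continuing with the `run` and toy data from the sketch above:

```python
import lightgbm as lgb

from neptune.integrations.lightgbm import NeptuneCallback

model = lgb.LGBMClassifier()
model.fit(
    X,
    y,
    eval_set=[(X, y)],
    eval_names=["train"],
    callbacks=[NeptuneCallback(run=run)],  # `run`, `X`, and `y` as defined in the sketch above
)
```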
Parameters
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `run` | `Run` or `Handler`, optional | `None` | Existing run reference, as returned by `neptune.init_run()`, or a namespace handler. |
| `base_namespace` | `str`, optional | `experiment` | Namespace under which all metadata logged by the Neptune callback will be stored. |
Example
Create a Neptune run:
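A minimal sketch, assuming your API token and project name are already set as environment variables (see below):

```python
import neptune

run = neptune.init_run()
```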
Instantiate the callback and pass it to the training function:
```python
import lightgbm as lgb

from neptune.integrations.lightgbm import NeptuneCallback

neptune_callback = NeptuneCallback(run=run)

gbm = lgb.train(params, ..., callbacks=[neptune_callback])
```
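To group the callback's metadata under a namespace other than the default, you can pass `base_namespace`. A sketch; the namespace name here is arbitrary:

```python
neptune_callback = NeptuneCallback(run=run, base_namespace="lgbm_training")
```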
If Neptune can't find your project name or API token
As a best practice, save your Neptune API token and project name as the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables, so you don't have to hard-code them.
Alternatively, you can pass the information when using a function that takes `api_token` and `project` as arguments:
```python
import neptune

run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8",  # (1)!
    project="ml-team/classification",  # (2)!
)
```
1. In the bottom-left corner, expand the user menu and select Get my API token.
2. You can copy the path from the project details ( → Details & privacy).
If you haven't registered, you can log anonymously to a public project:
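For example, a minimal sketch using the anonymous API token constant. The project name below is a placeholder for a public project, not a guaranteed one:

```python
import neptune

run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/lightgbm-integration",  # placeholder public project name
)
```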
Make sure not to publish sensitive data through your code!
## create_booster_summary()
Create a model summary after training that can be assigned to the run namespace.
Tip
To have all the information in a single run, you can log the summary to the same run that you used for logging model training.
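A sketch of doing both in one run, reusing the `params` placeholder from the callback example above:

```python
import lightgbm as lgb

import neptune
from neptune.integrations.lightgbm import NeptuneCallback, create_booster_summary

run = neptune.init_run()

# Log training metadata with the callback...
gbm = lgb.train(params, ..., callbacks=[NeptuneCallback(run=run)])

# ...then log the post-training summary to the same run
run["lgbm_summary"] = create_booster_summary(booster=gbm)
```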
Parameters
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `booster` | `lightgbm.Booster` or `lightgbm.LGBMModel` | - | The trained LightGBM model. |
| `log_importances` | `bool` | `True` | Whether to log feature importance charts. |
| `max_num_features` | `int` | `10` | Max number of top features to log on the importance charts. Works when `log_importances` is set to `True`. If `None` or `<1`, all features are displayed. |
| `list_trees` | `list` of `int` | `None` | Indices of the target trees to visualize. Works when `log_trees` is set to `True`. |
| `log_trees_as_dataframe` | `bool` | `False` | Whether to parse the model and log trees in CSV format. Works only for `Booster` objects. See `lightgbm.Booster.trees_to_dataframe` for details. |
| `log_pickled_booster` | `bool` | `True` | Whether to log the model as a pickled file. |
| `log_trees` | `bool` | `False` | Whether to log visualized trees. This requires the Graphviz library to be installed. |
| `tree_figsize` | `int` | `30` | Controls the size of the visualized tree image. Increase this value if you work with large trees. Works when `log_trees` is set to `True`. |
| `log_confusion_matrix` | `bool` | `False` | Whether to log the confusion matrix. If set to `True`, you need to pass `y_true` and `y_pred`. |
| `y_true` | `numpy.array` | `None` | True labels on the test set. Needed only if `log_confusion_matrix` is set to `True`. |
| `y_pred` | `numpy.array` | `None` | Predictions on the test set. Needed only if `log_confusion_matrix` is set to `True`. |
Returns
`dict` with all metadata, which you can assign to the Neptune run:
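For example, assuming `gbm` is a trained booster (as in the examples below):

```python
run["lgbm_summary"] = create_booster_summary(booster=gbm)
```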
Examples
Initialize a Neptune run:
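A minimal sketch, using the example project name from earlier in this reference:

```python
import neptune

run = neptune.init_run(
    project="ml-team/classification",  # see the notes below on the project name
)
```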
- The full project name. For example, `"ml-team/classification"`.
    - You can copy the name from the project details ( → Details & privacy).
    - You can also find a pre-filled `project` string in Experiments → Create a new run.
Train the LightGBM model and log the booster summary to Neptune:
```python
import lightgbm as lgb

from neptune.integrations.lightgbm import create_booster_summary

gbm = lgb.train(params, ...)

run["lgbm_summary"] = create_booster_summary(booster=gbm)
```
You can customize what to log:
run["lgbm_summary"] = create_booster_summary(
booster=gbm,
log_trees=True,
list_trees=[0, 1, 2, 3, 4],
log_confusion_matrix=True,
y_pred=y_pred,
y_true=y_test,
)
To log a confusion matrix, the predicted labels and the ground truth are required:
```python
import numpy as np

y_pred = np.argmax(gbm.predict(X_test), axis=1)

run["lgbm_summary"] = create_booster_summary(
    booster=gbm,
    log_confusion_matrix=True,
    y_pred=y_pred,
    y_true=y_test,
)
```
See also
[neptune-lightgbm repo on GitHub](https://github.com/neptune-ai/neptune-lightgbm)