API reference: LightGBM integration

You can use NeptuneCallback to capture model training metadata, and create_booster_summary() to log a model summary after training.


NeptuneCallback

Neptune callback for logging metadata during LightGBM model training.

The callback logs parameters, evaluation results, and info about the train_set:

  • feature names
  • number of data points (num_rows)
  • number of features (num_features)

Evaluation results are logged separately for each dataset in valid_sets. For example, with "metric": "logloss" and valid_names=["train","valid"], two logs are created: train/logloss and valid/logloss.
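
The naming scheme described above can be sketched in plain Python. The helper expected_log_paths() is hypothetical and not part of the integration. It only illustrates how each name in valid_names pairs with each metric, and it does not account for the base namespace that the paths are stored under.

```python
from itertools import product


def expected_log_paths(valid_names, metrics):
    """Illustrative helper: build one '<valid_name>/<metric>' path per pair."""
    return [f"{name}/{metric}" for name, metric in product(valid_names, metrics)]


paths = expected_log_paths(["train", "valid"], ["logloss"])
print(paths)  # ['train/logloss', 'valid/logloss']
```

With two validation sets and two metrics, the same helper would return four paths, one per combination.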

The callback works with the lgb.train() and lgb.cv() functions, and with the scikit-learn API model.fit().

Parameters

Name | Type | Default | Description
run | Run or Handler, optional | None | Existing run reference, as returned by neptune.init_run(), or a namespace handler.
base_namespace | str, optional | "experiment" | Namespace under which all metadata logged by the Neptune callback will be stored.

Example

Create a Neptune run:

import neptune

run = neptune.init_run()

Instantiate the callback and pass it to the training function:

import lightgbm as lgb
from neptune.integrations.lightgbm import NeptuneCallback

neptune_callback = NeptuneCallback(run=run)
gbm = lgb.train(params, ..., callbacks=[neptune_callback])

If Neptune can't find your project name or API token

As a best practice, you should save your Neptune API token and project name as environment variables:

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"
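
Before starting a run, it can help to confirm that both variables are visible to the Python process. The following is a minimal sketch using only the standard library. The helper missing_neptune_settings() is hypothetical and not part of the Neptune client.

```python
import os

# The two environment variables named above.
REQUIRED_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_settings(env=None):
    """Illustrative helper: list required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_neptune_settings()
if missing:
    print("Not set:", ", ".join(missing))
```

The env argument exists only so that the check can be tried against a plain dictionary instead of the real environment.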

Alternatively, you can pass the information when using a function that takes api_token and project as arguments:

run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8",  # your token here
    project="ml-team/classification",  # your full project name here
)

  • Passing api_token and project also works for init_model(), init_model_version(), init_project(), and integrations that create Neptune runs under the hood, such as NeptuneLogger or NeptuneCallback.
  • API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.
  • Project name: You can copy the path from the project details (Edit project details).

If you haven't registered, you can log anonymously to a public project:

api_token=neptune.ANONYMOUS_API_TOKEN
project="common/quickstarts"

Make sure not to publish sensitive data through your code!


create_booster_summary()

Create a model summary after training that can be assigned to the run namespace.

Tip

To have all the information in a single run, you can log the summary to the same run that you used for logging model training.

Parameters

Name | Type | Default | Description
booster | lightgbm.Booster or lightgbm.LGBMModel | - | The trained LightGBM model.
log_importances | bool | True | Whether to log feature importance charts.
max_num_features | int | 10 | Max number of top features to log on the importance charts. Works when log_importances is set to True. If None or <1, all features will be displayed. See lightgbm.plot_importance for details.
list_trees | list of int | None | Indices of the trees to visualize. Works when log_trees is set to True.
log_trees_as_dataframe | bool | False | Whether to parse the model and log trees in CSV format. Works only for Booster objects. See lightgbm.Booster.trees_to_dataframe for details.
log_pickled_booster | bool | True | Whether to log the model as a pickled file.
log_trees | bool | False | Whether to log visualized trees. This requires the Graphviz library to be installed.
tree_figsize | int | 30 | Controls the size of the visualized tree image. Increase this if you work with large trees. Works when log_trees is set to True.
log_confusion_matrix | bool | False | Whether to log a confusion matrix. If set to True, you need to pass y_true and y_pred.
y_true | numpy.array | None | True labels on the test set. Needed only if log_confusion_matrix is set to True.
y_pred | numpy.array | None | Predictions on the test set. Needed only if log_confusion_matrix is set to True.
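
Some of the parameters above depend on each other: list_trees takes effect only with log_trees=True, and log_confusion_matrix=True needs both y_true and y_pred. The following sketch shows one way to check such combinations before calling the function. The helper check_summary_args() is hypothetical, and the integration itself may validate arguments differently.

```python
def check_summary_args(log_trees=False, list_trees=None,
                       log_confusion_matrix=False, y_true=None, y_pred=None):
    """Illustrative helper: report argument combinations the table flags as dependent."""
    problems = []
    if list_trees is not None and not log_trees:
        problems.append("list_trees has an effect only when log_trees=True")
    if log_confusion_matrix and (y_true is None or y_pred is None):
        problems.append("log_confusion_matrix=True needs both y_true and y_pred")
    return problems


# No problems: trees are selected and log_trees is enabled.
print(check_summary_args(log_trees=True, list_trees=[0, 1]))
# One problem: the confusion matrix is requested without labels.
print(check_summary_args(log_confusion_matrix=True))
```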

Returns

dict with all metadata, which you can assign to the Neptune run:

run["booster_summary"] = create_booster_summary(...)

Examples

Initialize a Neptune run:

import neptune

run = neptune.init_run(project="workspace-name/project-name")  # your full project name

The project argument takes the full project name. For example, "ml-team/classification". To find the required string in the Neptune app, click How to create a new run. You can copy the project argument from the modal that opens.

Train a LightGBM model and log the booster summary to Neptune:

import lightgbm as lgb
from neptune.integrations.lightgbm import create_booster_summary

gbm = lgb.train(params, ...)
run["lgbm_summary"] = create_booster_summary(booster=gbm)

You can customize what to log:

run["lgbm_summary"] = create_booster_summary(
    booster=gbm,
    log_trees=True,
    list_trees=[0, 1, 2, 3, 4],
    log_confusion_matrix=True,
    y_pred=y_pred,
    y_true=y_test,
)

To log a confusion matrix, you need to pass the predicted labels and the ground truth:

import numpy as np

# Pick the class with the highest predicted probability for each row
y_pred = np.argmax(gbm.predict(X_test), axis=1)
run["lgbm_summary"] = create_booster_summary(
    booster=gbm,
    log_confusion_matrix=True,
    y_pred=y_pred,
    y_true=y_test,
)
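
The argmax step above assumes a multiclass model, where predict() returns one column of probabilities per class. For a binary objective, a LightGBM booster typically returns a one-dimensional array of positive-class probabilities, so taking the argmax along axis=1 would not apply. The following sketch handles both shapes. The helper to_class_labels() and the 0.5 threshold are assumptions for illustration, not part of the integration.

```python
import numpy as np


def to_class_labels(predictions, threshold=0.5):
    """Illustrative helper: turn predicted probabilities into class labels."""
    predictions = np.asarray(predictions)
    if predictions.ndim == 1:
        # Binary case: one positive-class probability per row.
        return (predictions > threshold).astype(int)
    # Multiclass case: one column of probabilities per class.
    return np.argmax(predictions, axis=1)


multiclass = to_class_labels([[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]])
binary = to_class_labels([0.2, 0.9])
print(multiclass, binary)
```

The resulting array can then be passed as y_pred, with the test labels as y_true.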

See also

neptune-lightgbm repo on GitHub