
API reference: LightGBM integration

You can use NeptuneCallback to capture model training metadata and log a model summary after training.


For an in-depth tutorial, see the LightGBM integration guide under Integrations.


Neptune callback for logging metadata during LightGBM model training.

The callback logs parameters, evaluation results, and info about the train_set:

  • feature names
  • number of data points (num_rows)
  • number of features (num_features)

Evaluation results are logged separately for each validation set in valid_sets. For example, with "metric": "logloss" and valid_names=["train","valid"], two logs are created: train/logloss and valid/logloss.
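The naming scheme can be illustrated with a plain-Python sketch (no Neptune or LightGBM required). This mimics the assumed pattern of one log per (validation set, metric) pair, named "valid_name/metric"; the helper function is hypothetical, not part of the library:

```python
# Hypothetical helper illustrating the assumed "<valid_name>/<metric>"
# naming pattern for evaluation logs; not part of neptune-lightgbm.
def metric_channels(valid_names, metrics):
    """Return the log names created for the given validation sets and metrics."""
    return [f"{name}/{metric}" for name in valid_names for metric in metrics]

print(metric_channels(["train", "valid"], ["logloss"]))
# ['train/logloss', 'valid/logloss']
```

With more metrics, one log is created per combination, e.g. valid/auc and valid/logloss for a single validation set with two metrics.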

The callback works with the lgbm.train() and lgbm.cv() functions, and with the scikit-learn API (for example, model.fit()).


| Name | Type | Default | Description |
|------|------|---------|-------------|
| run | Run, optional | None | Existing run reference, as returned by neptune.init_run(). |
| base_namespace | str, optional | "experiment" | Namespace under which all metadata logged by the Neptune callback is stored. |


# Create a Neptune run
import neptune

run = neptune.init_run()

# Instantiate the callback and pass it to the training function
import lightgbm as lgb
from neptune.integrations.lightgbm import NeptuneCallback

neptune_callback = NeptuneCallback(run=run)

gbm = lgb.train(
    params, ..., callbacks=[neptune_callback]
)


Create a model summary after training that can be assigned to the run namespace.


To have all the information in a single run, you can log the summary to the same run that you used for logging model training.


| Name | Type | Default | Description |
|------|------|---------|-------------|
| booster | lightgbm.Booster or lightgbm.LGBMModel | - | The trained LightGBM model. |
| log_importances | bool | True | Whether to log feature importance charts. |
| max_num_features | int | 10 | Max number of top features to log on the importance charts. Works when log_importances is set to True. If None or <1, all features are displayed. See lightgbm.plot_importance for details. |
| list_trees | list of int | None | Indices of the target trees to visualize. Works when log_trees is set to True. |
| log_trees_as_dataframe | bool | True | Whether to parse the model and log trees in pandas DataFrame format. Works only for Booster objects. See lightgbm.Booster.trees_to_dataframe for details. |
| log_pickled_booster | bool | True | Whether to log the model as a pickled file. |
| log_trees | bool | False | Whether to log visualized trees. Requires the Graphviz library to be installed. |
| tree_figsize | int | 30 | Controls the size of the visualized tree image. Increase it when working with large trees. Works when log_trees is set to True. |
| log_confusion_matrix | bool | False | Whether to log the confusion matrix. If set to True, you must also pass y_true and y_pred. |
| y_true | numpy.array | None | True labels on the test set. Needed only if log_confusion_matrix is set to True. |
| y_pred | numpy.array | None | Predictions on the test set. Needed only if log_confusion_matrix is set to True. |
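The max_num_features truncation can be sketched in plain Python (this mimics the assumed behavior described above, not the library's internals): keep the N most important features, or all of them when the limit is None or below 1.

```python
# Plain-Python sketch of the assumed max_num_features behavior:
# keep the N most important features, or all when the limit is None or < 1.
def top_features(importances, max_num_features=10):
    """importances: dict mapping feature name -> importance score."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    if max_num_features is None or max_num_features < 1:
        return ranked
    return ranked[:max_num_features]

scores = {"f0": 5, "f1": 12, "f2": 3}
print(top_features(scores, max_num_features=2))     # [('f1', 12), ('f0', 5)]
print(len(top_features(scores, max_num_features=None)))  # 3
```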


Returns a dict with all metadata, which can be assigned to the run namespace: run["booster_summary"] = create_booster_summary(...)


# Initialize a Neptune run
import neptune

run = neptune.init_run(
    project="common/lightgbm-integration",  # (1)
)

# Train the LightGBM model
gbm = lgb.train(params, ...)

# Log the booster summary to Neptune
from neptune.integrations.lightgbm import create_booster_summary

run["lgbm_summary"] = create_booster_summary(booster=gbm)

# You can customize what to log
run["lgbm_summary"] = create_booster_summary(
    booster=gbm,
    list_trees=[0, 1, 2, 3, 4],
)

# To log a confusion matrix,
# the predicted labels and the ground truth are required
y_pred = np.argmax(gbm.predict(X_test), axis=1)
run["lgbm_summary"] = create_booster_summary(
    booster=gbm,
    log_confusion_matrix=True,
    y_true=y_test,
    y_pred=y_pred,
)

  1. The full project name. For example, "ml-team/classification". To copy it, navigate to the project settings → Properties.
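The y_pred line in the example converts per-class probabilities into predicted label indices. A standalone NumPy sketch of that step, using made-up probabilities in place of gbm.predict(X_test):

```python
import numpy as np

# In the multiclass setting, the booster's predict() returns one row of
# class probabilities per sample; argmax over axis=1 picks the most
# likely class index for each row.
proba = np.array([
    [0.1, 0.7, 0.2],  # most likely class: 1
    [0.8, 0.1, 0.1],  # most likely class: 0
    [0.2, 0.3, 0.5],  # most likely class: 2
])
y_pred = np.argmax(proba, axis=1)
print(y_pred)  # [1 0 2]
```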