LightGBM

You can use the Neptune integration with LightGBM to capture model training metadata through NeptuneCallback, and to log a model summary after training.

You can find detailed information on how to install and use the integration in the user guide.

NeptuneCallback

Neptune callback for logging metadata during LightGBM model training.

This callback logs parameters, evaluation results, and information about the train_set: feature names, number of data points (num_rows), and number of features (num_features). Evaluation results are logged separately for each dataset in valid_sets. For example, with "metric": "logloss" and valid_names=["train", "valid"], two logs are created: train/logloss and valid/logloss.
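As an illustration of this naming convention, the log names can be derived from valid_names and the configured metrics roughly as below. This is a hypothetical sketch of the convention only, not the callback's actual implementation:

```python
# Sketch of how evaluation log names combine dataset names and metrics.
# The helper is illustrative; it is not part of the Neptune API.
def evaluation_log_names(valid_names, metrics):
    """Return one log name per (dataset, metric) pair."""
    return [f"{name}/{metric}" for name in valid_names for metric in metrics]

print(evaluation_log_names(["train", "valid"], ["logloss"]))
# ['train/logloss', 'valid/logloss']
```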

The callback works with the lgb.train() and lgb.cv() functions, as well as with the scikit-learn API model.fit().

Parameters

run

(Run) - An existing run reference (as returned by neptune.init()).

base_namespace

(str, optional, default is None) - Namespace under which all metadata logged by the NeptuneCallback will be stored.
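For instance, with base_namespace="finetuning", a metric that would otherwise be stored at train/logloss ends up under finetuning/train/logloss. A hypothetical sketch of the prefixing (illustration only, not the integration's code):

```python
# Sketch of how base_namespace prefixes log paths. The helper below is
# a hypothetical illustration, not part of the Neptune API.
def namespaced(base_namespace, path):
    """Prefix a log path with the base namespace, if one is set."""
    return f"{base_namespace}/{path}" if base_namespace else path

print(namespaced("finetuning", "train/logloss"))  # finetuning/train/logloss
print(namespaced(None, "train/logloss"))          # train/logloss
```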

Examples

# Create a run
import lightgbm as lgb
import neptune.new as neptune

run = neptune.init(project="WORKSPACE/PROJECT")

# Instantiate the callback and pass it to the training function
from neptune.new.integrations.lightgbm import NeptuneCallback

neptune_callback = NeptuneCallback(run=run)

gbm = lgb.train(params,
                ...
                callbacks=[neptune_callback])

.create_booster_summary()

Create a model summary after training that can be assigned to the run namespace.

You can log the summary to a new run, or to the same run that you used for logging the model training. The second option is often convenient, because all the information is then kept in a single run.

Parameters

booster

(lightgbm.Booster or lightgbm.LGBMModel) - The trained LightGBM model.

log_importances

(bool, default is True) - Whether to log feature importance charts.

max_num_features

(int, default is 10) - Max number of top features to log on the importance charts. Works only if log_importances is set to True. If None or <1, all features will be displayed. See lightgbm.plot_importance for details.

list_trees

(list of int, default is None) - Indices of the target trees to visualize. Works only if log_trees is set to True.

log_trees_as_dataframe

(bool, default is True) - Whether to parse the model and log trees in the easy-to-read pandas DataFrame format. Works only for Booster objects. See lightgbm.Booster.trees_to_dataframe for details.

log_pickled_booster

(bool, default is True) - Whether to log the model as a pickled file.

log_trees

(bool, default is False) - Whether to log visualized trees. This requires the graphviz library; see the user guide for how to install it.

tree_figsize

(int, default is 30) - Controls the size of the visualized tree images. Increase it if you work with large trees. Works only if log_trees is set to True.

log_confusion_matrix

(bool, default is False) - Whether to log the confusion matrix. If set to True, you also need to pass y_true and y_pred.

y_true

(numpy.array, default is None) - True labels on the test set. Needed only if log_confusion_matrix is set to True.

y_pred

(numpy.array, default is None) - Predictions on the test set. Needed only if log_confusion_matrix is set to True.
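For multiclass models, predict() returns class probabilities, so the labels passed as y_pred are typically obtained with an argmax over the class axis. A sketch using numpy, where the probability matrix is made up for illustration:

```python
import numpy as np

# Illustrative class-probability matrix, shaped like the output of a
# multiclass booster's predict(); the values here are made up.
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

# Predicted labels for the confusion matrix: index of the highest probability
y_pred = np.argmax(proba, axis=1)
print(y_pred.tolist())  # [0, 2]
```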

Returns

Dict with all the metadata, which can be assigned to the run namespace:

run["booster_summary"] = create_booster_summary(...)

Examples

# Initialize the Neptune client
import lightgbm as lgb
import numpy as np
import neptune.new as neptune
from neptune.new.integrations.lightgbm import create_booster_summary

run = neptune.init(project="common/lightgbm-integration",
                   api_token="ANONYMOUS")

# Train the LightGBM model
gbm = lgb.train(params, ...)

# Log the booster summary to Neptune
run["lgbm_summary"] = create_booster_summary(booster=gbm)

# You can customize exactly what is logged. To log the confusion matrix,
# predicted labels and the ground truth are required:
y_pred = np.argmax(gbm.predict(X_test), axis=1)
run["lgbm_summary"] = create_booster_summary(
    booster=gbm,
    log_trees=True,
    list_trees=[0, 1, 2, 3, 4],
    log_confusion_matrix=True,
    y_pred=y_pred,
    y_true=y_test,
)