Working with LightGBM#
LightGBM is a gradient-boosting framework that uses tree-based learning algorithms.
With the Neptune–LightGBM integration, the following metadata is logged automatically:
- Training and validation metrics
- Parameters
- Feature names, num_features, and num_rows for the train set
- Hardware consumption metrics
- stdout and stderr streams
- Training code and Git commit information
You can also log the trained LightGBM booster summary, which can contain:
- The pickled model
- The feature importance chart (gain and split)
- Visualized trees
- Trees saved as DataFrames
- Confusion matrix (for classification problems)
Before you start#
- Set up Neptune. Instructions:
Installing the Neptune–LightGBM integration#
On the command line or in a terminal app, such as Command Prompt, enter the following:
pip install -U neptune-lightgbm
If you want to log visualized trees after training (recommended), additionally install Graphviz:
pip install -U graphviz
Note
The above installation is only for the pure Python interface to the Graphviz software. You need to install Graphviz separately.
For installation help, see the Graphviz documentation .
Quickstart#
Tip
This section is for LightGBM users who are familiar with Neptune and LightGBM callbacks.
The integration has two core components:
- NeptuneCallback for logging metadata during training, such as parameters and metrics.
- create_booster_summary() for logging additional metadata after training, such as visualizations and the pickled model.
import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import (
NeptuneCallback, create_booster_summary
)
# Create run
run = neptune.init_run() # (1)!
# Create Neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data, params, etc.
...
# Pass the callback to the train function and train the model
gbm = lgb.train(params, lgb_train, callbacks=[neptune_callback])
# Compute test predictions
y_pred = ...
# Log summary metadata under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
booster=gbm,
log_trees=True,
list_trees=[0, 1, 2, 3, 4],
log_confusion_matrix=True,
y_pred=y_pred,
y_true=y_test,
)
# When done logging, stop the run
run.stop()
- If you haven't set up your credentials, you can log anonymously:
neptune.init_run(api_token=neptune.ANONYMOUS_API_TOKEN, project="common/lightgbm-integration")
import lightgbm as lgb
import neptune
import numpy as np
from neptune.integrations.lightgbm import (
NeptuneCallback, create_booster_summary
)
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Create run
run = neptune.init_run(
project="common/lightgbm-integration",
api_token=neptune.ANONYMOUS_API_TOKEN,
name="train-cls",
tags=["lgbm-integration", "train", "cls"],
)
# Create Neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.2,
random_state=123,
)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
# Define parameters
params = {
"boosting_type": "gbdt",
"objective": "multiclass",
"num_class": 10,
"metric": ["multi_logloss", "multi_error"],
"num_leaves": 21,
"learning_rate": 0.05,
"feature_fraction": 0.9,
"bagging_fraction": 0.8,
"bagging_freq": 5,
"max_depth": 12,
}
# Train the model
gbm = lgb.train(
params,
lgb_train,
num_boost_round=200,
valid_sets=[lgb_train, lgb_eval],
valid_names=["training", "validation"],
callbacks=[neptune_callback],
)
y_pred = np.argmax(gbm.predict(X_test), axis=1)
# Log summary metadata to the same run under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
booster=gbm,
log_trees=True,
list_trees=[0, 1, 2, 3, 4],
log_confusion_matrix=True,
y_pred=y_pred,
y_true=y_test,
)
# When done logging, stop the run
run.stop()
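In the script above, the multiclass booster returns one row of class probabilities per sample, and np.argmax(..., axis=1) picks the most likely class for each row. The following is a minimal sketch of that step on its own; the probability values are made up for illustration and do not come from the digits model.

```python
import numpy as np

# Hypothetical predicted probabilities: 3 samples, 3 classes (illustrative values)
probabilities = np.array(
    [
        [0.1, 0.7, 0.2],
        [0.8, 0.1, 0.1],
        [0.2, 0.3, 0.5],
    ]
)

# Pick the index of the largest probability in each row
labels = np.argmax(probabilities, axis=1)
print(labels.tolist())
```

The resulting one-dimensional array of labels has the same shape as y_test, which is what the confusion matrix in the booster summary compares against.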
LightGBM logging example#
This example walks you through logging metadata as you train your model with LightGBM.
You can log metadata during training with NeptuneCallback, and after training with the create_booster_summary() function.
Logging metadata during training#
- Start a run:
  run = neptune.init_run()
  - If you haven't set up your credentials, you can log anonymously:
    neptune.init_run(api_token=neptune.ANONYMOUS_API_TOKEN, project="common/lightgbm-integration")
- Initialize the Neptune callback:
  neptune_callback = NeptuneCallback(run=run)
- Pass the callback to the train() function and train the model:
  gbm = lgb.train(params, lgb_train, callbacks=[neptune_callback])
- Run your script as you normally would.
To open the run, click the Neptune link that appears in the console output.
Example link: https://app.neptune.ai/common/lightgbm-integration/e/LGBM-85
Stop the run when done
Once you are done logging, you should stop the Neptune run. You need to do this manually when logging from a Jupyter notebook or other interactive environment, by calling run.stop().
If you're running a script, the connection is stopped automatically when the script finishes executing. In notebooks, however, the connection to Neptune is not stopped when the cell has finished executing, but rather when the entire notebook stops.
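One way to make sure the run is stopped even when training fails is to wrap the work in try/finally. The sketch below illustrates the pattern only: DummyRun is a hypothetical stand-in that models nothing but a stop() method, and log_with_cleanup is an illustrative helper, not part of the Neptune API.

```python
class DummyRun:
    """Hypothetical stand-in for a Neptune run; only stop() is modeled."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


def log_with_cleanup(run, work):
    """Call work(run) and stop the run afterwards, even if work raises."""
    try:
        return work(run)
    finally:
        run.stop()


run = DummyRun()
result = log_with_cleanup(run, lambda r: "done")
print(result, run.stopped)
```

With a real run, the same shape would apply: create the run, do the training and logging inside the try block, and leave the stop() call in the finally block.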
import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import NeptuneCallback
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Create Neptune run
run = neptune.init_run(
api_token=neptune.ANONYMOUS_API_TOKEN, # (1)!
project="common/lightgbm-integration", # (2)!
name="train-cls", # optional
tags=["lgbm-integration", "train", "cls"], # optional
)
# Create Neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.2,
random_state=123
)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
# Define parameters
params = {
"boosting_type": "gbdt",
"objective": "multiclass",
"num_class": 10,
"metric": ["multi_logloss", "multi_error"],
"num_leaves": 21,
"learning_rate": 0.05,
"feature_fraction": 0.9,
"bagging_fraction": 0.8,
"bagging_freq": 5,
"max_depth": 12,
}
# Train the model
gbm = lgb.train(
params,
lgb_train,
num_boost_round=200,
valid_sets=[lgb_train, lgb_eval],
valid_names=["training", "validation"],
callbacks=[neptune_callback],
)
run.stop()
- The api_token argument is included to enable anonymous logging. Once you register, you should leave the token out of your script and instead save it as an environment variable.
- Projects in the common workspace are public and can be used for testing. To log to your own workspace, pass the full name of your Neptune project: workspace-name/project-name. For example, "ml-team/classification". To copy it, navigate to the project settings → Properties.
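As a rough illustration of the environment-variable approach, the sketch below checks that credentials are available before a script starts a run. It assumes the variable names NEPTUNE_API_TOKEN and NEPTUNE_PROJECT; check_neptune_env is a hypothetical helper written for this example and is not part of the Neptune client.

```python
import os


def check_neptune_env(env=None):
    """Return a list of problems found in the credential environment variables."""
    env = os.environ if env is None else env
    problems = []

    # The token should come from the environment, not from the script itself
    if not env.get("NEPTUNE_API_TOKEN"):
        problems.append("NEPTUNE_API_TOKEN is not set")

    # The project name is expected in the form workspace-name/project-name
    project = env.get("NEPTUNE_PROJECT", "")
    parts = project.split("/")
    if len(parts) != 2 or not all(parts):
        problems.append("NEPTUNE_PROJECT must look like workspace-name/project-name")

    return problems


example_env = {
    "NEPTUNE_API_TOKEN": "placeholder-token",  # illustrative value only
    "NEPTUNE_PROJECT": "ml-team/classification",
}
print(check_neptune_env(example_env))
```

When both variables are set, the script can call neptune.init_run() without passing api_token or project explicitly.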
Exploring results in Neptune#
In the run view, you can see the logged metadata organized into folder-like namespaces.
Name | Description |
---|---|
feature_names | Names of features in the train set. |
monitoring | Hardware monitoring charts, stdout, and stderr. |
params | LightGBM model parameters. |
source_code | Python sources associated with this run. |
sys | Basic run metadata, like creation time, tags, description, and owner. |
train_set | num_features and num_rows in the train set. |
training | Training metrics. |
validation | Validation metrics. |
Logging booster summary after training#
To log additional metadata that describes the trained model, you can use the create_booster_summary() function.
To have all the data in the same place, you can use the Neptune callback and create the booster summary in the same script. This way, you'll log all metadata to the same run in Neptune.
Related
You can also resume logging to a previously created run by passing the ID of the run to the initialization function: neptune.init_run(with_id="CLS-13").
To learn more, see Resuming a run or other object.
In the snippet below, we train the model and log summary information after training:
import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import create_booster_summary
# Create new run
run = neptune.init_run()
# Prepare data and parameters
...
# Train the model
gbm = lgb.train(params, lgb_train)
# Compute test predictions
y_pred = ...
# Log summary metadata under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
booster=gbm,
log_trees=True,
list_trees=[0, 1, 2, 3, 4],
log_confusion_matrix=True,
y_pred=y_pred,
y_true=y_test,
)
run.stop()
import lightgbm as lgb
import neptune
import numpy as np
from neptune.integrations.lightgbm import (
NeptuneCallback, create_booster_summary
)
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Create run
run = neptune.init_run(
project="common/lightgbm-integration",
api_token=neptune.ANONYMOUS_API_TOKEN,
name="train-cls",
tags=["lgbm-integration", "train", "cls"],
)
# Create neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.2,
random_state=123,
)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
# Define parameters
params = {
"boosting_type": "gbdt",
"objective": "multiclass",
"num_class": 10,
"metric": ["multi_logloss", "multi_error"],
"num_leaves": 21,
"learning_rate": 0.05,
"feature_fraction": 0.9,
"bagging_fraction": 0.8,
"bagging_freq": 5,
"max_depth": 12,
}
# Train the model
gbm = lgb.train(
params,
lgb_train,
num_boost_round=200,
valid_sets=[lgb_train, lgb_eval],
valid_names=["training", "validation"],
callbacks=[neptune_callback],
)
y_pred = np.argmax(gbm.predict(X_test), axis=1)
# Log summary metadata to the same run under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
booster=gbm,
log_trees=True,
list_trees=[0, 1, 2, 3, 4],
log_confusion_matrix=True,
y_pred=y_pred,
y_true=y_test,
)
run.stop()
The create_booster_summary() function returns a regular Python dictionary that can be directly assigned to a namespace in the run. This way, you can organize your run so that all the summary metadata, such as visualizations and the pickled model, is under a common path.
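To see how a nested dictionary lines up with folder-like paths, consider the sketch below. It assumes that nested keys become slash-separated paths under the namespace the dictionary is assigned to; the summary dictionary and the flatten_paths helper are hypothetical, with placeholder strings in place of real files and images.

```python
def flatten_paths(prefix, value):
    """List the slash-separated paths that a nested dictionary would occupy."""
    if isinstance(value, dict):
        paths = []
        for key, child in value.items():
            paths.extend(flatten_paths(f"{prefix}/{key}", child))
        return paths
    return [prefix]


# Hypothetical summary with placeholder values instead of real files and images
summary = {
    "pickled_model": "<file>",
    "visualizations": {
        "trees": "<images>",
        "feature_importances": {"gain": "<image>", "split": "<image>"},
    },
}

paths = flatten_paths("lgbm_summary", summary)
for path in paths:
    print(path)
```

Assigning the dictionary to a different key, for example run["model/summary"], would place the same structure under that path instead.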
- The script is ready to be executed with additional metadata logging. To view the run in Neptune, click the Neptune app link in the console output.
This run has one extra path, lgbm_summary, with the following metadata organization:
lgbm_summary
|-- pickled_model
|-- trees_as_dataframe
|-- visualizations
    |-- confusion_matrix
    |-- trees
    |-- feature_importances
        |-- gain
        |-- split
The lgbm_summary namespace contains the following:
Name | Description |
---|---|
pickled_model | Pickled model (booster). |
trees_as_dataframe | Trees represented as a DataFrame. Learn more in the LightGBM docs. |
confusion_matrix | Confusion matrix for test data, logged as an image. |
trees | Selected trees visualized as graphs. |
gain | Model's feature importances (total gains of splits that use the feature). |
split | Model's feature importances (number of times the feature is used in a model). |
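For readers less familiar with the confusion matrix entry, the sketch below shows what it summarizes: counts of true labels against predicted labels. This is a plain-Python illustration with made-up labels, not the code the integration uses to render the image.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions per (true class, predicted class) pair."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


# Made-up labels for 5 samples and 3 classes
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 2, 2, 2, 1]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
for row in cm:
    print(row)
```

Rows correspond to true classes and columns to predicted classes, so off-diagonal counts are misclassifications.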
More options#
Using Neptune callback with CV function#
You can use NeptuneCallback in the lightgbm.cv function.
Pass the Neptune callback to the callbacks argument of lgb.cv():
import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import NeptuneCallback
# Create run
run = neptune.init_run()
# Create neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data, params, etc.
...
# Pass the callback to the CV function
gbm_cv = lgb.cv(params, lgb_train, callbacks=[neptune_callback])
# Stop run
run.stop()
import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import NeptuneCallback
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Create run
run = neptune.init_run(
api_token=neptune.ANONYMOUS_API_TOKEN, # (1)!
project="common/lightgbm-integration", # (2)!
name="cv-cls", # optional
tags=["lgbm-integration", "cv", "cls"], # optional
)
# Create neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.2,
random_state=123,
)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
# Define parameters
params = {
"boosting_type": "gbdt",
"objective": "multiclass",
"num_class": 10,
"metric": ["multi_logloss", "multi_error"],
"num_leaves": 21,
"learning_rate": 0.05,
"feature_fraction": 0.9,
"bagging_fraction": 0.8,
"bagging_freq": 5,
"max_depth": 12,
}
# Run CV
gbm_cv = lgb.cv(
params,
lgb_train,
num_boost_round=200,
nfold=7,
callbacks=[neptune_callback],
)
# Stop run
run.stop()
- The api_token argument is included to enable anonymous logging. Once you register, you should leave the token out of your script and instead save it as an environment variable.
- Projects in the common workspace are public and can be used for testing. To log to your own workspace, pass the full name of your Neptune project: workspace-name/project-name. For example, "ml-team/classification". To copy it, navigate to the project settings → Properties.
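The lgb.cv() call above returns a dictionary of per-round evaluation history, which you may want to inspect alongside the logged charts. The sketch below picks the round with the lowest mean loss from such a dictionary. The key name and the values are hypothetical; the actual key names depend on the LightGBM version and the metrics in params, so check gbm_cv.keys() first.

```python
def best_round(cv_results, metric_key):
    """Return (1-based round number, score) for the lowest value of a metric."""
    scores = cv_results[metric_key]
    best_index = min(range(len(scores)), key=scores.__getitem__)
    return best_index + 1, scores[best_index]


# Hypothetical CV history: one mean loss value per boosting round
example_results = {"multi_logloss-mean": [0.9, 0.6, 0.5, 0.55]}

round_number, score = best_round(example_results, "multi_logloss-mean")
print(round_number, score)
```

The helper only applies to metrics where lower is better, such as the log loss used in this example.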
Working with scikit-learn API#
You can use NeptuneCallback and create_booster_summary() in the scikit-learn API of LightGBM:
import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import (
NeptuneCallback, create_booster_summary
)
# Create run
run = neptune.init_run()
# Create neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data, params, and create instance of the classifier object
...
gbm = lgb.LGBMClassifier(**params)
# Fit model and log metadata
gbm.fit(
X_train,
y_train,
callbacks=[neptune_callback],
)
# Compute test predictions
y_pred = ...
# Log summary metadata to the same run under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
booster=gbm,
log_trees=True,
list_trees=[0, 1, 2, 3, 4],
log_confusion_matrix=True,
y_pred=y_pred,
y_true=y_test,
)
# Stop run
run.stop()
import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import (
NeptuneCallback, create_booster_summary
)
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Create run
run = neptune.init_run(
api_token=neptune.ANONYMOUS_API_TOKEN, # (1)!
project="common/lightgbm-integration", # (2)!
name="sklearn-api-cls", # optional
tags=["lgbm-integration", "sklearn-api", "cls"], # optional
)
# Create neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.2,
random_state=123,
)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
# Define parameters
params = {
"boosting_type": "gbdt",
"objective": "multiclass",
"num_class": 10,
"num_leaves": 21,
"learning_rate": 0.05,
"feature_fraction": 0.9,
"bagging_fraction": 0.8,
"bagging_freq": 5,
"max_depth": 12,
"n_estimators": 207,
}
# Create instance of the classifier object
gbm = lgb.LGBMClassifier(**params)
# Fit model and log metadata
gbm.fit(
X_train,
y_train,
eval_set=[(X_train, y_train), (X_test, y_test)],
eval_names=["training", "validation"],
eval_metric=["multi_logloss", "multi_error"],
callbacks=[neptune_callback],
)
y_pred = gbm.predict(X_test)
# Log summary metadata to the same run under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
booster=gbm,
log_trees=True,
list_trees=[0, 1, 2, 3, 4],
log_confusion_matrix=True,
y_pred=y_pred,
y_true=y_test,
)
# Stop run
run.stop()
- The api_token argument is included to enable anonymous logging. Once you register, you should leave the token out of your script and instead save it as an environment variable.
- Projects in the common workspace are public and can be used for testing. To log to your own workspace, pass the full name of your Neptune project: workspace-name/project-name. For example, "ml-team/classification". To copy it, navigate to the project settings → Properties.
Related
- API reference ≫ LightGBM