LightGBM integration guide#
LightGBM is a gradient-boosting framework that uses tree-based learning algorithms. With the Neptune-LightGBM integration, the following metadata is logged automatically:
- Training and validation metrics
- Parameters
- Feature names, num_features, and num_rows for the train set
- Hardware consumption metrics
- stdout and stderr streams
- Training code and Git commit information
You can also log the trained LightGBM booster summary, which can contain:
- The pickled model
- The feature importance chart (gain and split)
- Visualized trees
- Trees saved as DataFrames
- Confusion matrix (for classification problems)
Before you start#
- Sign up at neptune.ai/register.
- Create a project for storing your metadata.
- Have LightGBM installed.
Installing the integration#
To use your preinstalled version of Neptune together with the integration:
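pip install -U neptune-lightgbm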
To install both Neptune and the integration:
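pip install -U "neptune[lightgbm]"  # installs neptune together with its lightgbm extra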
Passing your Neptune credentials
Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.
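For example, on Linux or macOS:
export NEPTUNE_API_TOKEN="your-api-token"
export NEPTUNE_PROJECT="workspace-name/project-name"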
To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.
Your full project name has the form workspace-name/project-name. You can copy it from the project settings: open the menu in the top right and select Details & privacy.
On Windows, navigate to Settings → Edit the system environment variables, or enter the following in Command Prompt: setx SOME_NEPTUNE_VARIABLE "some-value"
Although it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune:
run = neptune.init_run(
project="ml-team/classification", # your full project name here
api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8", # your API token here
)
For more help, see Set Neptune credentials.
To log visualized trees after training (recommended), additionally install Graphviz:
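pip install -U graphviz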
Note
The above installation is only for the pure Python interface to the Graphviz software. You need to install Graphviz separately.
For installation help, see the Graphviz documentation.
Quickstart#
Tip
This section is for LightGBM users who are familiar with Neptune and LightGBM callbacks.
The integration has two core components:
- NeptuneCallback for logging metadata during training, such as parameters and metrics.
- create_booster_summary() for logging additional metadata after training, such as visualizations and the pickled model.
import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import (
    NeptuneCallback, create_booster_summary
)
# Create run
run = neptune.init_run() # (1)!
# Create Neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data, params, etc.
...
# Pass the callback to the train function and train the model
gbm = lgb.train(params, lgb_train, callbacks=[neptune_callback])
# Compute test predictions
y_pred = ...
# Log summary metadata under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
booster=gbm,
log_trees=True,
list_trees=[0, 1, 2, 3, 4],
log_confusion_matrix=True,
y_pred=y_pred,
y_true=y_test,
)
# When done logging, stop the run
run.stop()
1. If you haven't set up your credentials, you can log anonymously:
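run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/lightgbm-integration",
)
The full script below puts these pieces together: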
import lightgbm as lgb
import neptune
import numpy as np
from neptune.integrations.lightgbm import (
NeptuneCallback, create_booster_summary
)
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Create run
run = neptune.init_run(
project="common/lightgbm-integration",
api_token=neptune.ANONYMOUS_API_TOKEN,
name="train-cls",
tags=["lgbm-integration", "train", "cls"],
)
# Create Neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.2,
random_state=123,
)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
# Define parameters
params = {
"boosting_type": "gbdt",
"objective": "multiclass",
"num_class": 10,
"metric": ["multi_logloss", "multi_error"],
"num_leaves": 21,
"learning_rate": 0.05,
"feature_fraction": 0.9,
"bagging_fraction": 0.8,
"bagging_freq": 5,
"max_depth": 12,
}
# Train the model
gbm = lgb.train(
params,
lgb_train,
num_boost_round=200,
valid_sets=[lgb_train, lgb_eval],
valid_names=["training", "validation"],
callbacks=[neptune_callback],
)
y_pred = np.argmax(gbm.predict(X_test), axis=1)
# Log summary metadata to the same run under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
booster=gbm,
log_trees=True,
list_trees=[0, 1, 2, 3, 4],
log_confusion_matrix=True,
y_pred=y_pred,
y_true=y_test,
)
# When done logging, stop the run
run.stop()
Full walkthrough#
This example walks you through logging metadata as you train your model with LightGBM.
You can log metadata during training with NeptuneCallback, and after training with the create_booster_summary() function.
Logging metadata during training#
1. Start a run:
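import neptune

run = neptune.init_run()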
If you haven't set up your credentials, you can log anonymously:
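run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/lightgbm-integration",
)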
2. Initialize the Neptune callback:
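from neptune.integrations.lightgbm import NeptuneCallback

neptune_callback = NeptuneCallback(run=run)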
3. Pass the callback to the train() function and train the model:
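gbm = lgb.train(params, lgb_train, callbacks=[neptune_callback])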
4. To stop the connection to Neptune and sync all data, call the stop() method:
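run.stop()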
5. Run your script as you normally would.
To open the run, click the Neptune link that appears in the console output.
Example link: https://app.neptune.ai/common/lightgbm-integration/e/LGBM-85
import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import NeptuneCallback
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Create Neptune run
run = neptune.init_run(
api_token=neptune.ANONYMOUS_API_TOKEN, # (1)!
project="common/lightgbm-integration", # (2)!
name="train-cls", # optional
tags=["lgbm-integration", "train", "cls"], # optional
)
# Create Neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.2,
random_state=123,
)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
# Define parameters
params = {
"boosting_type": "gbdt",
"objective": "multiclass",
"num_class": 10,
"metric": ["multi_logloss", "multi_error"],
"num_leaves": 21,
"learning_rate": 0.05,
"feature_fraction": 0.9,
"bagging_fraction": 0.8,
"bagging_freq": 5,
"max_depth": 12,
}
# Train the model
gbm = lgb.train(
params,
lgb_train,
num_boost_round=200,
valid_sets=[lgb_train, lgb_eval],
valid_names=["training", "validation"],
callbacks=[neptune_callback],
)
run.stop()
1. The api_token argument is included to enable anonymous logging. Once you've registered, leave the token out of your script and instead save it as an environment variable.
2. Projects in the common workspace are public and can be used for testing. To log to your own workspace, pass the full name of your Neptune project: workspace-name/project-name. For example, project="ml-team/classification". You can copy the name from the project details (top-right menu → Details & privacy).
Exploring results in Neptune#
In the run view, you can see the logged metadata organized into folder-like namespaces.
| Name | Description |
| --- | --- |
| feature_names | Names of features in the train set. |
| monitoring | Hardware monitoring charts, stdout, and stderr. |
| params | LightGBM model parameters. |
| source_code | Python sources associated with this run. |
| sys | Basic run metadata, like creation time, tags, description, and owner. |
| train_set | num_features and num_rows in the train set. |
| training | Training metrics. |
| validation | Validation metrics. |
Logging booster summary after training#
To log additional metadata that describes the trained model, you can use the create_booster_summary() function.
To have all the data in the same place, you can use the Neptune callback and create the booster summary in the same script. This way, you'll log all metadata to the same run in Neptune.
Related
You can also resume logging to a previously created run by passing the ID of the run to the initialization function: neptune.init_run(with_id="CLS-13").
To learn more, see Resume a run.
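For example, a minimal sketch of resuming a run and logging the summary to it (the run ID "CLS-13" is just an illustration):
run = neptune.init_run(with_id="CLS-13")  # reopen the existing run
run["lgbm_summary"] = create_booster_summary(booster=gbm)
run.stop()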
In the snippet below, we train the model and log summary information after training:
import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import create_booster_summary
# Create new run
run = neptune.init_run()
# Prepare data and parameters
...
# Train the model
gbm = lgb.train(params, lgb_train)
# Compute test predictions
y_pred = ...
# Log summary metadata under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
booster=gbm,
log_trees=True,
list_trees=[0, 1, 2, 3, 4],
log_confusion_matrix=True,
y_pred=y_pred,
y_true=y_test,
)
run.stop()
import lightgbm as lgb
import neptune
import numpy as np
from neptune.integrations.lightgbm import (
    NeptuneCallback, create_booster_summary
)
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Create run
run = neptune.init_run(
project="common/lightgbm-integration",
api_token=neptune.ANONYMOUS_API_TOKEN,
name="train-cls",
tags=["lgbm-integration", "train", "cls"],
)
# Create neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.2,
random_state=123,
)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
# Define parameters
params = {
"boosting_type": "gbdt",
"objective": "multiclass",
"num_class": 10,
"metric": ["multi_logloss", "multi_error"],
"num_leaves": 21,
"learning_rate": 0.05,
"feature_fraction": 0.9,
"bagging_fraction": 0.8,
"bagging_freq": 5,
"max_depth": 12,
}
# Train the model
gbm = lgb.train(
params,
lgb_train,
num_boost_round=200,
valid_sets=[lgb_train, lgb_eval],
valid_names=["training", "validation"],
callbacks=[neptune_callback],
)
y_pred = np.argmax(gbm.predict(X_test), axis=1)
# Log summary metadata to the same run under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
booster=gbm,
log_trees=True,
list_trees=[0, 1, 2, 3, 4],
log_confusion_matrix=True,
y_pred=y_pred,
y_true=y_test,
)
run.stop()
The create_booster_summary()
function returns a regular Python dictionary that can be assigned directly to a namespace in the run. This way, you can organize your run so that all the summary metadata – like visualizations and the pickled model – is stored under a common path.
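For instance, you can capture the returned dictionary and assign it to any path you choose (the "model/summary" path here is just an illustration):
summary = create_booster_summary(booster=gbm)
run["model/summary"] = summary  # all summary metadata lands under this namespace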
The script is now ready to be executed with additional metadata logging. To view the run in Neptune, click the Neptune link in the console output.
This run has one extra path – lgbm_summary – with the following metadata organization:
lgbm_summary
|—— pickled_model
|—— trees_as_dataframe
|—— visualizations
    |—— confusion_matrix
    |—— trees
    |—— feature_importances
        |—— gain
        |—— split
The lgbm_summary namespace contains the following:
| Name | Description |
| --- | --- |
| pickled_model | Pickled model (booster). |
| trees_as_dataframe | Trees represented as a DataFrame. Learn more in the LightGBM docs. |
| confusion_matrix | Confusion matrix for test data, logged as an image. |
| trees | Selected trees visualized as graphs. |
| gain | Model's feature importances (total gains of splits that use the feature). |
| split | Model's feature importances (number of times the feature is used in the model). |
More options#
Using Neptune callback with CV function#
You can use NeptuneCallback in the lightgbm.cv function. Pass the Neptune callback to the callbacks argument of lgb.cv():
import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import NeptuneCallback
# Create run
run = neptune.init_run()
# Create neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data, params, etc.
...
# Pass the callback to the CV function
gbm_cv = lgb.cv(params, lgb_train, callbacks=[neptune_callback])
# Stop run
run.stop()
import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import NeptuneCallback
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Create run
run = neptune.init_run(
api_token=neptune.ANONYMOUS_API_TOKEN, # (1)!
project="common/lightgbm-integration", # (2)!
name="cv-cls", # optional
tags=["lgbm-integration", "cv", "cls"], # optional
)
# Create neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.2,
random_state=123,
)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
# Define parameters
params = {
"boosting_type": "gbdt",
"objective": "multiclass",
"num_class": 10,
"metric": ["multi_logloss", "multi_error"],
"num_leaves": 21,
"learning_rate": 0.05,
"feature_fraction": 0.9,
"bagging_fraction": 0.8,
"bagging_freq": 5,
"max_depth": 12,
}
# Run CV
gbm_cv = lgb.cv(
params,
lgb_train,
num_boost_round=200,
nfold=7,
callbacks=[neptune_callback],
)
# Stop run
run.stop()
1. The api_token argument is included to enable anonymous logging. Once you've registered, leave the token out of your script and instead save it as an environment variable.
2. Projects in the common workspace are public and can be used for testing. To log to your own workspace, pass the full name of your Neptune project: workspace-name/project-name. For example, project="ml-team/classification". You can copy the name from the project details (top-right menu → Details & privacy).
Working with scikit-learn API#
You can use NeptuneCallback and create_booster_summary() in the scikit-learn API of LightGBM:
import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import (
    NeptuneCallback, create_booster_summary
)
# Create run
run = neptune.init_run()
# Create neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data, params, and create instance of the classifier object
...
gbm = lgb.LGBMClassifier(**params)
# Fit model and log metadata
gbm.fit(
X_train,
y_train,
callbacks=[neptune_callback],
)
# Compute test predictions
y_pred = ...
# Log summary metadata to the same run under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
booster=gbm,
log_trees=True,
list_trees=[0, 1, 2, 3, 4],
log_confusion_matrix=True,
y_pred=y_pred,
y_true=y_test,
)
# Stop run
run.stop()
import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import (
NeptuneCallback, create_booster_summary
)
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
# Create run
run = neptune.init_run(
api_token=neptune.ANONYMOUS_API_TOKEN, # (1)!
project="common/lightgbm-integration", # (2)!
name="sklearn-api-cls", # optional
tags=["lgbm-integration", "sklearn-api", "cls"], # optional
)
# Create neptune callback
neptune_callback = NeptuneCallback(run=run)
# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
test_size=0.2,
random_state=123,
)
# Define parameters
params = {
"boosting_type": "gbdt",
"objective": "multiclass",
"num_class": 10,
"num_leaves": 21,
"learning_rate": 0.05,
"feature_fraction": 0.9,
"bagging_fraction": 0.8,
"bagging_freq": 5,
"max_depth": 12,
"n_estimators": 207,
}
# Create instance of the classifier object
gbm = lgb.LGBMClassifier(**params)
# Fit model and log metadata
gbm.fit(
X_train,
y_train,
eval_set=[(X_train, y_train), (X_test, y_test)],
eval_names=["training", "validation"],
eval_metric=["multi_logloss", "multi_error"],
callbacks=[neptune_callback],
)
y_pred = gbm.predict(X_test)
# Log summary metadata to the same run under the "lgbm_summary" namespace
run["gbm_summary"] = create_booster_summary(
booster=gbm,
log_trees=True,
list_trees=[0, 1, 2, 3, 4],
log_confusion_matrix=True,
y_pred=y_pred,
y_true=y_test,
)
# Stop run
run.stop()
1. The api_token argument is included to enable anonymous logging. Once you've registered, leave the token out of your script and instead save it as an environment variable.
2. Projects in the common workspace are public and can be used for testing. To log to your own workspace, pass the full name of your Neptune project: workspace-name/project-name. For example, project="ml-team/classification". You can copy the name from the project details (top-right menu → Details & privacy).
Related
- LightGBM integration API reference
- neptune-lightgbm repo on GitHub
- LightGBM on GitHub