LightGBM
Learn how to log LightGBM metadata to Neptune

What will you get with this integration?

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. The Neptune + LightGBM integration lets you:
    1. automatically log many types of metadata during training,
    2. log a model summary after training.

Automatically log metadata during training

What is logged?
    training and validation metrics,
    parameters,
    feature names, num_features, and num_rows for the train set,
    hardware consumption (CPU, GPU, memory),
    stdout and stderr logs,
    training code and git commit information.
Example dashboard with train-valid metrics and selected parameters

Log model summary after training

You can also log a trained LightGBM booster summary to Neptune, which can include:
    pickled model,
    feature importance chart (gain and split),
    visualized trees,
    trees saved as DataFrame,
    confusion matrix (only for classification problems).
Example dashboard with model summary

Where to start?

To get started with this integration, follow the quickstart below (recommended). If you are an experienced LightGBM user, you can check the TL;DR section, which gives fast-track information on how the integration works.
If you want to try it out now, you can either:

Quickstart

This quickstart will show you how to:
    install required libraries,
    log metadata during training (metrics, parameters, etc.),
    log booster summary (visualizations, confusion matrix, pickled model, etc.) after training,
    check results in the Neptune app.
At the end of this quickstart, you will be able to add Neptune to your LightGBM scripts and use it in your experimentation.

Install requirements

Before you start, make sure that:

Install neptune-client, lightgbm, and neptune-lightgbm

Depending on your operating system, open a terminal or CMD and run one of the commands below. All required libraries are available via pip and conda.

pip:

pip install neptune-client lightgbm neptune-lightgbm

conda:

conda install -c conda-forge neptune-client lightgbm neptune-lightgbm

For more help, see installing neptune-client.
This integration is tested with lightgbm==3.2.1, neptune-client==0.9.16, and neptune-lightgbm==0.9.10.

Install psutil (optional)

If you want hardware consumption logged (recommended), additionally install psutil.

pip:

pip install psutil

conda:

conda install psutil

Install graphviz (optional)

If you want to log visualized trees after training (recommended), you need to install graphviz.
The installation below covers only the pure Python interface to the graphviz software; you need to install graphviz itself separately. Check the graphviz docs for installation help.

pip:

pip install graphviz

conda:

conda install -c conda-forge python-graphviz

Log metadata during training

To start logging metadata (metrics, parameters, etc.) during training, use NeptuneCallback.
Core code:

from neptune.new.integrations.lightgbm import NeptuneCallback

# Create run
my_run = neptune.init(project="my_workspace/my_project")

# Create neptune callback
neptune_callback = NeptuneCallback(run=my_run)

# Prepare data, params, etc.
...

# Pass the callback to the train function and train the model
gbm = lgb.train(
    params,
    lgb_train,
    callbacks=[neptune_callback],
)
Full script:

import lightgbm as lgb
import neptune.new as neptune
from neptune.new.integrations.lightgbm import NeptuneCallback
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Create run
run = neptune.init(
    project="common/lightgbm-integration",
    api_token="ANONYMOUS",
    name="train-cls",
    tags=["lgbm-integration", "train", "cls"]
)

# Create neptune callback
neptune_callback = NeptuneCallback(run=run)

# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=123
)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

# Define parameters
params = {
    "boosting_type": "gbdt",
    "objective": "multiclass",
    "num_class": 10,
    "metric": ["multi_logloss", "multi_error"],
    "num_leaves": 21,
    "learning_rate": 0.05,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "max_depth": 12,
}

# Train the model
gbm = lgb.train(
    params,
    lgb_train,
    num_boost_round=200,
    valid_sets=[lgb_train, lgb_eval],
    valid_names=["training", "validation"],
    callbacks=[neptune_callback],
)
Read the docstrings of NeptuneCallback to learn more about its parameters.
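If you prefer to keep everything the callback logs under a custom path, you can pass a base namespace to the callback. A minimal sketch, assuming your version of neptune-lightgbm supports the base_namespace parameter (the "training" path below is just an illustrative choice):

import neptune.new as neptune
from neptune.new.integrations.lightgbm import NeptuneCallback

run = neptune.init(project="my_workspace/my_project")

# Everything the callback logs lands under the "training" namespace
# (base_namespace is assumed to be available in your version)
neptune_callback = NeptuneCallback(run=run, base_namespace="training")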

In the snippet above

    Import NeptuneCallback, which you will use to handle metadata logging,
    Create a new run in Neptune,
    Pass the run object to NeptuneCallback,
    Pass the created neptune_callback to the train function.
At this point, your script is ready to use Neptune as a logger.
Now, you can run your script and have metadata logged to Neptune for further inspection, comparison, and sharing:
python main.py
In the Neptune app, it will look similar to this:
Logged metadata include parameters and train/valid metrics.

The run above contains

feature names: Names of features in the train set.
monitoring: Hardware monitoring charts, plus stdout and stderr logs.
params: LightGBM model parameters.
source_code: Python sources associated with this run. Learn more here.
sys: Basic run metadata, like creation time, tags, run owner, etc. Learn more here.
train_set: num_features and num_rows in the train set.
training: Training metrics.
validation: Validation metrics.
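You can also query the logged metric series programmatically after training. A minimal sketch, assuming the metrics were logged under the "training" and "validation" namespaces as in the script above (the run ID placeholder and exact field paths depend on your setup):

import neptune.new as neptune

# Reconnect to an existing run by its ID
run = neptune.init(
    project="common/lightgbm-integration",
    api_token="ANONYMOUS",
    run="<RUN-ID>",
)

# fetch_values() returns the logged series as a pandas DataFrame
valid_logloss = run["validation/multi_logloss"].fetch_values()
print(valid_logloss.head())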

What next?

You can run the example presented above by yourself or see it in Neptune:

Log booster summary after training

To log additional metadata that describes the trained model, you can use the create_booster_summary() function.
You can log the summary to a new run, or to the same run that you used for logging model training. The latter can be very useful because you then have all the information in a single run.
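If you prefer a separate, dedicated run for the summary, it is just a matter of creating a second run object and assigning the summary to it. A minimal sketch (the project name and namespace are placeholders; gbm is a booster trained earlier):

import neptune.new as neptune
from neptune.new.integrations.lightgbm import create_booster_summary

# A separate run that holds only the model summary
summary_run = neptune.init(project="my_workspace/my_project")
summary_run["lgbm_summary"] = create_booster_summary(booster=gbm)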
In the snippet below you will train the model and log summary information after training:
Core code:

from neptune.new.integrations.lightgbm import create_booster_summary

# Create run
my_run = neptune.init(project="my_workspace/my_project")

# Prepare data, params and train the model
...
gbm = lgb.train(params, lgb_train)

# Compute test predictions
y_pred = ...

# Log summary metadata under the "lgbm_summary" namespace
my_run["lgbm_summary"] = create_booster_summary(
    booster=gbm,
    log_trees=True,
    list_trees=[0, 1, 2, 3, 4],
    log_confusion_matrix=True,
    y_pred=y_pred,
    y_true=y_test,
)
Full script:

import lightgbm as lgb
import neptune.new as neptune
import numpy as np
from neptune.new.integrations.lightgbm import NeptuneCallback, create_booster_summary
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Create run
run = neptune.init(
    project="common/lightgbm-integration",
    api_token="ANONYMOUS",
    name="train-cls",
    tags=["lgbm-integration", "train", "cls"]
)

# Create neptune callback
neptune_callback = NeptuneCallback(run=run)

# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=123
)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

# Define parameters
params = {
    "boosting_type": "gbdt",
    "objective": "multiclass",
    "num_class": 10,
    "metric": ["multi_logloss", "multi_error"],
    "num_leaves": 21,
    "learning_rate": 0.05,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "max_depth": 12,
}

# Train the model
gbm = lgb.train(
    params,
    lgb_train,
    num_boost_round=200,
    valid_sets=[lgb_train, lgb_eval],
    valid_names=["training", "validation"],
    callbacks=[neptune_callback],
)

# Compute test predictions
y_pred = np.argmax(gbm.predict(X_test), axis=1)

# Log summary metadata to the same run under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
    booster=gbm,
    log_trees=True,
    list_trees=[0, 1, 2, 3, 4],
    log_confusion_matrix=True,
    y_pred=y_pred,
    y_true=y_test,
)
Read the docstrings of create_booster_summary to learn more about its parameters.

About the snippet above

create_booster_summary() returns a regular Python dictionary that can be assigned directly to the run's namespace. This way, all the summary metadata, such as visualizations and the pickled model, ends up under a common path.
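Because it is a plain dictionary, you are also free to pick the namespace path yourself. A minimal sketch (the "model/summary" path is just an example):

# Neptune creates the nested namespaces for you
my_run["model/summary"] = create_booster_summary(booster=gbm)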
Run the script with the additional metadata logging:

python main.py
It will look like this:
Logged metadata including training summary under namespace "lgbm_summary".

More about the run above

This run has one extra path lgbm_summary, with the following metadata organization:
lgbm_summary
|—— pickled_model
|—— trees_as_dataframe
|—— visualizations
|   |—— confusion_matrix
|   |—— trees
|—— feature_importances
    |—— gain
    |—— split
pickled_model: Pickled model (booster).
trees_as_dataframe: Trees represented as a DataFrame. Learn more here.
confusion_matrix: Confusion matrix for the test data, logged as an image.
trees: Selected trees visualized as graphs.
gain: Model's feature importances (total gains of splits that use the feature).
split: Model's feature importances (number of times the feature is used in the model).
You can use both NeptuneCallback and create_booster_summary() in the same script and log all metadata to the same run in Neptune.
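If you later need the logged artifacts back, for example the pickled booster, you can download them through the run handle. A minimal sketch, assuming the summary was logged under "lgbm_summary" as above:

# Download the pickled booster to the current working directory;
# unpickle it with the standard pickle module once downloaded
run["lgbm_summary/pickled_model"].download()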

Stop Logging

Once you are done logging, stop tracking the run with the stop() method. This is needed only when logging from a notebook environment; when logging from a script, Neptune automatically stops tracking once the script has finished.

run.stop()

What next?

You can run the example presented above by yourself or see it in Neptune:

TL;DR for LightGBM users

This section is for LightGBM users who are familiar with Neptune and LightGBM callbacks. If you haven't worked with Neptune or LightGBM callbacks before, jump to the quickstart.

Install requirements

pip:

pip install -q neptune-client lightgbm neptune-lightgbm psutil

conda:

conda install -c conda-forge neptune-client lightgbm neptune-lightgbm psutil
This integration is tested with lightgbm==3.2.1, neptune-client==0.9.16, and neptune-lightgbm==0.9.12.

Install graphviz (optional)

If you want to log visualized trees after training (recommended), you need to install graphviz.
The installation below covers only the pure Python interface to the graphviz software; you need to install graphviz itself separately. Check the graphviz docs for installation help.

pip:

pip install graphviz

conda:

conda install -c conda-forge python-graphviz

Log metadata during and after training

For metadata logging during training, use NeptuneCallback; for logging an additional model summary after training, use the create_booster_summary() function:
Core code:

from neptune.new.integrations.lightgbm import NeptuneCallback, create_booster_summary

# Create run
my_run = neptune.init(project="my_workspace/my_project")

# Create neptune callback
neptune_callback = NeptuneCallback(run=my_run)

# Prepare data, params, etc.
...

# Pass the callback to the train function and train the model
gbm = lgb.train(
    params,
    lgb_train,
    callbacks=[neptune_callback],
)

# Compute test predictions
y_pred = ...

# Log summary metadata under the "lgbm_summary" namespace
my_run["lgbm_summary"] = create_booster_summary(
    booster=gbm,
    log_trees=True,
    list_trees=[0, 1, 2, 3, 4],
    log_confusion_matrix=True,
    y_pred=y_pred,
    y_true=y_test,
)
Full script:

import lightgbm as lgb
import neptune.new as neptune
import numpy as np
from neptune.new.integrations.lightgbm import NeptuneCallback, create_booster_summary
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Create run
run = neptune.init(
    project="common/lightgbm-integration",
    api_token="ANONYMOUS",
    name="train-cls",
    tags=["lgbm-integration", "train", "cls"]
)

# Create neptune callback
neptune_callback = NeptuneCallback(run=run)

# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=123
)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

# Define parameters
params = {
    "boosting_type": "gbdt",
    "objective": "multiclass",
    "num_class": 10,
    "metric": ["multi_logloss", "multi_error"],
    "num_leaves": 21,
    "learning_rate": 0.05,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "max_depth": 12,
}

# Train the model
gbm = lgb.train(
    params,
    lgb_train,
    num_boost_round=200,
    valid_sets=[lgb_train, lgb_eval],
    valid_names=["training", "validation"],
    callbacks=[neptune_callback],
)

# Compute test predictions
y_pred = np.argmax(gbm.predict(X_test), axis=1)

# Log summary metadata to the same run under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
    booster=gbm,
    log_trees=True,
    list_trees=[0, 1, 2, 3, 4],
    log_confusion_matrix=True,
    y_pred=y_pred,
    y_true=y_test,
)
Read the docstrings of NeptuneCallback and create_booster_summary to learn more about their parameters.

In the snippet above you

    use NeptuneCallback to log training metadata such as parameters and metrics,
    use create_booster_summary() to log additional metadata (visualizations, pickled model) after training is done.
Run the script to log both training and booster summary metadata:

python main.py
It will look like this:
Example dashboard with booster summary

Stop Logging

Once you are done logging, stop tracking the run with the stop() method. This is needed only when logging from a notebook environment; when logging from a script, Neptune automatically stops tracking once the script has finished.

run.stop()

What next?

You can run the example presented above by yourself or see it in Neptune:

More options

CV

You can use NeptuneCallback in the lightgbm.cv function:
Core code:

from neptune.new.integrations.lightgbm import NeptuneCallback

# Create run
my_run = neptune.init(project="my_workspace/my_project")

# Create neptune callback
neptune_callback = NeptuneCallback(run=my_run)

# Prepare data, params, etc.
...

# Pass the callback to the cv function
gbm_cv = lgb.cv(
    params,
    lgb_train,
    callbacks=[neptune_callback],
)
Full script:

import lightgbm as lgb
import neptune.new as neptune
from neptune.new.integrations.lightgbm import NeptuneCallback
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Create run
run = neptune.init(
    project="common/lightgbm-integration",
    api_token="ANONYMOUS",
    name="cv-cls",
    tags=["lgbm-integration", "cv", "cls"]
)

# Create neptune callback
neptune_callback = NeptuneCallback(run=run)

# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=123
)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

# Define parameters
params = {
    "boosting_type": "gbdt",
    "objective": "multiclass",
    "num_class": 10,
    "metric": ["multi_logloss", "multi_error"],
    "num_leaves": 21,
    "learning_rate": 0.05,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "max_depth": 12,
}

# Run CV
gbm_cv = lgb.cv(
    params,
    lgb_train,
    num_boost_round=200,
    nfold=7,
    callbacks=[neptune_callback],
)
Read the docstrings of NeptuneCallback to learn more about its parameters.

In the snippet above

    Import NeptuneCallback, which you will use to handle metadata logging,
    Create a new run in Neptune,
    Pass the run object to NeptuneCallback,
    Pass the created neptune_callback to the lightgbm.cv function.
At this point, your script is ready to use Neptune as a logger.
Now, you can run your script and have metadata logged to Neptune for further inspection, comparison, and sharing:
python main.py
In Neptune, it will look similar to this:
Example dashboard with cross-validation results.

What next?

You can run the example presented above by yourself or see it in Neptune:

Scikit-learn API

You can use NeptuneCallback and create_booster_summary() when working with the scikit-learn API of LightGBM.
Core code:

from neptune.new.integrations.lightgbm import NeptuneCallback, create_booster_summary

# Create run
my_run = neptune.init(project="my_workspace/my_project")

# Create neptune callback
neptune_callback = NeptuneCallback(run=my_run)

# Prepare data, params, and create an instance of the classifier object
...
gbm = lgb.LGBMClassifier(**params)

# Fit model and log metadata
gbm.fit(
    X_train,
    y_train,
    callbacks=[neptune_callback],
)

# Compute test predictions
y_pred = ...

# Log summary metadata to the same run under the "lgbm_summary" namespace
my_run["lgbm_summary"] = create_booster_summary(
    booster=gbm,
    log_trees=True,
    list_trees=[0, 1, 2, 3, 4],
    log_confusion_matrix=True,
    y_pred=y_pred,
    y_true=y_test,
)
Full script:

import lightgbm as lgb
import neptune.new as neptune
from neptune.new.integrations.lightgbm import NeptuneCallback, create_booster_summary
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Create run
run = neptune.init(
    project="common/lightgbm-integration",
    api_token="ANONYMOUS",
    name="sklearn-api-cls",
    tags=["lgbm-integration", "sklearn-api", "cls"]
)

# Create neptune callback
neptune_callback = NeptuneCallback(run=run)

# Prepare data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=123
)

# Define parameters
params = {
    "boosting_type": "gbdt",
    "objective": "multiclass",
    "num_class": 10,
    "num_leaves": 21,
    "learning_rate": 0.05,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "max_depth": 12,
    "n_estimators": 207,
}

# Create an instance of the classifier object
gbm = lgb.LGBMClassifier(**params)

# Fit model and log metadata
gbm.fit(
    X_train,
    y_train,
    eval_set=[(X_train, y_train), (X_test, y_test)],
    eval_names=["training", "validation"],
    eval_metric=["multi_logloss", "multi_error"],
    callbacks=[neptune_callback],
)

y_pred = gbm.predict(X_test)

# Log summary metadata to the same run under the "lgbm_summary" namespace
run["lgbm_summary"] = create_booster_summary(
    booster=gbm,
    log_trees=True,
    list_trees=[0, 1, 2, 3, 4],
    log_confusion_matrix=True,
    y_pred=y_pred,
    y_true=y_test,
)
Read the docstrings of NeptuneCallback and create_booster_summary to learn more about their parameters.

In the snippet above

    Import NeptuneCallback, which you will use to handle metadata logging,
    Create a new run in Neptune,
    Pass the run object to NeptuneCallback,
    Pass the created neptune_callback to the fit() method (scikit-learn API).
At this point, your script is ready to use Neptune as a logger.
Now, you can run your script and have metadata logged to Neptune for further inspection, comparison, and sharing:
python main.py
In Neptune, it will look similar to the screen below. Note that the result is the same as with the train() function:
Example run, where sklearn API was used.

What next?

You can run the example presented above by yourself or see it in Neptune:

Resume run

You can resume a run that you created before and continue logging to it. This comes in handy when you train a LightGBM model over multiple training sessions. Here is how to do it:
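A minimal sketch, assuming your existing run's ID is "LGBM-123" (the ID is a hypothetical example; you can find yours in the Neptune app):

import neptune.new as neptune
from neptune.new.integrations.lightgbm import NeptuneCallback

# Reconnect to an existing run by its ID and keep logging to it
run = neptune.init(project="my_workspace/my_project", run="LGBM-123")

# Attach a fresh callback and continue training as before
neptune_callback = NeptuneCallback(run=run)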

Log other metadata

If you have other types of metadata that are not covered by this integration, you can still log them using neptune-client. When you create the run, you have the my_run handle:

# Create a new Neptune run
my_run = neptune.init(project="my_workspace/my_project")
You can use the my_run object to log metadata. Here is more info about it:
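A minimal sketch of the basic logging patterns (the field names below are arbitrary examples):

# Log a single value
my_run["test/f1_score"] = 0.92

# Log a series of values, one call per step
for loss in [0.3, 0.2, 0.1]:
    my_run["test/loss"].log(loss)

# Upload a file
my_run["data/predictions"].upload("predictions.csv")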

How to ask for help?

Please visit the Getting help page. Everything regarding support is there:

Other pages you may like

You may also find the following pages useful: