PyTorch Lightning
Learn how to log PyTorch Lightning metadata to Neptune.

What will you get with this integration?

PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research. With the Neptune integration, you can automatically:
  • monitor model training live,
  • log training, validation, and testing metrics, and visualize them in the Neptune UI,
  • log hyperparameters,
  • monitor hardware usage,
  • log any additional metrics,
  • log performance charts and images,
  • save model checkpoints,
  • + do whatever you would expect from a modern ML metadata store.

TL;DR for PyTorch Lightning users

This section is for PyTorch Lightning users who are familiar with loggers like TensorBoardLogger. If you haven't worked with PyTorch Lightning loggers before, jump to the "Where to start?" section below.
PyTorch Lightning has a unified way of logging metadata through Loggers, and NeptuneLogger is one of them. All you need to do to start logging is create a NeptuneLogger and pass it to the Trainer object:

Create NeptuneLogger instance and pass it to the Trainer

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import NeptuneLogger

# create NeptuneLogger
neptune_logger = NeptuneLogger(
    api_key="ANONYMOUS",  # replace with your own
    project="common/pytorch-lightning-integration",  # "<WORKSPACE/PROJECT>"
    tags=["training", "resnet"],  # optional
)

# pass it to the Trainer
trainer = Trainer(max_epochs=10, logger=neptune_logger)

# run training
trainer.fit(my_model, my_dataloader)
A few explanations:
  • You need to pass api_key and project to the NeptuneLogger to tell the logger who is logging (api_key) and where to log the metadata (project). There are more parameters for customizing the logger's behavior; check the NeptuneLogger docs for details.
  • Once neptune_logger is created, simply pass it to the Trainer, like any other PyTorch Lightning logger.

NeptuneLogger is ready

You can run your scripts without additional changes and have all metadata logged in a single place for further analysis, comparison, and sharing within the team. See the example run.

Where to start?

To get started with this integration, follow the quickstart below.

Quickstart

This quickstart will show you how to:
  • Install required libraries,
  • Connect NeptuneLogger to your PyTorch Lightning script to enable automatic logging,
  • Analyze logged metadata and compare some runs.
At the end of this quickstart, you will be able to connect NeptuneLogger to your lightning scripts and use it in your experimentation.

Step 1: Install libraries

Before you start, make sure you have Python installed and can use pip or conda.

Install neptune-client and pytorch-lightning

Depending on your operating system, open a terminal or CMD and run the appropriate command. All required libraries are available via pip and conda:
pip
pip install neptune-client pytorch-lightning
conda
conda install -c conda-forge neptune-client pytorch-lightning
This integration was tested with pytorch-lightning==1.5.0 and neptune-client==0.11.0.
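If you want to reproduce the tested setup exactly, you can pin those versions (shown here for pip):
pip install neptune-client==0.11.0 pytorch-lightning==1.5.0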

Step 2: Create NeptuneLogger

from pytorch_lightning.loggers import NeptuneLogger

neptune_logger = NeptuneLogger(
    api_key="ANONYMOUS",
    project="common/pytorch-lightning-integration",
)
A few explanations:
  • You need to pass api_key and project to the NeptuneLogger to tell the logger who is logging (api_key) and where to log the metadata (project).
  • You can further customize the behavior of the NeptuneLogger using the available parameters. Check the NeptuneLogger docs for more details.
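If you prefer not to hardcode the token, you can keep it in the NEPTUNE_API_TOKEN environment variable instead. A minimal sketch, assuming the Neptune client falls back to that variable when api_key is not passed explicitly:
import os

# assumption: the Neptune client reads NEPTUNE_API_TOKEN when api_key is omitted;
# in practice, set the variable in your shell rather than in code
os.environ["NEPTUNE_API_TOKEN"] = "<YOUR_API_TOKEN>"  # placeholder

from pytorch_lightning.loggers import NeptuneLogger

neptune_logger = NeptuneLogger(project="common/pytorch-lightning-integration")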

Step 3: Pass neptune_logger to Trainer

Pass the neptune_logger instance to the lightning Trainer to log model training metadata to Neptune:
from pytorch_lightning import Trainer

trainer = Trainer(
    logger=neptune_logger,
    max_epochs=250,
)

Step 4: Run model training

Pass your LightningModule and DataLoader to trainer.fit() and run the training:
snippet
Full script
model = My_LightningModule()
train_loader = My_DataLoader()

trainer.fit(model, train_loader)
main.py
import os

import numpy as np
import torch
from sklearn.metrics import accuracy_score
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST

from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.loggers import NeptuneLogger

# define hyper-parameters
PARAMS = {
    "batch_size": 32,
    "lr": 0.007,
    "max_epochs": 15,
}


# (neptune) define LightningModule with logging (self.log)
class MNISTModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        self.log("metrics/batch/loss", loss, prog_bar=False)

        y_true = y.cpu().detach().numpy()
        y_pred = y_hat.argmax(axis=1).cpu().detach().numpy()
        acc = accuracy_score(y_true, y_pred)
        self.log("metrics/batch/acc", acc)

        return {"loss": loss, "y_true": y_true, "y_pred": y_pred}

    def training_epoch_end(self, outputs):
        loss = np.array([])
        y_true = np.array([])
        y_pred = np.array([])
        for results_dict in outputs:
            loss = np.append(loss, results_dict["loss"])
            y_true = np.append(y_true, results_dict["y_true"])
            y_pred = np.append(y_pred, results_dict["y_pred"])
        acc = accuracy_score(y_true, y_pred)
        self.log("metrics/epoch/loss", loss.mean())
        self.log("metrics/epoch/acc", acc)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=PARAMS["lr"])


# init model
mnist_model = MNISTModel()

# init DataLoader from MNIST dataset
train_ds = MNIST(
    os.getcwd(),
    train=True,
    download=True,
    transform=transforms.ToTensor(),
)
train_loader = DataLoader(train_ds, batch_size=PARAMS["batch_size"])

# (neptune) create NeptuneLogger
neptune_logger = NeptuneLogger(
    api_key="ANONYMOUS",
    project="common/pytorch-lightning-integration",
    tags=["simple", "showcase"],
    log_model_checkpoints=False,
)

# (neptune) initialize a trainer and pass neptune_logger
trainer = Trainer(
    logger=neptune_logger,
    max_epochs=PARAMS["max_epochs"],
)

# (neptune) log hyper-parameters
neptune_logger.log_hyperparams(params=PARAMS)

# train the model and log metadata to the Neptune run
trainer.fit(mnist_model, train_loader)
Run the script:
python main.py
You just learned how to connect NeptuneLogger to your lightning scripts and use it in your experimentation.

Explore Results

Go to the link printed to the console to explore training results. The link should be similar to this: https://app.neptune.ai/o/common/org/pytorch-lightning-integration/e/PTL-18

View all metadata section

Metadata logged from the lightning training.
Metadata from lightning runs is logged under the "training/" namespace. You can change this by modifying the prefix argument of the NeptuneLogger, for example:
from pytorch_lightning.loggers import NeptuneLogger

neptune_logger = NeptuneLogger(
    project="common/pytorch-lightning-integration",
    prefix="my_prefix",  # custom prefix
)

Metrics

Metrics are visualized as interactive charts.
  • Metrics are logged to nested dictionary-like structures (learn more about this concept in the logging metadata page) defined in the LightningModule. You simply use self.log("path/to/metric", value) from the lightning API.
  • This way, you can customize where metrics are logged in the run hierarchy and organize metrics and other metadata to fit your needs, for example:
from pytorch_lightning import LightningModule

class MNISTModel(LightningModule):
    def training_step(self, batch, batch_idx):
        loss = ...
        self.log("metrics/batch/loss", loss, prog_bar=False)

        acc = ...
        self.log("metrics/batch/acc", acc)

    def training_epoch_end(self, outputs):
        loss = ...
        acc = ...
        self.log("metrics/epoch/loss", loss)
        self.log("metrics/epoch/acc", acc)

Charts

The Charts tab lets you display many metrics at once.
Custom Dashboard
User-created dashboard that displays metrics and hyper-parameters.
A custom dashboard is a tool that lets you display many types of metadata at once. It comes in handy when you want to analyze a run from a particular perspective.

More options

You can configure the NeptuneLogger in many different ways to address custom logging needs, and lightning itself gives you several options for logging. Common use cases are described below:

Use the logger methods anywhere in your LightningModule class

With the Neptune logger you can use the default logging methods:
  • self.log(),
  • log_metrics(),
  • log_hyperparams().
If you want to log custom metadata (images, CSV files, interactive charts, etc.), you can access the Neptune run directly using the self.logger.experiment attribute.
from neptune.new.types import File
from pytorch_lightning import LightningModule

class LitModel(LightningModule):
    def training_step(self, batch, batch_idx):
        # log metrics
        loss = ...
        self.log("train/loss", loss)  # standard log method

    def any_lightning_module_function_or_hook(self):
        # log images, use neptune-client API
        img = ...
        self.logger.experiment["train/misclassified_imgs"].log(File.as_image(img))

        # generic recipe, use neptune-client API
        metadata = ...
        self.logger.experiment["your/metadata/structure"].log(metadata)
Note that the generic recipe self.logger.experiment["your/metadata/structure"].log(metadata) lets you log various types of metadata (scores, files, images, interactive visuals, CSVs, etc.) under user-defined hierarchical structures ("your/metadata/structure"). Learn more about what you can log and display and which logging methods you can use (the snippet above shows log(), which is just one example).
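For completeness, the other two standard methods from the list above can also be called on the logger itself, outside the LightningModule. A minimal sketch (the metric names and values here are illustrative, not part of the integration):
from pytorch_lightning.loggers import NeptuneLogger

neptune_logger = NeptuneLogger(project="WORKSPACE/PROJECT")

# log a dictionary of metric values at an (optional) step
neptune_logger.log_metrics({"train/loss": 0.23, "train/acc": 0.91}, step=100)

# log hyper-parameters from a dictionary
neptune_logger.log_hyperparams({"lr": 0.007, "batch_size": 32})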

Example

Metrics logged under path: "training/val/acc"
The example above was generated like this (see the full code example):
import pytorch_lightning as pl
from sklearn.metrics import accuracy_score

class LitModel(pl.LightningModule):
    def validation_epoch_end(self, outputs):
        loss = ...
        y_true = ...
        y_pred = ...
        acc = accuracy_score(y_true, y_pred)
        self.log("val/loss", loss)
        self.log("val/acc", acc)
val/acc is a path in the Neptune run. Remember that the prefix is always prepended (you can change it when you create the NeptuneLogger), so the full namespace in Neptune is training/val/acc. See this example in Neptune.

Log after fitting or testing is finished

You can log objects after the fitting or testing methods have finished, because the neptune_logger you created can also log outside the Trainer. The idea looks like this:
import matplotlib.pyplot as plt
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import NeptuneLogger

# create logger
neptune_logger = NeptuneLogger(project="WORKSPACE/PROJECT")

trainer = Trainer(logger=neptune_logger)
model = ...
datamodule = ...

# run fit and test
trainer.fit(model, datamodule=datamodule)
trainer.test(model, datamodule=datamodule)

##############################################
# log additional metadata after fit and test #
##############################################

# log confusion matrix as image
from neptune.new.types import File
from scikitplot.metrics import plot_confusion_matrix  # from the scikit-plot package

y_true = ...
y_pred = ...
fig, ax = plt.subplots(figsize=(16, 12))
plot_confusion_matrix(y_true, y_pred, ax=ax)
neptune_logger.experiment["test/confusion_matrix"].upload(File.as_image(fig))

# generic recipe
metadata = ...
neptune_logger.experiment["your/metadata/structure"].log(metadata)
In this way, you are not restricted to the LightningModule class - you can log from any method or class in your project code.

Pass any neptune.init parameter to the NeptuneLogger

You can also pass neptune_run_kwargs to specify the run in greater detail, like tags and description:
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import NeptuneLogger

neptune_logger = NeptuneLogger(
    project="WORKSPACE/PROJECT",
    name="lightning-run",
    description="mlp quick run with pytorch-lightning",
    tags=["mlp", "quick-run"],
)

trainer = Trainer(max_epochs=300, logger=neptune_logger)

Model checkpoints

If you have ModelCheckpoint configured, the Neptune logger automatically logs model checkpoints. Model weights are uploaded to the "model/checkpoints" namespace in the Neptune run.
Model checkpoint logged to the run
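For example, a minimal sketch of such a setup (the monitored metric name "val/loss" is an assumption here; use a metric you actually log):
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import NeptuneLogger

neptune_logger = NeptuneLogger(project="WORKSPACE/PROJECT")

# save the best checkpoint according to a monitored metric;
# "val/loss" is an assumed metric name
checkpoint_callback = ModelCheckpoint(monitor="val/loss", mode="min")

trainer = Trainer(logger=neptune_logger, callbacks=[checkpoint_callback])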
You can disable this option when you create NeptuneLogger:
from pytorch_lightning.loggers import NeptuneLogger

neptune_logger = NeptuneLogger(
    project="WORKSPACE/PROJECT",
    log_model_checkpoints=False,
)

Model summary

You can log the model summary, as generated by the ModelSummary utility from lightning. It will appear under the "model/summary" namespace.
from pytorch_lightning.loggers import NeptuneLogger

neptune_logger = NeptuneLogger(project="WORKSPACE/PROJECT")

model = ...  # LightningModule

# log model summary
neptune_logger.log_model_summary(model=model, max_depth=-1)
Model summary: layers, parameters, and model size.

Best model score and path

If you have ModelCheckpoint configured, the Neptune logger automatically logs the best_model_path and best_model_score values. They are logged under the "model" namespace in the Neptune run.
Best model path and score with checkpoint and summary.
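If you also need these values in your code after training, the ModelCheckpoint callback exposes them directly. A short sketch, reusing the checkpoint_callback from the example in the "Model checkpoints" section above:
# after trainer.fit(...) has finished, the callback holds the same values
# that the logger records under the "model" namespace
print(checkpoint_callback.best_model_path)   # path to the best checkpoint file
print(checkpoint_callback.best_model_score)  # value of the monitored metric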

Log gradients

If you configure the Trainer to track gradient norms, these norms are automatically logged to Neptune, where you can inspect them in interactive charts.
import pytorch_lightning as pl
from pytorch_lightning.loggers import NeptuneLogger

# (neptune) create NeptuneLogger
neptune_logger = NeptuneLogger(project="WORKSPACE/PROJECT")

trainer = pl.Trainer(
    logger=neptune_logger,
    log_every_n_steps=50,
    track_grad_norm=2,  # track gradient norm
)
Gradient norms logged to the run and grouped by layer.

Log hyper-parameters

You can log hyper-parameters using the standard log_hyperparams method from the lightning logger.
from pytorch_lightning.loggers import NeptuneLogger

neptune_logger = NeptuneLogger(project="WORKSPACE/PROJECT")

PARAMS = ...  # dict or argparse.Namespace

neptune_logger.log_hyperparams(params=PARAMS)
Hyperparams logged to the Run.
All hyper-parameters (and metrics) can be displayed as columns on the dashboards.
Three example runs with metrics and hyper-params displayed as columns

What's next?