PyTorch Lightning integration guide#

Custom dashboard displaying metadata logged with PyTorch Lightning

PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research. With the Neptune integration, you can:

Track training code, environment, and Git information
Log hyperparameters
Monitor hardware consumption
Monitor model training live
Save model checkpoints
Log performance charts and images
Log training, validation, and testing metrics and visualize them in the Neptune app

See example in Neptune Code examples

Quickstart#

Set up Neptune in your environment.
1. Sign up at neptune.ai/register.
2. Create a project.
3. Install the Neptune client library:
```
pip install -U neptune
```
4. Set your Neptune API token to the NEPTUNE_API_TOKEN environment variable.
  Linux & macOS:
  export NEPTUNE_API_TOKEN="uyVrZXkiOiIzNTd0Zj...ifQ=="
  
  Windows:
  setx NEPTUNE_API_TOKEN "uyVrZXkiOiIzNTd0Zj...ifQ=="
  
  Jupyter Notebook:
  %env NEPTUNE_API_TOKEN="uyVrZXkiOiIzNTd0Zj...ifQ=="
5. Set the name of your Neptune project to the NEPTUNE_PROJECT environment variable.
  
  You can copy the full name from your project details.
  Linux & macOS:
  export NEPTUNE_PROJECT="workspace-name/project-name"
  
  Windows:
  setx NEPTUNE_PROJECT "workspace-name/project-name"
  
  Jupyter Notebook:
  %env NEPTUNE_PROJECT="workspace-name/project-name"
For detailed instructions, see Getting started.
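Before moving on, you may want to confirm that both variables are visible to Python. The sketch below uses only the standard library; `check_neptune_env` is a hypothetical helper, not part of the Neptune client, and the format check simply assumes the workspace-name/project-name form described in the steps above.

```python
import os
import re

# Hypothetical pre-flight helper (not part of the Neptune client).
# Assumes the project name has the form workspace-name/project-name.
PROJECT_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+$")


def check_neptune_env(environ=os.environ):
    """Return a list of setup problems; an empty list means both variables look fine."""
    problems = []
    if not environ.get("NEPTUNE_API_TOKEN"):
        problems.append("NEPTUNE_API_TOKEN is not set")
    project = environ.get("NEPTUNE_PROJECT", "")
    if not project:
        problems.append("NEPTUNE_PROJECT is not set")
    elif not PROJECT_PATTERN.match(project):
        problems.append(
            f"NEPTUNE_PROJECT={project!r} is not in the form workspace-name/project-name"
        )
    return problems


if __name__ == "__main__":
    # Print each problem so it shows up before training starts
    for problem in check_neptune_env():
        print("[setup]", problem)
```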

Create the logger:

from lightning.pytorch.loggers import NeptuneLogger

neptune_logger = NeptuneLogger()

Pass the logger to the logger argument of the PyTorch Lightning trainer:

from lightning.pytorch import Trainer

trainer = Trainer(
    ...,
    logger=neptune_logger,
)

Success

Neptune logging will be enabled when you run your Trainer with trainer.fit().

Full walkthrough#

This guide walks you through connecting Neptune to your machine learning scripts and analyzing some logged metadata.

Before you start#

Sign up at neptune.ai/register.
Create a project for storing your metadata.
Set your Neptune credentials (API token and target project name).
Linux & macOS:
export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
export NEPTUNE_PROJECT="ml-team/classification"

Windows:
setx NEPTUNE_API_TOKEN "h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
setx NEPTUNE_PROJECT "ml-team/classification"
You can also navigate to Settings → Edit the system environment variables and add the variables there.

Jupyter Notebook:
%env NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
%env NEPTUNE_PROJECT="ml-team/classification"
How do I find my Neptune credentials?
- API token: In the bottom-left corner of the Neptune app, open your user menu and select Get your API token. If you need the token of a service account, go to the workspace or project settings and select Service accounts.
- Project name: Your full project name has the form workspace-name/project-name. You can copy it from the project menu ( → Details & privacy).
  - You can also find a pre-filled project string in Experiments → Create a new run.
Install the Neptune client library and Lightning on your system.
```
pip install -U neptune lightning
```

If you'd rather follow the guide without any setup, you can run the example in Colab or Lightning AI Studio.

Adding NeptuneLogger to the PyTorch Lightning script#

PyTorch Lightning has a unified way of logging metadata: loggers. You can learn more about logger support in the PyTorch Lightning docs.

To enable Neptune logging:

Create a NeptuneLogger instance:

from lightning.pytorch.loggers import NeptuneLogger

neptune_logger = NeptuneLogger()  # uses NEPTUNE_API_TOKEN and NEPTUNE_PROJECT from the environment

If you haven't registered, you can try the integration anonymously:

from lightning.pytorch.loggers import NeptuneLogger
from neptune import ANONYMOUS_API_TOKEN

neptune_logger = NeptuneLogger(
    project="common/pytorch-lightning-integration",
    api_token=ANONYMOUS_API_TOKEN,
)

To log hyperparameters, you can use the standard log_hyperparams() method from the PyTorch Lightning logger.
```
PARAMS = ...  # dict or argparse.Namespace
neptune_logger.log_hyperparams(PARAMS)
```
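As a small illustration of the two accepted forms, the sketch below builds the same hyperparameters with `argparse` and shows their dict view via `vars()`. The parameter names (`lr`, `batch_size`, `max_epochs`) and the `build_params` helper are hypothetical; the logger calls are left as comments because they need a configured `neptune_logger`.

```python
from argparse import ArgumentParser, Namespace


def build_params(argv=None):
    """Hypothetical CLI parser; the arguments are illustrative only."""
    parser = ArgumentParser()
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--max_epochs", type=int, default=10)
    return parser.parse_args(argv)


args = build_params([])  # argparse.Namespace holding the defaults
PARAMS = vars(args)      # equivalent plain dict

# Either form can be passed to the logger:
# neptune_logger.log_hyperparams(args)
# neptune_logger.log_hyperparams(PARAMS)
```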

Set up your LightningModule to log metrics or other outputs. You can also use Neptune methods to log additional metadata.

Specifying the metadata structure

Metrics are logged as nested dictionary-like structures. You can specify the structure with: self.log("path/to/metric", value).
To log outside the default "training" prefix, pass a different namespace name to the prefix argument of NeptuneLogger().

Example

from lightning import LightningModule

class MNISTModel(LightningModule):
    def training_step(self, batch, batch_idx):
        loss = ...
        self.log("train/batch/loss", loss)

        acc = ...
        self.log("train/batch/acc", acc)
        return loss

    # Lightning 2.x replaced training_epoch_end(outputs) with on_train_epoch_end()
    def on_train_epoch_end(self):
        loss = ...
        acc = ...
        self.log("train/epoch/loss", loss)
        self.log("train/epoch/acc", acc)

Result

training  # default prefix
|—— train
    |—— batch
        |—— loss
        |—— acc
    |—— epoch
        |—— loss
        |—— acc
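To make the mapping from keys to the tree explicit, here is a plain-Python model of how slash-separated names end up nested under the logger prefix. It is only an illustration of the structure shown above, not Neptune's implementation; `build_namespace_tree` is a hypothetical helper.

```python
def build_namespace_tree(prefix, metric_keys):
    """Nest slash-separated metric keys under a prefix, mirroring the Result tree."""
    tree = {}
    for key in metric_keys:
        node = tree.setdefault(prefix, {})
        *parents, leaf = key.split("/")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = None  # in a real run, the leaf field holds the logged series
    return tree


keys = ["train/batch/loss", "train/batch/acc", "train/epoch/loss", "train/epoch/acc"]
tree = build_namespace_tree("training", keys)  # "training" is the default prefix
```

Passing a different `prefix` to `NeptuneLogger()` changes only the top-level key; the rest of the structure comes from the strings you pass to `self.log()`.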

Pass neptune_logger to the trainer:

trainer = Trainer(
    ...,
    logger=neptune_logger,
)

Pass your LightningModule and DataLoader instances to the fit() method of the trainer:

model = My_LightningModule()
train_loader = My_DataLoader()

trainer.fit(model, train_loader)

Run your script:
```
python main.py
```

If Neptune can't find your project name or API token

As a best practice, you should save your Neptune API token and project name as environment variables:

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8"

export NEPTUNE_PROJECT="ml-team/classification"

Alternatively, you can pass the information when using a function that takes api_token and project as arguments:

import neptune

run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8",  # (1)
    project="ml-team/classification",  # (2)
)

(1) In the bottom-left corner, expand the user menu and select Get my API token.
(2) You can copy the project path from the project details ( → Details & privacy).

If you haven't registered, you can log anonymously to a public project:

api_token=neptune.ANONYMOUS_API_TOKEN
project="common/quickstarts"

Make sure not to publish sensitive data through your code!
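The lookup order described in the `init_run()` parameter table (an explicit `project` or `api_token` argument wins; if it is `None`, the matching environment variable is used) can be sketched as follows. `resolve_setting` is a hypothetical helper written for illustration, not the client's own code.

```python
import os


def resolve_setting(explicit_value, env_var, environ=os.environ):
    """Return the explicit argument if given, else the environment variable."""
    if explicit_value is not None:
        return explicit_value
    value = environ.get(env_var)
    if value is None:
        raise ValueError(f"Pass the value explicitly or set {env_var}")
    return value


env = {"NEPTUNE_PROJECT": "ml-team/classification"}
from_env = resolve_setting(None, "NEPTUNE_PROJECT", env)
from_arg = resolve_setting("common/quickstarts", "NEPTUNE_PROJECT", env)
```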

Your metadata will be logged in the given Neptune project for analysis, comparison, and collaboration.

To browse the metadata, follow the Neptune link in the console output.

Sample output

[neptune] [info ] Neptune initialized. Open in the app: https://app.neptune.ai/workspace/project/e/RUN-1
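If you capture console output in a script, you may want to pull the run ID out of that line. The sketch below assumes the link layout from the single sample above (`.../workspace/project/e/RUN-ID`); other Neptune versions or deployments may format it differently, and `parse_run_link` is a hypothetical helper.

```python
import re

# Assumed link layout, taken only from the sample output above
RUN_URL = re.compile(
    r"https://\S+/(?P<workspace>[^/\s]+)/(?P<project>[^/\s]+)/e/(?P<run_id>[A-Z0-9]+-\d+)"
)


def parse_run_link(console_line):
    """Return workspace, project, and run ID from a console line, or None."""
    match = RUN_URL.search(console_line)
    return match.groupdict() if match else None


line = "[neptune] [info ] Neptune initialized. Open in the app: https://app.neptune.ai/workspace/project/e/RUN-1"
parsed = parse_run_link(line)
```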

Analyzing the logged metadata in Neptune#

To view the metadata from your PyTorch Lightning run:

1. Click the run you want to inspect. The All metadata tab opens.
2. Click training (or the name of your custom namespace, if you specified a different prefix when creating the logger).

Metrics are logged as nested dictionary-like structures defined in the LightningModule.

Tip

Create a custom dashboard to visualize the metadata in different ways.

If your LightningModule code does not do any logging, the training namespace will only contain the status and (unless disabled) model checkpoints.

Comparing runs against each other#

Toggle the eye icons to select runs for comparison.

You can also create comparison-specific widgets, such as scatter plots.

More options#

Additional logger options#

You can configure the Neptune logger in various ways to address custom logging needs.

from lightning.pytorch.loggers import NeptuneLogger

neptune_logger = NeptuneLogger(
    project="ml-team/nli-project",
    name="shallow-panda",  # (1)
    prefix="finetune",
    log_model_checkpoints=False,
    # Log environment dependencies
    dependencies="infer",  # You can also pass the path to your dependencies file
)

(1) Sets a custom name, which you can use as a human-friendly ID.

To display it in the app, add sys/name as a column.

You can also edit the name in the run information view ( → Run information).

For detailed parameter descriptions, see the API reference.

Uploading model checkpoints#

If you have the ModelCheckpoint callback configured, the Neptune logger automatically logs whichever checkpoints are saved by the callback, respecting save_top_k and save_last.

Otherwise, only the final model checkpoint is uploaded.

Model weights are logged in the <prefix>/model/checkpoints namespace of the Neptune run.
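Since the logger mirrors what `ModelCheckpoint` keeps, it can help to picture the `save_top_k` rule: `-1` keeps every checkpoint, `0` keeps none, and any other `k` keeps the `k` best by the monitored metric. The sketch below is a simplified model of that rule (it ignores `save_last`, ties, and file handling); the checkpoint names and scores are hypothetical.

```python
def checkpoints_to_keep(scores, save_top_k, mode="min"):
    """Return the checkpoint names a top-k policy would retain (simplified).

    scores maps checkpoint file name -> monitored metric value.
    """
    if save_top_k == -1:
        return sorted(scores)
    # Best first: lowest values for mode="min", highest for mode="max"
    ranked = sorted(scores, key=scores.get, reverse=(mode == "max"))
    return sorted(ranked[:save_top_k])


scores = {"epoch=0.ckpt": 0.9, "epoch=1.ckpt": 0.5, "epoch=2.ckpt": 0.7}
kept_min = checkpoints_to_keep(scores, 2)              # e.g. monitoring val loss
kept_max = checkpoints_to_keep(scores, 1, mode="max")  # e.g. monitoring val accuracy
```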

How to disable

To disable this option, set log_model_checkpoints to False when you create the NeptuneLogger instance:

neptune_logger = NeptuneLogger(log_model_checkpoints=False)

Passing Neptune keyword arguments#

NeptuneLogger also accepts the keyword arguments of neptune.init_run(). You can use them to add details to the run, customize its behavior, or disable auto-logging that is enabled by default.

Example

from lightning.pytorch.loggers import NeptuneLogger

neptune_logger = NeptuneLogger(
    project="ml-team/nli-project",
    description="Quick training run with updated datasets",
    tags=["training", "lightning", "data v1.0.1"],
    source_files="*.py",
    dependencies="infer",
    git_ref=False,
)


See in API reference: neptune.init_run()

Name	Type	Default	Description
`project`	`str`, optional	`None`	Name of a project in the form `workspace-name/project-name`. If `None`, the value of the `NEPTUNE_PROJECT` environment variable is used.
`api_token`	`str`, optional	`None`	Your Neptune API token (or a service account's API token). If `None`, the value of the `NEPTUNE_API_TOKEN` environment variable is used. To keep your token secure, avoid placing it in source code. Instead, save it as an environment variable.
`with_id`	`str`, optional	`None`	The Neptune identifier of an existing run to resume, such as "CLS-11". The identifier is stored in the object's `sys/id` field. If omitted or `None` is passed, a new tracked run is created.
`custom_run_id`	`str`, optional	`None`	A unique identifier that can be used to log metadata to a single run from multiple locations. Max length: 36 characters. If `None` and the `NEPTUNE_CUSTOM_RUN_ID` environment variable is set, Neptune will use that as the `custom_run_id` value. For details, see Set custom run ID.
`mode`	`str`, optional	`async`	Connection mode in which the logging will work. Possible values are `async`, `sync`, `offline`, `read-only`, and `debug`. If you leave it out, the value of the `NEPTUNE_MODE` environment variable is used. If that's not set, the default `async` is used.
`name`	`str`, optional	Neptune ID	Custom name for the run. You can use it as a human-readable ID and add it as a column in the experiments table (`sys/name`). If left empty, once the run is synchronized with the server, Neptune sets the auto-generated identifier (`sys/id`) as the name.
`description`	`str`, optional	`""`	Editable description of the run. You can add it as a column in the experiments table (`sys/description`).
`tags`	`list`, optional	`[]`	Must be a list of `str` which represent the tags for the run. You can edit them after the run is created, either in the run information or experiments table.
`source_files`	`list` or `str`, optional	`None`	List of source files to be uploaded. Must be a list of `str` or a single `str`. Uploaded sources are displayed in the Source code section of the run. If `None` is passed, the Python file from which the run was created will be uploaded. When resuming a run, no file will be uploaded by default. Pass an empty list (`[]`) to upload no files. Unix style pathname pattern expansion is supported. For example, you can pass `"*.py"` to upload all Python source files from the current directory. Paths of uploaded files are resolved relative to the calculated common root of all uploaded source files. For recursive lookup, use `"**/*.py"` (for Python `3.5` and later). For details, see the glob library.
`capture_stdout`	`Boolean`, optional	`True`	Whether to log the standard output stream. Logged in the `monitoring` namespace.
`capture_stderr`	`Boolean`, optional	`True`	Whether to log the standard error stream. Logged in the `monitoring` namespace.
`capture_hardware_metrics`	`Boolean`, optional	`True`	Whether to track hardware consumption (CPU, GPU, memory utilization). Logged in the `monitoring` namespace.
`fail_on_exception`	`Boolean`, optional	`True`	If an uncaught exception occurs, whether to set the run's `Failed` state to `True`.
`monitoring_namespace`	`str`, optional	`"monitoring"`	Namespace inside which all monitoring logs will be stored.
`flush_period`	`float`, optional	`5` (seconds)	In asynchronous (default) connection mode, how often Neptune should trigger disk flushing.
`proxies`	`dict`, optional	`None`	Argument passed to HTTP calls made via the Requests library. For details on proxies, see the Requests documentation.
`capture_traceback`	`Boolean`, optional	`True`	In case of an exception, whether to log the traceback of the run.
`git_ref`	`GitRef` or `Boolean`	`None`	`GitRef` object containing information about the Git repository path. If `None`, Neptune looks for a repository in the path of the script that is executed. To specify a different location, set to `GitRef(repository_path="path/to/repo")`. To turn off Git tracking for the run, set to `GitRef.DISABLED` or `False`. For examples, see Logging Git info.
`dependencies`	`str`, optional	`None`	Tracks environment requirements. If you pass `"infer"` to this argument, Neptune logs dependencies installed in the current environment. You can also pass a path to your dependency file directly. If left empty, no dependency file is uploaded.
`async_lag_callback`	`NeptuneObjectCallback`, optional	`None`	Custom callback function which is called if the lag between a queued operation and its synchronization with the server exceeds the duration defined by `async_lag_threshold`. The callback should take a `Run` object as the argument and can contain any custom code, such as calling `stop()` on the object. Note: Instead of using this argument, you can use Neptune's default callback by setting the `NEPTUNE_ENABLE_DEFAULT_ASYNC_LAG_CALLBACK` environment variable to `TRUE`.
`async_lag_threshold`	`float`, optional	`1800.0` (seconds)	Duration between the queueing and synchronization of an operation. If a lag callback (default callback enabled via environment variable or custom callback passed to the `async_lag_callback` argument) is enabled, the callback is called when this duration is exceeded.
`async_no_progress_callback`	`NeptuneObjectCallback`, optional	`None`	Custom callback function which is called if there has been no synchronization progress whatsoever for the duration defined by `async_no_progress_threshold`. The callback should take a `Run` object as the argument and can contain any custom code, such as calling `stop()` on the object. Note: Instead of using this argument, you can use Neptune's default callback by setting the `NEPTUNE_ENABLE_DEFAULT_ASYNC_NO_PROGRESS_CALLBACK` environment variable to `TRUE`.
`async_no_progress_threshold`	`float`, optional	`300.0` (seconds)	For how long there has been no synchronization progress. If a no-progress callback (default callback enabled via environment variable or custom callback passed to the `async_no_progress_callback` argument) is enabled, the callback is called when this duration is exceeded.
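Because `source_files` supports glob-style patterns, you can preview which files a pattern would pick up before passing it to the logger. The sketch below uses the standard `glob` module on a throwaway directory; it assumes Neptune's matching behaves like `glob` with `recursive=True` for `**` patterns, and `preview_source_files` is a hypothetical helper.

```python
import glob
import os
import tempfile


def preview_source_files(pattern, root):
    """List the files a glob pattern matches, relative to root."""
    matches = glob.glob(os.path.join(root, pattern), recursive="**" in pattern)
    return sorted(os.path.relpath(path, root) for path in matches)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical project layout: one top-level script, one nested module
    os.makedirs(os.path.join(root, "models"))
    for name in ("main.py", "README.md", os.path.join("models", "net.py")):
        with open(os.path.join(root, name), "w") as f:
            f.write("")
    top_level = preview_source_files("*.py", root)
    all_levels = preview_source_files("**/*.py", root)
```

Here `"*.py"` matches only `main.py`, while `"**/*.py"` also descends into `models/`.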

Using an existing run#

To associate the logger with an existing Neptune run, initialize the run in your code and pass it to the run argument.

import neptune
from lightning.pytorch.loggers import NeptuneLogger

my_run = neptune.init_run(with_id="NLI-7")  # (1)
neptune_logger = NeptuneLogger(run=my_run)

(1) In this example, the project key is NLI and the run ID is NLI-7.

How do I find the ID?

The Neptune ID is a unique identifier for the run. The Experiments tab displays it in the leftmost column.

In the run structure, the ID is stored in the system namespace (sys).

If the run is active, you can obtain its ID with run["sys/id"].fetch(). For example:
```
>>> run = neptune.init_run()
...
>>> run["sys/id"].fetch()
'CLS-26'
```

If you set a custom run ID, it's stored at sys/custom_run_id:

>>> run["sys/custom_run_id"].fetch()
'vigilant-puffin-20bt9'
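If you plan to log to one run from several processes, you need a shared ID that respects the 36-character limit from the parameter table. The sketch below is one way to generate it with `uuid`; `make_custom_run_id` is a hypothetical helper, and the optional prefix is only for readability.

```python
import uuid

MAX_CUSTOM_RUN_ID_LENGTH = 36  # limit stated for custom_run_id in the init_run() table


def make_custom_run_id(prefix=""):
    """Create a random ID that several processes can share to log to one run."""
    suffix = uuid.uuid4().hex  # 32 random hex characters
    run_id = f"{prefix}-{suffix}" if prefix else suffix
    if len(run_id) > MAX_CUSTOM_RUN_ID_LENGTH:
        raise ValueError(f"custom_run_id exceeds {MAX_CUSTOM_RUN_ID_LENGTH} characters")
    return run_id


# Usage: generate once, then share it with every process, for example through
# the NEPTUNE_CUSTOM_RUN_ID environment variable or the custom_run_id argument.
```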

Using logger methods in LightningModule#

You can use the default logging methods with the Neptune logger:

log()
log_metrics()
log_hyperparams()

Example

from lightning import LightningModule

class LitModel(LightningModule):
    def training_step(self, batch, batch_idx):
        # log metrics
        loss = ...
        acc = ...
        self.log("train/loss", loss)  # standard log method
        self.log("train/acc", acc)
        return loss

As another example, the following code logs two Float series (acc and loss) under the val namespace.

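from lightning import LightningModule
from sklearn.metrics import accuracy_score

class LitModel(LightningModule):
    # Lightning 2.x replaced validation_epoch_end(outputs) with on_validation_epoch_end()
    def on_validation_epoch_end(self):
        loss = ...
        y_true = ...
        y_pred = ...
        acc = accuracy_score(y_true, y_pred)
        self.log("val/loss", loss)
        self.log("val/acc", acc)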

The val namespace is nested under the base namespace (<prefix>/val).

See result in Neptune

Using Neptune methods in LightningModule#

To log custom metadata (such as images, CSV files, or interactive charts), access the Neptune run directly through the self.logger.experiment attribute.

You can then use logging methods from the Neptune client library to track your metadata, such as append(), track_files(), and upload().

from lightning import LightningModule
from neptune.types import File

class LitModel(LightningModule):
    def any_lightning_module_function_or_hook(self):
        # Log images, using the Neptune client library
        img = ...
        self.logger.experiment["train/misclassified_imgs"].append(File.as_image(img))

        # Generic recipe, using the Neptune client library
        metadata = ...
        self.logger.experiment["your/metadata/structure"] = metadata

Logging model metadata#

Best model score and path#

If you have ModelCheckpoint configured, the Neptune logger automatically logs the best_model_path and best_model_score values.

They are logged in the <prefix>/model namespace of the Neptune run.

Model summary#

You can log the model summary, as generated by the ModelSummary utility from PyTorch Lightning.

The summary is logged in the <prefix>/model/summary namespace of the Neptune run.

neptune_logger = NeptuneLogger()
model = ...  # LightningModule

neptune_logger.log_model_summary(model=model, max_depth=-1)
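In Lightning's model summary, `max_depth` controls how deep into nested submodules the summary goes: `-1` shows all modules and `0` shows none. The sketch below models that filter on dotted module names, where depth is the number of dot-separated parts; the layer names are hypothetical and the helper is not Lightning's own code.

```python
def summarize_layers(layer_names, max_depth=1):
    """Keep dotted module names up to max_depth (-1 = all, 0 = none)."""
    if max_depth == 0:
        return []
    return [
        name
        for name in layer_names
        if max_depth == -1 or name.count(".") + 1 <= max_depth
    ]


# Hypothetical module names, in the dotted style used by named_modules()
names = ["encoder", "encoder.0", "encoder.1", "decoder", "decoder.0"]
top_level = summarize_layers(names, max_depth=1)
everything = summarize_layers(names, max_depth=-1)
```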

Logging after fitting or testing is finished#

You can use the created Neptune logger outside of the Trainer context, which lets you log objects after the fitting or testing methods are finished.

This way, you're not restricted to the LightningModule class – you can log from any method or class in your project code.

Example

from lightning.pytorch import Trainer
from lightning.pytorch.loggers import NeptuneLogger

# Create logger
neptune_logger = NeptuneLogger()

trainer = Trainer(logger=neptune_logger)
model = ...
datamodule = ...

# Run fit and test
trainer.fit(model, datamodule=datamodule)
trainer.test(model, datamodule=datamodule)

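Log additional metadata after fit and test:

Log confusion matrix as image

import matplotlib.pyplot as plt
from neptune.types import File
from sklearn.metrics import ConfusionMatrixDisplay  # scikit-learn 1.0 or later

y_true = ...  # ground-truth labels from the test set
y_pred = ...  # model predictions on the test set
fig, ax = plt.subplots()
ConfusionMatrixDisplay.from_predictions(y_true, y_pred, ax=ax)
neptune_logger.experiment["test/confusion_matrix"].upload(File.as_image(fig))

Generic recipe for logging additional metadata:

metadata = ...
neptune_logger.experiment["your/metadata/structure"] = metadata

Stop the run once you're done logging:

neptune_logger.experiment.stop()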
