
PyTorch integration guide#


Custom dashboard displaying metadata logged with PyTorch


Neptune also integrates with several other libraries from the PyTorch ecosystem.

This guide walks you through tracking your model training metadata when using PyTorch. We'll use the NeptuneLogger class to:

  • Log training metrics
  • Upload model checkpoints
  • Log model predictions


Before you start#

  • Sign up for a Neptune account.
  • Create a project for storing your metadata.
  • Have PyTorch installed.
  • To follow the example, you'll also need to have torchvision, numpy, and torchviz installed.

Installing the integration#

To use your preinstalled version of Neptune together with the integration:

With pip:

pip install -U neptune-pytorch

With conda:

conda install -c conda-forge neptune-pytorch

To install both Neptune and the integration:

With pip:

pip install -U "neptune[pytorch]"

With conda:

conda install -c conda-forge neptune neptune-pytorch

How do I save my credentials as environment variables?

Set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

Linux or macOS:

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
export NEPTUNE_PROJECT="ml-team/classification"

Windows:

setx NEPTUNE_API_TOKEN "h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
setx NEPTUNE_PROJECT "ml-team/classification"

You can also navigate to Settings → Edit the system environment variables and add the variables there.

Jupyter notebook:

%env NEPTUNE_API_TOKEN=h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ==
%env NEPTUNE_PROJECT=ml-team/classification

To find your credentials:

  • API token: In the bottom-left corner of the Neptune app, expand your user menu and select Get your API token. If you need the token of a service account, go to the workspace or project settings and enter the Service accounts settings.
  • Project name: Your full project name has the form workspace-name/project-name. You can copy it from the project menu (Edit project details).
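
Before creating a run, it can help to confirm that both variables are visible to Python. The following is a minimal sketch, not part of the Neptune API: check_neptune_env is a hypothetical helper that only inspects the environment and the workspace-name/project-name form described above. It never contacts Neptune, so it cannot tell whether the token itself is valid.

```python
import os


def check_neptune_env(environ=os.environ):
    """Return a list of problems found with the Neptune credentials.

    Hypothetical helper for illustration; it does not contact Neptune.
    """
    problems = []
    if not environ.get("NEPTUNE_API_TOKEN"):
        problems.append("NEPTUNE_API_TOKEN is not set")

    project = environ.get("NEPTUNE_PROJECT", "")
    # A full project name has exactly two non-empty parts: workspace/project
    parts = project.split("/")
    if len(parts) != 2 or not all(parts):
        problems.append("NEPTUNE_PROJECT must look like workspace-name/project-name")
    return problems


if __name__ == "__main__":
    for problem in check_neptune_env():
        print(problem)
```

An empty list means both variables are set and the project name has the expected shape.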

If you're working in Google Colab, you can set your credentials with the os and getpass libraries:

import os
from getpass import getpass
os.environ["NEPTUNE_API_TOKEN"] = getpass("Enter your Neptune API token: ")
os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"

If you'd rather follow the guide without any setup, you can run the example in Colab.

Basic logging example#

Set up the model and training config#

  1. Import the needed libraries:

    import torch
    from torch import nn
    from torch import optim
    from torchvision import transforms, datasets
    import numpy as np
  2. Create a Neptune run:

    import neptune
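    run = neptune.init_run()  # Reads the credentials from the environment variables

    If you haven't set up your credentials, you can log anonymously by passing api_token=neptune.ANONYMOUS_API_TOKEN and the name of a public project to init_run().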

  3. Define your hyperparameters.

    parameters = {
        "lr": 1e-2,
        "bs": 128,
        "input_sz": 32 * 32 * 3,
        "n_classes": 10,
        "model_filename": "basemodel",
        "device": torch.device("cuda" if torch.cuda.is_available() else "cpu"),
        "epochs": 2,
    }
  4. Set up the model:

    class Model(nn.Module):
        def __init__(self, input_sz, hidden_dim, n_classes):
            super(Model, self).__init__()
            self.seq_model = nn.Sequential(
                nn.Linear(input_sz, hidden_dim * 2),
                nn.ReLU(),  # Non-linearity between the linear layers
                nn.Linear(hidden_dim * 2, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim // 2),
                nn.ReLU(),
                nn.Linear(hidden_dim // 2, n_classes),
            )

        def forward(self, input):
            x = input.view(-1, 32 * 32 * 3)
            return self.seq_model(x)

    # Move the model to the same device the batches are sent to
    model = Model(
        parameters["input_sz"], parameters["input_sz"], parameters["n_classes"]
    ).to(parameters["device"])
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=parameters["lr"])
  5. Download and transform the data for training:

    data_dir = "data/CIFAR10"
    compressed_ds = "./data/CIFAR10/cifar-10-python.tar.gz"
    data_tfms = {
        "train": transforms.Compose(
            [
                transforms.ToTensor(),  # Normalize expects a tensor
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            ]
        ),
        "val": transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            ]
        ),
    }
    trainset = datasets.CIFAR10(
        data_dir, transform=data_tfms["train"], download=True
    )
    trainloader = torch.utils.data.DataLoader(
        trainset, batch_size=parameters["bs"], shuffle=True, num_workers=0
    )
    validset = datasets.CIFAR10(
        data_dir, train=False, transform=data_tfms["val"], download=True
    )
    validloader = torch.utils.data.DataLoader(
        validset, batch_size=parameters["bs"], num_workers=0
    )
    classes = trainset.classes  # CIFAR-10 class names, in label order
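
Before attaching the logger, a quick shape check on a random batch can catch wiring mistakes without downloading CIFAR-10. The sketch below is an illustration under stated assumptions: TinyModel is a hypothetical, scaled-down stand-in with a similar layout to the guide's model, and the hidden size of 64 is chosen only so the check runs quickly on CPU.

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in, similar to the guide's model but much smaller."""

    def __init__(self, input_sz, hidden_dim, n_classes):
        super().__init__()
        self.seq_model = nn.Sequential(
            nn.Linear(input_sz, hidden_dim * 2),
            nn.ReLU(),
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_classes),
        )

    def forward(self, x):
        # Flatten each image to a vector before the linear layers
        return self.seq_model(x.view(x.size(0), -1))


# Random batch shaped like CIFAR-10 images: (batch, channels, height, width)
batch = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))

model = TinyModel(input_sz=32 * 32 * 3, hidden_dim=64, n_classes=10)
logits = model(batch)
loss = nn.CrossEntropyLoss()(logits, labels)

print(logits.shape)  # one row of class scores per image
```

If the logits do not have one row per image and one column per class, the loss call fails here rather than in the middle of a logged training run.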

Add Neptune logging#

  1. Create a NeptuneLogger instance:

    from neptune_pytorch import NeptuneLogger
    npt_logger = NeptuneLogger(run=run, model=model)
  2. Log the hyperparameters from earlier:

    from neptune.utils import stringify_unsupported
    run[npt_logger.base_namespace]["hyperparams"] = stringify_unsupported(parameters)

    The base_namespace attribute of the logger lets you log metadata consistently under the logger's base namespace.
  3. Log metrics while training.

    In this example, the metrics are logged under the "batch" namespace every 30 steps.

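    for epoch in range(parameters["epochs"]):
        for i, (x, y) in enumerate(trainloader, 0):
            x, y = x.to(parameters["device"]), y.to(parameters["device"])
            optimizer.zero_grad()
            outputs = model(x)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, y)
            acc = torch.sum(preds == y) / len(x)
            # Log after every 30 steps
            if i % 30 == 0:
                run[npt_logger.base_namespace]["batch/loss"].append(loss.item())
                run[npt_logger.base_namespace]["batch/acc"].append(acc.item())
            loss.backward()
            optimizer.step()
        # Save a checkpoint at the end of each epoch
        npt_logger.log_checkpoint()

    The checkpoint number is automatically incremented on each subsequent call, so the first call saves checkpoint 1, the second call saves checkpoint 2, and so on.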
  4. To stop the connection to Neptune and sync all data, call the stop() method:

    run.stop()

Run your script as you normally would. To open the run and explore the metrics, parameters, and predictions, click the Neptune link that appears in the console output.

Sample output

[neptune] [info ] Neptune initialized. Open in the app:

In the above example, the run ID is RUN-1.
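
To get a feel for how many points the training loop above sends per epoch, the following sketch does the arithmetic. It assumes the CIFAR-10 training split of 50,000 images, the batch size of 128 from the hyperparameters, and the i % 30 == 0 condition; it makes no Neptune calls.

```python
import math

n_train_images = 50_000  # size of the CIFAR-10 training split
batch_size = 128         # parameters["bs"] in the guide
log_every = 30           # matches the `i % 30 == 0` condition

# DataLoader keeps the last, smaller batch by default (drop_last=False)
steps_per_epoch = math.ceil(n_train_images / batch_size)

# Step indices at which the loop appends a loss and an accuracy value
logged_steps = [i for i in range(steps_per_epoch) if i % log_every == 0]

print(steps_per_epoch)    # 391
print(len(logged_steps))  # 14
print(logged_steps[-1])   # 390
```

With two epochs, each of the two batch series therefore ends up with 28 values.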


More options#

Saving checkpoint per epoch#

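You can save a model checkpoint at the end of each epoch by calling log_checkpoint() after the inner batch loop.

for epoch in range(parameters["epochs"]):
    for i, (x, y) in enumerate(trainloader, 0):
        ...
        # Log after every 30 steps
        if i % 30 == 0:
            ...
    # Runs once per epoch, after all batches
    npt_logger.log_checkpoint()

The checkpoint number is automatically incremented on each subsequent call:

  • First call → checkpoint 1
  • Second call → checkpoint 2
  • And so on.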
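
The auto-incrementing behavior can be pictured with a small local stand-in. This is only a sketch of the idea, not Neptune's implementation: LocalCheckpointer and the epoch_checkpoint_N.pt file name pattern are hypothetical, and the files are written to a temporary directory with torch.save rather than uploaded.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn


class LocalCheckpointer:
    """Hypothetical stand-in that numbers checkpoints 1, 2, 3, ... per call."""

    def __init__(self, model, directory):
        self.model = model
        self.directory = Path(directory)
        self.counter = 0

    def save(self):
        self.counter += 1
        # File name pattern is illustrative, not the one Neptune uses
        path = self.directory / f"epoch_checkpoint_{self.counter}.pt"
        torch.save(self.model.state_dict(), path)
        return path


model = nn.Linear(4, 2)
with tempfile.TemporaryDirectory() as tmp:
    checkpointer = LocalCheckpointer(model, tmp)
    saved = [checkpointer.save().name for _ in range(3)]  # one call per epoch

print(saved)
```

Keeping the counter inside the object is what lets the training loop call the save method without tracking the epoch number itself.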

Logging model predictions#

You can log the predictions made by the model as follows:

from neptune.types import File

dataiter = iter(validloader)
images, labels = next(dataiter)

# Predict batch of n_samples
n_samples = 10
imgs = images[:n_samples].to(parameters["device"])
probs = torch.nn.functional.softmax(model(imgs), dim=1)

# Decode probs and log tensors as image
for i, ps in enumerate(probs):
    pred = classes[torch.argmax(ps)]
    ground_truth = classes[labels[i]]
    description = f"pred: {pred} | ground truth: {ground_truth}"

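    # Log series of tensors as image and predictions
    run[npt_logger.base_namespace]["predictions"].append(
        # permute(1, 2, 0) converts (C, H, W) to (H, W, C)
        File.as_image(imgs[i].cpu().squeeze().permute(1, 2, 0).clip(0, 1)),
        description=description,
    )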

The predictions are logged as a series of images under the "predictions" namespace.
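
Because the guide's transforms normalize each channel, clipping the tensor directly changes the colors of the logged images. A minimal sketch of undoing the normalization first, assuming the mean and standard deviation used in data_tfms; to_displayable is a hypothetical helper, not part of Neptune or torchvision.

```python
import torch

MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def to_displayable(img):
    """Convert a normalized (C, H, W) tensor to an (H, W, C) tensor in [0, 1]."""
    img = img.detach().cpu() * STD + MEAN  # undo Normalize: x * std + mean
    return img.permute(1, 2, 0).clip(0, 1)  # channels-last for image libraries


# Round trip on a random "image": normalize, then recover the original
original = torch.rand(3, 32, 32)
normalized = (original - MEAN) / STD
restored = to_displayable(normalized)

print(restored.shape)  # height, width, channels
```

The returned tensor could be passed to File.as_image() in place of the clipped tensor used in the loop above.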

Saving the final model#

Before stopping the run, you can save the final model as