# PyTorch integration guide
Info: Neptune also integrates with several other libraries from the PyTorch ecosystem, such as PyTorch Lightning and PyTorch Ignite.
This guide walks you through tracking your model training metadata when using PyTorch. We'll use the `NeptuneLogger` class to:
- Log training metrics
- Upload model checkpoints
- Log model predictions
See example in Neptune · Code examples
## Before you start
- Sign up at neptune.ai/register.
- Create a project for storing your metadata.
- Have PyTorch installed.
- To follow the example, you'll also need to have torchvision, numpy, and torchviz installed.
## Installing the integration
To use your preinstalled version of Neptune together with the integration:
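For example, with pip (assuming the integration is published as the `neptune-pytorch` package):

```bash
pip install -U neptune-pytorch
```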
To install both Neptune and the integration:
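For example, with pip:

```bash
pip install -U neptune neptune-pytorch
```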
How do I save my credentials as environment variables?
Set your Neptune API token and full project name to the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables, respectively.
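For example, on Linux or macOS (replace the placeholder values with your own):

```bash
export NEPTUNE_API_TOKEN="your-api-token-here"
export NEPTUNE_PROJECT="workspace-name/project-name"
```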
On Windows, you can also navigate to Settings → Edit the system environment variables and add the variables there.
To find your credentials:
- API token: In the bottom-left corner of the Neptune app, expand your user menu and select Get your API token. If you need the token of a service account, go to the workspace or project settings and open the Service accounts settings.
- Project name: Your full project name has the form `workspace-name/project-name`. You can copy it from the project menu (→ Details & privacy).
If you're working in Google Colab, you can set your credentials with the os and getpass libraries:
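For example:

```python
import os
from getpass import getpass

# Prompt for the API token so it isn't stored in the notebook
os.environ["NEPTUNE_API_TOKEN"] = getpass("Enter your Neptune API token: ")
os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"  # replace with your own
```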
If you'd rather follow the guide without any setup, you can run the example in Colab.
## Basic logging example
### Set up the model and training config
1. Import the needed libraries:
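    A typical set of imports for this example (assuming the integration is installed as the `neptune_pytorch` package):

    ```python
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torchvision import datasets, transforms

    import neptune
    from neptune_pytorch import NeptuneLogger
    ```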
2. Create a Neptune run:
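    If your credentials are saved as environment variables, no arguments are needed:

    ```python
    run = neptune.init_run()
    ```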
    If you haven't set up your credentials, you can log anonymously instead:
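    A sketch using Neptune's anonymous mode (the project name here is a placeholder for a public example project):

    ```python
    run = neptune.init_run(
        api_token=neptune.ANONYMOUS_API_TOKEN,
        project="common/pytorch-integration",
    )
    ```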
3. Define your hyperparameters:
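    The dictionary below covers the keys used in the rest of this example; the exact values are illustrative:

    ```python
    parameters = {
        "lr": 1e-2,
        "bs": 128,
        "input_sz": 32 * 32 * 3,
        "n_classes": 10,
        "epochs": 2,
        "device": torch.device("cuda:0" if torch.cuda.is_available() else "cpu"),
    }
    ```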
4. Set up the model:

    ```python
    class Model(nn.Module):
        def __init__(self, input_sz, hidden_dim, n_classes):
            super(Model, self).__init__()
            self.seq_model = nn.Sequential(
                nn.Linear(input_sz, hidden_dim * 2),
                nn.ReLU(),
                nn.Linear(hidden_dim * 2, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim // 2),
                nn.ReLU(),
                nn.Linear(hidden_dim // 2, n_classes),
            )

        def forward(self, input):
            x = input.view(-1, 32 * 32 * 3)
            return self.seq_model(x)

    model = Model(
        parameters["input_sz"],
        parameters["input_sz"],  # the hidden dimension is set to the input size here
        parameters["n_classes"],
    ).to(parameters["device"])
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=parameters["lr"])
    ```
5. Download and transform the data for training:

    ```python
    data_dir = "data/CIFAR10"
    compressed_ds = "./data/CIFAR10/cifar-10-python.tar.gz"
    data_tfms = {
        "train": transforms.Compose(
            [
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            ]
        ),
        "val": transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            ]
        ),
    }

    trainset = datasets.CIFAR10(data_dir, transform=data_tfms["train"], download=True)
    trainloader = torch.utils.data.DataLoader(
        trainset, batch_size=parameters["bs"], shuffle=True, num_workers=0
    )
    # Use the "val" transforms (no augmentation) for the validation set
    validset = datasets.CIFAR10(
        data_dir, train=False, transform=data_tfms["val"], download=True
    )
    validloader = torch.utils.data.DataLoader(
        validset, batch_size=parameters["bs"], num_workers=0
    )

    classes = [
        "airplane",
        "automobile",
        "bird",
        "cat",
        "deer",
        "dog",
        "frog",
        "horse",
        "ship",
        "truck",
    ]
    ```
### Add Neptune logging
1. Create a `NeptuneLogger` instance:
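    A minimal sketch; the keyword arguments are assumptions based on the neptune-pytorch logger's options (`log_model_diagram` requires torchviz):

    ```python
    npt_logger = NeptuneLogger(
        run=run,
        model=model,
        log_model_diagram=True,  # requires torchviz
        log_gradients=True,
        log_parameters=True,
        log_freq=30,
    )
    ```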
2. Log the hyperparameters from earlier:

    ```python
    from neptune.utils import stringify_unsupported

    run[npt_logger.base_namespace]["hyperparams"] = stringify_unsupported(parameters)  # (1)!
    ```

    1. You can use the `base_namespace` attribute of the logger to log metadata consistently under the logger's base namespace.
3. Log metrics while training. In this example, the metrics are logged under the `"batch"` namespace every 30 steps:

    ```python
    for epoch in range(parameters["epochs"]):
        for i, (x, y) in enumerate(trainloader, 0):
            x, y = x.to(parameters["device"]), y.to(parameters["device"])
            optimizer.zero_grad()
            outputs = model(x)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, y)
            acc = (torch.sum(preds == y.data)) / len(x)

            # Log after every 30 steps
            if i % 30 == 0:
                run[npt_logger.base_namespace]["batch/loss"].append(loss.item())
                run[npt_logger.base_namespace]["batch/acc"].append(acc.item())

            loss.backward()
            optimizer.step()

        npt_logger.log_checkpoint()  # (1)!
    ```
    1. The checkpoint number is automatically incremented on each subsequent call:
        - Call 1 → `ckpt_1.pt`
        - Call 2 → `ckpt_2.pt`
        - And so on.
4. To stop the connection to Neptune and sync all data, call the `stop()` method:
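    ```python
    run.stop()
    ```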
Run your script as you normally would. To open the run and explore the metrics, parameters, and predictions, click the Neptune link that appears in the console output.
Sample output

```
[neptune] [info ] Neptune initialized. Open in the app:
https://app.neptune.ai/workspace/project/e/RUN-1
```
## More options
### Saving a checkpoint per epoch

You can save a model checkpoint at the end of each epoch of the training loop:
```python
for epoch in range(parameters["epochs"]):
    for i, (x, y) in enumerate(trainloader, 0):
        ...

        # Log after every 30 steps
        if i % 30 == 0:
            run[npt_logger.base_namespace]["batch/loss"].append(loss.item())
            run[npt_logger.base_namespace]["batch/acc"].append(acc.item())

        loss.backward()
        optimizer.step()

    # Save a checkpoint at the end of each epoch
    npt_logger.log_checkpoint()
```
The checkpoint number is automatically incremented on each subsequent call:

- First call → `ckpt_1.pt`
- Second call → `ckpt_2.pt`
- And so on.
### Logging model predictions
You can log the predictions made by the model as follows:
```python
from neptune.types import File

dataiter = iter(validloader)
images, labels = next(dataiter)

# Predict a batch of n_samples images
n_samples = 10
imgs = images[:n_samples].to(parameters["device"])
probs = torch.nn.functional.softmax(model(imgs), dim=1)

# Decode the probabilities and log the tensors as images
for i, ps in enumerate(probs):
    pred = classes[torch.argmax(ps)]
    ground_truth = classes[labels[i]]
    description = f"pred: {pred} | ground truth: {ground_truth}"

    # Log the series of tensors as images, with the prediction in the
    # name and description. permute() converts each tensor from CHW to
    # the HWC layout that File.as_image() expects.
    run[npt_logger.base_namespace]["predictions"].append(
        File.as_image(imgs[i].cpu().squeeze().permute(1, 2, 0).clip(0, 1)),
        name=f"{i}_{pred}_{ground_truth}",
        description=description,
    )
```
The predictions are logged as a series of images under the `"predictions"` namespace.
### Saving the final model

Before stopping the run, you can save the final model as `model.pt`:
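A minimal sketch using the logger's `log_model()` method (based on the neptune-pytorch API; see the API reference linked below):

```python
npt_logger.log_model()  # uploads the final model weights as model.pt
```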
Related
- What you can log and display
- Log arrays and tensors
- Neptune-PyTorch API reference
- neptune-pytorch repo on GitHub
- PyTorch repo on GitHub