PyTorch integration guide
Neptune also integrates with several other libraries from the PyTorch ecosystem.
This guide walks you through keeping track of your model training metadata when using PyTorch. We'll use the NeptuneLogger class to:
- Log training metrics
- Upload model checkpoints
- Log model predictions
Before you start
- Sign up at neptune.ai/register.
- Create a project for storing your metadata.
- Have PyTorch installed.
- To follow the example, you'll also need to have torchvision, numpy, and torchviz installed.
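Before running the example, one quick way to confirm these prerequisites is to check whether each module can be found by Python. The sketch below is an illustration only: the helper name `missing_modules` is hypothetical, and the module list assumes the integration is importable as `neptune_pytorch`.

```python
from importlib.util import find_spec

# Top-level module names used in this guide (assumed; adjust to your setup)
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "numpy",
    "torchviz",
    "neptune",
    "neptune_pytorch",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    # find_spec() returns None for a top-level module that isn't installed
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

The check only looks for the modules; it doesn't import them, so it stays fast even when PyTorch is installed.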
Installing the integration
To use your preinstalled version of Neptune together with the integration:
To install both Neptune and the integration:
How do I save my credentials as environment variables?
Set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.
For example:

    export NEPTUNE_API_TOKEN="<your-api-token>"
    export NEPTUNE_PROJECT="workspace-name/project-name"

- On Windows, the command is set instead of export.
Finding your credentials:
- API token: In the bottom-left corner of the Neptune app, expand your user menu and select Get your API token.
- Project: Your full project name has the form workspace-name/project-name. To copy the name, click the menu in the top-right corner and select Edit project details.
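If you want your script to fail early with a readable message, a minimal sketch along these lines reads both environment variables and checks that the project name has the workspace-name/project-name form. The function name `read_neptune_credentials` is hypothetical; `neptune.init_run()` reads these variables on its own, so this check is optional.

```python
import os


def read_neptune_credentials():
    """Return (api_token, project) from the environment, or raise a clear error."""
    token = os.environ.get("NEPTUNE_API_TOKEN")
    project = os.environ.get("NEPTUNE_PROJECT")

    # Collect the names of any variables that are unset or empty
    missing = [
        name
        for name, value in (("NEPTUNE_API_TOKEN", token), ("NEPTUNE_PROJECT", project))
        if not value
    ]
    if missing:
        raise RuntimeError("Environment variable(s) not set: " + ", ".join(missing))

    # The full project name must look like "workspace-name/project-name"
    workspace, sep, name = project.partition("/")
    if not (sep and workspace and name) or "/" in name:
        raise ValueError(
            f"NEPTUNE_PROJECT should be 'workspace-name/project-name', got {project!r}"
        )
    return token, project
```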
If you're working in Colab, you can set your credentials with the os and getpass libraries:
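For example, a sketch along these lines sets both variables for the current Python process. The helper name `set_neptune_credentials` and the placeholder project name are assumptions; passing `api_token` explicitly skips the interactive prompt, which is handy outside notebooks.

```python
import os
from getpass import getpass


def set_neptune_credentials(project, api_token=None):
    """Store Neptune credentials in environment variables for this process.

    If api_token is None, prompt for it without echoing it to the screen.
    """
    if api_token is None:
        api_token = getpass("Enter your Neptune API token: ")
    os.environ["NEPTUNE_API_TOKEN"] = api_token
    # Full project name in the form "workspace-name/project-name"
    os.environ["NEPTUNE_PROJECT"] = project


# In Colab you would typically call it without a token to get the prompt:
# set_neptune_credentials("workspace-name/project-name")
```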
If you'd rather follow the guide without any setup, you can run the example in Colab.
Basic logging example
Set up the model and training config
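- Import the needed libraries:

    import neptune
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from neptune_pytorch import NeptuneLogger
    from torchvision import datasets, transforms

- Create a Neptune run:

    # Reads the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables
    run = neptune.init_run()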
- If you haven't set up your credentials, you can log anonymously:
- Define your hyperparameters.
- Set up the model:

    class Model(nn.Module):
        def __init__(self, input_sz, hidden_dim, n_classes):
            super().__init__()
            self.seq_model = nn.Sequential(
                nn.Linear(input_sz, hidden_dim * 2),
                nn.ReLU(),
                nn.Linear(hidden_dim * 2, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim // 2),
                nn.ReLU(),
                nn.Linear(hidden_dim // 2, n_classes),
            )

        def forward(self, input):
            # Flatten each 3x32x32 CIFAR-10 image into a 3072-element vector
            x = input.view(-1, 32 * 32 * 3)
            return self.seq_model(x)


    model = Model(
        parameters["input_sz"], parameters["input_sz"], parameters["n_classes"]
    ).to(parameters["device"])
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=parameters["lr"])
- Download and transform the data for training:

    data_dir = "data/CIFAR10"
    compressed_ds = "./data/CIFAR10/cifar-10-python.tar.gz"
    data_tfms = {
        "train": transforms.Compose(
            [
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            ]
        ),
        "val": transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            ]
        ),
    }

    trainset = datasets.CIFAR10(data_dir, transform=data_tfms["train"], download=True)
    trainloader = torch.utils.data.DataLoader(
        trainset, batch_size=parameters["bs"], shuffle=True, num_workers=0
    )
    # The validation set uses the "val" transforms (no random flipping)
    validset = datasets.CIFAR10(
        data_dir, train=False, transform=data_tfms["val"], download=True
    )
    validloader = torch.utils.data.DataLoader(
        validset, batch_size=parameters["bs"], num_workers=0
    )

    classes = [
        "airplane", "automobile", "bird", "cat", "deer",
        "dog", "frog", "horse", "ship", "truck",
    ]
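The "Define your hyperparameters" step needs a parameters dict containing every key that the model, data, and training code reads: lr, bs, input_sz, n_classes, device, and epochs. Below is a minimal sketch with illustrative values (assumptions, not values taken from this guide). It also mirrors the layer widths of the Model class to show how large the network becomes when, as in the model setup above, input_sz is also passed as hidden_dim. The helper names `mlp_layer_sizes` and `count_linear_params` are hypothetical.

```python
# Illustrative hyperparameters; every key below is read by the code in this guide
parameters = {
    "lr": 1e-2,  # SGD learning rate
    "bs": 128,  # batch size for both DataLoaders
    "input_sz": 32 * 32 * 3,  # must match input.view(-1, 32 * 32 * 3) in forward()
    "n_classes": 10,  # CIFAR-10 has 10 classes
    # A device string works with .to(); in a real script you could use
    # torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    "device": "cpu",
    "epochs": 2,
}


def mlp_layer_sizes(input_sz, hidden_dim, n_classes):
    """(in_features, out_features) of each nn.Linear in Model.seq_model."""
    return [
        (input_sz, hidden_dim * 2),
        (hidden_dim * 2, hidden_dim),
        (hidden_dim, hidden_dim // 2),
        (hidden_dim // 2, n_classes),
    ]


def count_linear_params(layer_sizes):
    """Total weights plus biases of a stack of nn.Linear layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in layer_sizes)


sizes = mlp_layer_sizes(
    parameters["input_sz"], parameters["input_sz"], parameters["n_classes"]
)
print(sizes)
print(count_linear_params(sizes))
```

With input_sz reused as hidden_dim, the four linear layers hold 42,493,450 parameters; passing a smaller hidden_dim shrinks the model accordingly.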
Add Neptune logging
- Create a NeptuneLogger instance:
- Log the hyperparameters from earlier:

    from neptune.utils import stringify_unsupported

    run[npt_logger.base_namespace]["hyperparams"] = stringify_unsupported(parameters)

  The logger's base_namespace attribute lets you log your own metadata consistently under the same namespace that the logger uses. stringify_unsupported() converts any values whose types Neptune doesn't support natively into strings.
- Log metrics while training. In this example, the metrics are logged under the "batch" namespace every 30 steps, and a model checkpoint is saved at the end of each epoch:

    for epoch in range(parameters["epochs"]):
        for i, (x, y) in enumerate(trainloader, 0):
            x, y = x.to(parameters["device"]), y.to(parameters["device"])
            optimizer.zero_grad()
            outputs = model(x)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, y)
            acc = torch.sum(preds == y) / len(x)

            # Log after every 30 steps
            if i % 30 == 0:
                run[npt_logger.base_namespace]["batch/loss"].append(loss.item())
                run[npt_logger.base_namespace]["batch/acc"].append(acc.item())

            loss.backward()
            optimizer.step()

        # Save a checkpoint at the end of each epoch
        npt_logger.log_checkpoint()
  The checkpoint number is automatically incremented on each subsequent call of log_checkpoint():
  - Call 1 → ckpt_1.pt
  - Call 2 → ckpt_2.pt
  - And so on.
- To stop the connection to Neptune and sync all data, call the stop() method:

    run.stop()
Run your script as you normally would. To open the run and explore the metrics, parameters, and predictions, click the Neptune link that appears in the console output.
Sample output
https://app.neptune.ai/workspace-name/project-name/e/RUN-100/metadata
The general format is https://app.neptune.ai/<workspace>/<project> followed by the Neptune ID of the initialized object.
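To estimate how many points the batch/loss and batch/acc series receive, note that the training loop logs whenever the batch index i is a multiple of 30, starting at i = 0. The sketch below does this arithmetic in plain Python. The 50,000-image CIFAR-10 training split is a property of the dataset, while the batch size of 128 is an assumed value; the helper name `logged_batch_indices` is hypothetical.

```python
import math


def logged_batch_indices(n_samples, batch_size, every=30):
    """Batch indices i for which `i % every == 0` holds in one epoch."""
    # DataLoader keeps the last, smaller batch by default (drop_last=False)
    n_batches = math.ceil(n_samples / batch_size)
    return [i for i in range(n_batches) if i % every == 0]


indices = logged_batch_indices(n_samples=50_000, batch_size=128)
print(len(indices), indices[:3], indices[-1])
```

With these assumptions an epoch has 391 batches, so each series gains 14 points per epoch (at i = 0, 30, ..., 390); over two epochs, another assumed value, that is 28 points.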
More options
Saving checkpoint per epoch
You can save a model checkpoint at the end of each epoch by calling log_checkpoint() after the inner batch loop:
for epoch in range(parameters["epochs"]):
    for i, (x, y) in enumerate(trainloader, 0):
        ...
        # Log after every 30 steps
        if i % 30 == 0:
            run[npt_logger.base_namespace]["batch/loss"].append(loss.item())
            run[npt_logger.base_namespace]["batch/acc"].append(acc.item())
        loss.backward()
        optimizer.step()
    npt_logger.log_checkpoint()
The checkpoint number is automatically incremented on the subsequent call:
- First call → ckpt_1.pt
- Second call → ckpt_2.pt
- And so on.
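The naming rule above can be mimicked with a simple counter. This is a hypothetical illustration of the ckpt_<n>.pt pattern only, not the neptune-pytorch implementation; `CheckpointNamer` is an invented name.

```python
from itertools import count


class CheckpointNamer:
    """Return ckpt_1.pt, ckpt_2.pt, ... on successive calls (illustration only)."""

    def __init__(self, prefix="ckpt"):
        self.prefix = prefix
        self._counter = count(start=1)

    def next_name(self):
        # Each call advances the counter, like each log_checkpoint() call
        return f"{self.prefix}_{next(self._counter)}.pt"


namer = CheckpointNamer()
names = [namer.next_name() for _ in range(3)]
print(names)
```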
Logging model predictions
You can log the predictions made by the model as follows:
from neptune.types import File

dataiter = iter(validloader)
images, labels = next(dataiter)

# Predict a batch of n_samples
n_samples = 10
imgs = images[:n_samples].to(parameters["device"])
with torch.no_grad():
    probs = torch.nn.functional.softmax(model(imgs), dim=1)

# Decode probabilities and log each tensor as an image
for i, ps in enumerate(probs):
    pred = classes[torch.argmax(ps).item()]
    ground_truth = classes[labels[i].item()]
    description = f"pred: {pred} | ground truth: {ground_truth}"

    # Log a series of images together with their predictions
    run[npt_logger.base_namespace]["predictions"].append(
        # permute(1, 2, 0) converts the (C, H, W) tensor to (H, W, C)
        File.as_image(imgs[i].cpu().squeeze().permute(1, 2, 0).clip(0, 1)),
        name=f"{i}_{pred}_{ground_truth}",
        description=description,
    )
The predictions are logged as a series of images under the "predictions" namespace.
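The images in validloader are normalized with the per-channel mean and standard deviation from data_tfms, so clipping them to [0, 1] alone can distort their colors. A minimal sketch, assuming you first convert each tensor to a NumPy array (for example with imgs[i].cpu().numpy()), that undoes Normalize and reorders (C, H, W) to (H, W, C). The helper name `to_displayable` is hypothetical.

```python
import numpy as np

# Same statistics as transforms.Normalize in data_tfms, shaped for (C, H, W) arrays
MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)


def to_displayable(chw):
    """Undo Normalize on a (3, H, W) array and return an (H, W, 3) array in [0, 1]."""
    chw = np.asarray(chw, dtype=np.float64)
    if chw.ndim != 3 or chw.shape[0] != 3:
        raise ValueError("expected an array of shape (3, H, W)")
    unnormalized = chw * STD + MEAN  # inverse of (x - mean) / std
    hwc = unnormalized.transpose(1, 2, 0)  # (C, H, W) -> (H, W, C)
    return np.clip(hwc, 0.0, 1.0)
```

You could then pass File.as_image(to_displayable(imgs[i].cpu().numpy())) to append() in place of the clipped tensor.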
Saving the final model
Before stopping the run, you can save the final model as model.pt:
Related
- What you can log and display
- Log arrays and tensors
- Neptune-PyTorch API reference
- neptune-pytorch repo on GitHub
- PyTorch repo on GitHub