
Tracking and visualizing cross-validation results#


When training models with cross-validation, you can use Neptune namespaces (folders) to organize, visualize, and compare models. You can also use namespaces to present cross-validation results – for example, when experimenting with \(k\)-fold validation or train/validation/test splits.

In this guide, you'll learn how to analyze your results more easily by organizing your run to track cross-validation metadata.

[Screenshot: Viewing batch metrics for a particular fold]

See example in Neptune · Code examples

Before you start#

Integration tip

When you use Neptune's integrations with libraries such as XGBoost and LightGBM, this structure for cross-validation metadata is created automatically.

Learn more: Working with XGBoost, Working with LightGBM

Create a script that tracks cross-validation metadata#

We'll create a model training script where we run \(k\)-fold validation and log some parameters and metrics.

To organize the run, we'll create two categories of namespace:

| Namespace name | Description | Metadata inside the namespace |
| --- | --- | --- |
| global | Metadata common to all folds | epoch, learning rate, batch size, mean score |
| fold_n | Metadata specific to each fold | metrics, saved model |

This structure organizes the metadata from different folds into folders, plus a global aggregate of all metrics – for example, an accuracy chart across all folds. This makes it easier to navigate, compare, and retrieve metadata from any of the folds, both in the app and through the API.
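Because the namespaces are just slash-separated key paths, you can keep them consistent across a script with a small helper. The sketch below is a hypothetical convenience function (not part of the Neptune API), shown only to illustrate how the paths map to folders:

```python
def fold_namespace(fold: int, *subkeys: str) -> str:
    """Build a namespace path such as 'fold_0/training/batch/loss'.

    Neptune treats each '/' in the key as a folder level,
    so related metadata ends up grouped together in the app.
    """
    return "/".join([f"fold_{fold}", *subkeys])


# Usage inside the training loop, e.g.:
#   run[fold_namespace(fold, "training", "batch", "loss")].log(loss)
print(fold_namespace(2, "training", "batch", "loss"))  # fold_2/training/batch/loss
```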

In the abbreviated snippet below, the `run[...]` lines show where Neptune logging comes in; the full script follows.

cross-validation_example.py (snippet)
import neptune.new as neptune

run = neptune.init_run()  # (1)

parameters = {
    "epochs": 1,
    "lr": 1e-2,
    "bs": 10,
    "input_sz": 32 * 32 * 3,
    "n_classes": 10,
    "k_folds": 5,
    "model_name": "checkpoint.pth",
    "device": torch.device("cuda:0" if torch.cuda.is_available() else "cpu"),
    "seed": 42,
}

run["global/parameters"] = parameters

splits = KFold(n_splits=parameters["k_folds"], shuffle=True)

for fold, (train_ids, val_ids) in enumerate(splits.split(dataset)):
    for epoch in range(parameters["epochs"]):
        for x, y in trainloader:
            # ... forward pass: compute loss and acc ...

            # Log batch loss
            run[f"fold_{fold}/training/batch/loss"].log(loss)
            # Log batch accuracy
            run[f"fold_{fold}/training/batch/acc"].log(acc)

    # Log model checkpoint
    torch.save(model.state_dict(), f"./{parameters['model_name']}")
    run[f"fold_{fold}/checkpoint"].upload(parameters["model_name"])

run["global/metrics/train/mean_acc"] = mean(epoch_acc_list)
run["global/metrics/train/mean_loss"] = mean(epoch_loss_list)
  1. We recommend saving your API token and project name as environment variables. If needed, you can pass them as arguments when initializing Neptune: neptune.init_run(project="workspace-name/project-name", api_token="Your Neptune API token here")
cross-validation_example.py
from statistics import mean

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms

import neptune.new as neptune

# Step 1: Create a Neptune Run
run = neptune.init_run()  # (1)

# Step 2: Log config and hyperparameters
parameters = {
    "epochs": 1,
    "lr": 1e-2,
    "bs": 10,
    "input_sz": 32 * 32 * 3,
    "n_classes": 10,
    "k_folds": 2,
    "model_name": "checkpoint.pth",
    "device": torch.device("cuda:0" if torch.cuda.is_available() else "cpu"),
    "seed": 42,
}

# Log hyperparameters
run["global/parameters"] = parameters

# Seed
torch.manual_seed(parameters["seed"])

# Model
class BaseModel(nn.Module):
    def __init__(self, input_sz, hidden_dim, n_classes):
        super(BaseModel, self).__init__()
        self.main = nn.Sequential(
            nn.Linear(input_sz, hidden_dim * 2),
            nn.ReLU(),
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, n_classes),
        )

    def forward(self, input):
        x = input.view(-1, 32 * 32 * 3)
        return self.main(x)


model = BaseModel(
    parameters["input_sz"], parameters["input_sz"], parameters["n_classes"]
).to(parameters["device"])
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=parameters["lr"])

# trainset
data_dir = "data/CIFAR10"
compressed_ds = "./data/CIFAR10/cifar-10-python.tar.gz"
data_tfms = {
    "train": transforms.Compose(
        [
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ]
    )
}
trainset = datasets.CIFAR10(data_dir, transform=data_tfms["train"], download=True)
dataset_size = len(trainset)

run["global/dataset/CIFAR-10"].track_files(data_dir)
run["global/dataset/dataset_transforms"] = data_tfms
run["global/dataset/dataset_size"] = dataset_size

splits = KFold(n_splits=parameters["k_folds"], shuffle=True)
epoch_acc_list, epoch_loss_list = [], []

# Step 3: Log losses and metrics per fold
for fold, (train_ids, _) in enumerate(splits.split(trainset)):
    train_sampler = SubsetRandomSampler(train_ids)
    train_loader = DataLoader(
        trainset, batch_size=parameters["bs"], sampler=train_sampler
    )
    for epoch in range(parameters["epochs"]):
        epoch_acc, epoch_loss = 0, 0.0
        for x, y in train_loader:
            x, y = x.to(parameters["device"]), y.to(parameters["device"])
            optimizer.zero_grad()
            outputs = model(x)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, y)
            acc = (torch.sum(preds == y.data)) / len(x)

            # Log batch loss and acc
            run[f"fold_{fold}/training/batch/loss"].log(loss)
            run[f"fold_{fold}/training/batch/acc"].log(acc)

            loss.backward()
            optimizer.step()

            # Accumulate over every batch, not just the last one
            epoch_acc += torch.sum(preds == y.data).item()
            epoch_loss += loss.item() * x.size(0)

    epoch_acc_list.append((epoch_acc / len(train_loader.sampler)) * 100)
    epoch_loss_list.append(epoch_loss / len(train_loader.sampler))

    # Log model checkpoint
    torch.save(model.state_dict(), f"./{parameters['model_name']}")
    run[f"fold_{fold}/checkpoint"].upload(parameters["model_name"])

# Log mean of metrics across all folds
run["global/metrics/train/mean_acc"] = mean(epoch_acc_list)
run["global/metrics/train/mean_loss"] = mean(epoch_loss_list)
  1. We recommend saving your API token and project name as environment variables. If needed, you can pass them as arguments when initializing Neptune: neptune.init_run(project="workspace-name/project-name", api_token="Your Neptune API token here")

Run the script#

After you execute the Python script or notebook cell, you should see a Neptune link printed to the console output.

Sample output

https://app.neptune.ai/workspace-name/project-name/e/RUN-100/

The general format is https://app.neptune.ai/<workspace>/<project> followed by the Neptune ID of the initialized object.

Follow the link to open the run in Neptune.

If Neptune can't find your project name or API token

As a best practice, you should save your Neptune API token and project name as environment variables.
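For example, on Linux or macOS you can set the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables (the names Neptune reads by default) in your shell before running the script. Replace the placeholder values with your own:

```shell
# Replace the values with your own API token and full project name
export NEPTUNE_API_TOKEN="Your Neptune API token here"
export NEPTUNE_PROJECT="workspace-name/project-name"
```

With these set, `neptune.init_run()` needs no arguments at all.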

You can, however, also pass them as arguments when initializing Neptune:

run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8",  # your token here
    project="ml-team/classification",  # your full project name here
)
  • Find and copy your API token by clicking your avatar and selecting Get my API token.
  • Find and copy your project name in the project Settings → Properties.

If you haven't registered, you can also log anonymously to a public project (make sure not to publish sensitive data through your code!):

run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/quickstarts",
)

Analyze cross-validation results#

In the All metadata section of each run, you can browse the cross-validation results both per fold and at the job level.

  • To view global scores, navigate to the "global" namespace.
  • To analyze the metrics per fold, navigate through the fold_n namespaces.

[Screenshot: Previewing metrics for a fold]

See example in Neptune · Code examples

You can also create custom dashboards, where you can combine and overlay the metrics from different namespaces according to your needs.
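You can also aggregate the logged series yourself through the API. Per the Neptune docs, fetching a metric series with `fetch_values()` returns a pandas DataFrame with `step`, `value`, and `timestamp` columns. The sketch below substitutes a stand-in DataFrame for a live Neptune connection, so it runs offline; the commented-out lines show the real calls you would use:

```python
import pandas as pd

# In a live session you would reopen the run and fetch a series, e.g.:
#   run = neptune.init_run(with_id="RUN-100", mode="read-only")
#   acc = run["fold_0/training/batch/acc"].fetch_values()
# Here we fake the fetched series so the example runs without a connection.
acc = pd.DataFrame({"step": [0, 1, 2, 3], "value": [0.1, 0.3, 0.5, 0.7]})

# Aggregate the per-batch accuracy for the fold
mean_acc = acc["value"].mean()
print(f"fold_0 mean batch accuracy: {mean_acc:.2f}")  # 0.40
```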