Tracking and visualizing cross validation results#


[Screenshot: Viewing batch metrics for a particular fold]

When training models with cross validation, you can use Neptune namespaces (folders) to organize, visualize and compare models. You can also use namespaces to present cross-validation results, for example, when experimenting with k-fold validation or train/validation/test splits.

In this guide, you'll learn how to analyze your results more easily by organizing your run to track cross-validation metadata.

See example in Neptune  Code examples 

Before you start#

  • Sign up at neptune.ai/register.
  • Create a project for storing your metadata.
  • Install Neptune:

    pip install neptune
    
    Passing your Neptune credentials

    Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...6Lc"
    

    To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.

    export NEPTUNE_PROJECT="ml-team/classification"
    

    Your full project name has the form workspace-name/project-name. You can copy it from the project settings: Click the menu in the top-right → Details & privacy.

    On Windows, navigate to Settings → Edit the system environment variables, or enter the following in Command Prompt: setx SOME_NEPTUNE_VARIABLE 'some-value'


    Although it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune:

    run = neptune.init_run(
        project="ml-team/classification",  # your full project name here
        api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
    )
    

    For more help, see Set Neptune credentials.

Integration tip

When you use one of Neptune's framework integrations, such as the XGBoost or LightGBM integration, you get this structure for cross-validation metadata automatically.

Learn more: XGBoost integration guide, LightGBM integration guide
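
For instance, with the XGBoost integration, attaching the Neptune callback to a cross-validation job is usually all it takes. The following is a minimal, hedged sketch rather than a verbatim recipe: it assumes the neptune-xgboost package is installed and uses synthetic data purely so the example is self-contained.

import neptune
import numpy as np
import xgboost as xgb
from neptune.integrations.xgboost import NeptuneCallback

# Synthetic data, just to keep the sketch runnable on its own
X = np.random.rand(200, 8)
y = np.random.randint(0, 2, size=200)
dtrain = xgb.DMatrix(X, label=y)

run = neptune.init_run()  # credentials read from environment variables
neptune_callback = NeptuneCallback(run=run)

params = {"eta": 0.3, "objective": "binary:logistic", "eval_metric": "logloss"}

# xgb.cv runs k-fold cross-validation; the callback logs the training metadata to Neptune
xgb.cv(
    params=params,
    dtrain=dtrain,
    num_boost_round=20,
    nfold=5,
    callbacks=[neptune_callback],
)

run.stop()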

Create a script that tracks cross-validation metadata#

We'll create a model training script where we run k-fold validation and log some parameters and metrics.

To organize the run, we'll create two categories of namespace:

global – Metadata common to all folds:

  • epoch
  • learning rate
  • batch size
  • mean score

fold_n – Metadata specific to each fold:

  • metrics
  • saved model

This structure groups the metadata from each fold into its own folder and keeps a global aggregate of the metrics across folds – for example, a mean accuracy score or an accuracy chart covering all folds. This makes it easier to navigate, compare, and retrieve metadata from any of the folds, both in the app and through the API.

The abbreviated snippet below shows where the Neptune logging calls come in; the complete, runnable script follows it.

cross-validation_example.py
import neptune
from neptune.utils import stringify_unsupported

run = neptune.init_run() # (1)!

parameters = {
    "epochs": 1,
    "learning_rate": 1e-2,
    "batch_size": 10,
    "image_size": (3, 32, 32),
    "n_classes": 10,
    "k_folds": 2,
    "checkpoint_name": "checkpoint.pth",
    "seed": 42,
}

run["parameters"] = stringify_unsupported(parameters)

splits = KFold(n_splits=parameters["k_folds"], shuffle=True)

for fold, (train_ids, val_ids) in enumerate(splits.split(dataset)):
    for epoch in range(parameters["epochs"]):
        for i, (x, y) in enumerate(train_loader, 0):
            # ... forward pass, loss, and accuracy computation omitted ...
            # Log batch loss
            run[f"fold_{fold}/training/batch/loss"].append(loss)
            # Log batch accuracy
            run[f"fold_{fold}/training/batch/acc"].append(acc)

    # Log model checkpoint
    torch.save(model.state_dict(), f"./{parameters['checkpoint_name']}")
    run[f"fold_{fold}/checkpoint"].upload(parameters["checkpoint_name"])

run["global/metrics/train/mean_acc"] = mean(epoch_acc_list)
run["global/metrics/train/mean_loss"] = mean(epoch_loss_list)
  1. We recommend saving your API token and project name as environment variables.

    If needed, you can pass them as arguments when initializing Neptune:

    neptune.init_run(
        project="workspace-name/project-name",
        api_token="YourNeptuneApiToken",
    )
    
cross-validation_example.py
from statistics import mean

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader, SubsetRandomSampler
from tqdm.auto import tqdm, trange
from torchvision import datasets, transforms
from functools import reduce

import neptune
from neptune.utils import stringify_unsupported

# Step 1: Create a Neptune run
run = neptune.init_run() # (1)!

# Step 2: Log config and hyperparameters
parameters = {
    "epochs": 1,
    "learning_rate": 1e-2,
    "batch_size": 10,
    "image_size": (3, 32, 32),
    "n_classes": 10,
    "k_folds": 2,
    "checkpoint_name": "checkpoint.pth",
    "dataset_size": 1000,
    "seed": 42,
}

# Log hyperparameters
run["parameters"] = stringify_unsupported(parameters)

image_size = reduce(lambda x, y: x * y, parameters["image_size"])

# Model
class BaseModel(nn.Module):
    def __init__(self, input_sz, hidden_dim, n_classes):
        super(BaseModel, self).__init__()
        self.main = nn.Sequential(
            nn.Linear(input_sz, hidden_dim * 2),
            nn.ReLU(),
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, n_classes),
        )

    def forward(self, input):
        x = input.view(-1, image_size)
        return self.main(x)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
run["parameters/device"] = str(device)  # Log to run parameters

# Seed
torch.manual_seed(parameters["seed"])

model = BaseModel(
    image_size,
    image_size,
    parameters["n_classes"],
).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=parameters["learning_rate"])

data_tfms = {
    "train": transforms.Compose(
        [
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ]
    )
}

dataset = datasets.FakeData(
    size=parameters["dataset_size"],
    image_size=parameters["image_size"],
    num_classes=parameters["n_classes"],
    transform=data_tfms["train"],
)

# Log dataset details
run["dataset/transforms"] = stringify_unsupported(data_tfms)
run["dataset/size"] = parameters["dataset_size"]

# Training loop

splits = KFold(n_splits=parameters["k_folds"], shuffle=True)
epoch_acc_list, epoch_loss_list = [], []

# Step 3: Log losses and metrics per fold
for fold, (train_ids, _) in tqdm(enumerate(splits.split(dataset))):
    train_sampler = SubsetRandomSampler(train_ids)
    train_loader = DataLoader(
        dataset, batch_size=parameters["batch_size"], sampler=train_sampler
    )
    for _ in trange(parameters["epochs"]):
        epoch_acc, epoch_loss = 0, 0.0
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            outputs = model.forward(x)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, y)
            acc = (torch.sum(preds == y.data)) / len(x)

            # Log batch loss and acc
            run[f"fold_{fold}/training/batch/loss"].append(loss)
            run[f"fold_{fold}/training/batch/acc"].append(acc)

            loss.backward()
            optimizer.step()

            # Accumulate per-batch statistics for the epoch-level averages
            epoch_acc += torch.sum(preds == y.data).item()
            epoch_loss += loss.item() * x.size(0)

    epoch_acc_list.append((epoch_acc / len(train_loader.sampler)) * 100)
    epoch_loss_list.append(epoch_loss / len(train_loader.sampler))

    # Log model checkpoint
    torch.save(model.state_dict(), f"./{parameters['checkpoint_name']}")
    run[f"fold_{fold}/checkpoint"].upload(parameters["checkpoint_name"])
    run.sync()

# Log mean of metrics across all folds to the global namespace
run["global/metrics/train/mean_acc"] = mean(epoch_acc_list)
run["global/metrics/train/mean_loss"] = mean(epoch_loss_list)

run.stop()
  1. We recommend saving your API token and project name as environment variables.

    If needed, you can pass them as arguments when initializing Neptune:

    neptune.init_run(
        project="workspace-name/project-name",
        api_token="YourNeptuneApiToken",
    )
    

Run the script#

After you execute the Python script or notebook cell, you should see a Neptune link printed to the console output.

Sample output

[neptune] [info ] Neptune initialized. Open in the app: https://app.neptune.ai/workspace/project/e/RUN-1

Follow the link to open the run in Neptune.
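
If you'd rather capture the link programmatically (for example, to store it alongside other job output), the run object exposes a helper for that. A minimal sketch, assuming a recent version of the neptune client that provides the get_url() method:

import neptune

run = neptune.init_run()

# get_url() returns the address of this run in the Neptune app
print(run.get_url())

run.stop()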

If Neptune can't find your project name or API token

As a best practice, you should save your Neptune API token and project name as environment variables:

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"

Alternatively, you can pass the information when using a function that takes api_token and project as arguments:

run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8", # (1)!
    project="ml-team/classification", # (2)!
)
  1. In the bottom-left corner, expand the user menu and select Get my API token.
  2. You can copy the path from the project details (Details & privacy).

If you haven't registered, you can log anonymously to a public project:

api_token=neptune.ANONYMOUS_API_TOKEN
project="common/quickstarts"

Make sure not to publish sensitive data through your code!

Analyze cross-validation results#

In the All metadata section of each run, you can browse the cross-validation results both per fold and at the aggregate level.

  • To view global scores, navigate to the "global" namespace.
  • To analyze the metrics per fold, navigate through each fold namespace.

See example in Neptune  Code examples 

You can also create a custom dashboard, where you can combine and overlay the metrics from different namespaces according to your needs.
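
Beyond the app, the logged values can be retrieved through the API, since the fold and global namespaces follow a predictable layout. The snippet below is a minimal sketch; the run ID "CV-1" is a placeholder to replace with an ID from your own project.

import neptune

# Reopen an existing run in read-only mode ("CV-1" is a placeholder ID)
run = neptune.init_run(with_id="CV-1", mode="read-only")

# Fetch the aggregated score logged under the global namespace
mean_acc = run["global/metrics/train/mean_acc"].fetch()

# Fetch the per-batch accuracy series of the first fold as a pandas DataFrame
fold_0_acc = run["fold_0/training/batch/acc"].fetch_values()

print(mean_acc)
print(fold_0_acc.head())

run.stop()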