
Tracking hyperparameter optimization jobs with Neptune


When running a hyperparameter optimization (HPO) job, you can use Neptune to track metadata from the study as a whole as well as from each individual trial.

In this guide, we'll show you how to configure Neptune for your HPO job in two ways:

  • By logging the metadata from all trials to the same Neptune run
  • By creating a separate Neptune run for each trial
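To make the difference between the two approaches concrete, the following plain-Python sketch (no Neptune calls) lists where the per-batch loss of each trial ends up. The `trials/{i}/...` layout matches the training loops later in this guide; the helper function itself is hypothetical and only for illustration.

```python
from typing import List, Tuple


def loss_field_paths(n_trials: int, separate_runs: bool) -> List[Tuple[str, str]]:
    """Return (run label, field path) pairs for each trial's batch-loss series.

    Hypothetical helper: it mirrors the field paths used in this guide.
    """
    paths = []
    for i in range(n_trials):
        if separate_runs:
            # One run per trial: every run uses the same relative field path.
            paths.append((f"trial-{i}", "training/batch/loss"))
        else:
            # One sweep-level run: each trial gets its own namespace under "trials".
            paths.append(("sweep-level", f"trials/{i}/training/batch/loss"))
    return paths


for label, path in loss_field_paths(3, separate_runs=False):
    print(label, path)
```

With a single run, trials are told apart by the `trials/{i}` prefix; with separate runs, the field paths are identical and the run itself identifies the trial.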

Integration tip

Neptune integrates directly with Optuna, a hyperparameter optimization framework.

For a detailed guide, see Working with Optuna.


Before you start

To follow the guide, you'll need torch, torchvision, and tqdm installed in addition to the Neptune client:

pip install -U neptune-client torch torchvision tqdm

Setting up the training script

In this example, we'll set up a model training script with PyTorch.

  1. Import the needed libraries:

    import math
    
    import neptune.new as neptune
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torchvision import datasets, transforms
    from tqdm.auto import trange
    
  2. Define hyperparameters and the search space:

    parameters = {
        "batch_size": 128,
        "epochs": 1,
        "input_size": (3, 32, 32),
        "n_classes": 10,
        "dataset_size": 1000,
        "model_filename": "basemodel",
        "device": torch.device("cuda:0" if torch.cuda.is_available() else "cpu"),
    }
    
    learning_rates = [1e-4, 1e-3, 1e-2]  # learning rate choices
    
  3. Set up the model:

    class BaseModel(nn.Module):
        def __init__(self, input_size, hidden_dim, n_classes):
            super().__init__()
            self.main = nn.Sequential(
                nn.Linear(input_size, hidden_dim * 2),
                nn.ReLU(),
                nn.Linear(hidden_dim * 2, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim // 2),
                nn.ReLU(),
                nn.Linear(hidden_dim // 2, n_classes),
            )
            self.input_size = input_size
    
        def forward(self, x):
            # Flatten each image into a vector of input_size features
            x = x.view(-1, self.input_size)
            return self.main(x)
    
    model = BaseModel(
        math.prod(parameters["input_size"]),
        math.prod(parameters["input_size"]),
        parameters["n_classes"],
    ).to(parameters["device"])
    criterion = nn.CrossEntropyLoss()
    
  4. Set up datasets:

    data_tfms = {
        "train": transforms.Compose(
            [
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            ]
        )
    }
    
    trainset = datasets.FakeData(
        size=parameters["dataset_size"],
        image_size=parameters["input_size"],
        num_classes=parameters["n_classes"],
        transform=data_tfms["train"],
    )
    trainloader = torch.utils.data.DataLoader(
        trainset, batch_size=parameters["batch_size"], shuffle=True, num_workers=0
    )
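Steps 3 and 4 imply a few sizes that are worth checking before launching a sweep. The plain-Python sketch below recomputes them from the values in the parameters dictionary. Because `BaseModel` receives `math.prod(input_size)` as both its input size and its `hidden_dim`, the first layer maps 3072 input features to 6144 units. With 1000 samples and a batch size of 128, each epoch has 8 batches, so each trial logs 8 loss and accuracy values per epoch. It assumes the DataLoader's default `drop_last=False`.

```python
import math

# Values copied from the parameters dictionary in step 2 (device and filename omitted).
parameters = {
    "batch_size": 128,
    "input_size": (3, 32, 32),
    "n_classes": 10,
    "dataset_size": 1000,
}

# BaseModel receives math.prod(input_size) as both input_size and hidden_dim.
in_features = math.prod(parameters["input_size"])
hidden_dim = in_features
layer_dims = [
    (in_features, hidden_dim * 2),
    (hidden_dim * 2, hidden_dim),
    (hidden_dim, hidden_dim // 2),
    (hidden_dim // 2, parameters["n_classes"]),
]

# Each nn.Linear(in, out) holds in * out weights plus out biases.
n_params = sum(i * o + o for i, o in layer_dims)

# The DataLoader keeps the final, smaller batch by default (drop_last=False).
batches_per_epoch = math.ceil(parameters["dataset_size"] / parameters["batch_size"])

print(f"input features: {in_features}")
print(f"layer shapes: {layer_dims}")
print(f"trainable parameters: {n_params:,}")
print(f"batches per epoch: {batches_per_epoch}")
```

The parameter count (about 42.5 million) is a useful reminder that this fully connected model is fairly large for 32×32 images. If memory is tight, passing a smaller `hidden_dim` is the simplest lever.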
    

Next, set up the training loop depending on your approach:

Logging all trials to the same run

In this approach, we'll create a single sweep-level Neptune run and log the metadata (such as hyperparameters and metrics) of all trials to it.

  1. Initialize a Neptune run:

    run = neptune.init_run(
        tags=["sweep-level"],  # identifies a run that contains metadata from multiple trials
    )

    If you haven't saved your credentials as environment variables, you can pass them as arguments when initializing Neptune:

    run = neptune.init_run(
        project="workspace-name/project-name",
        api_token="YourNeptuneApiTokenHere",
        tags=["sweep-level"],
    )
    
    How do I save my credentials as environment variables?

    Set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

    For example:

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...6Lc"
    export NEPTUNE_PROJECT="ml-team/classification"
    
    On Windows, use set instead of export.

    Finding your credentials:

    • API token: In the top-right corner of the Neptune app, click your avatar and select Get your API token.
    • Project: Your full project name has the form workspace-name/project-name. To copy the name, navigate to your project, then to Settings > Properties.

    If you're working in Colab, you can set your credentials with the os and getpass libraries:

    import os
    from getpass import getpass
    os.environ["NEPTUNE_API_TOKEN"] = getpass("Enter your Neptune API token: ")
    os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"
    
    Haven't registered yet?

    You can also try Neptune anonymously by passing the following credentials:

    run = neptune.init_run(
        api_token=neptune.ANONYMOUS_API_TOKEN,
        project="common/pytorch-integration",
        tags=["sweep-level"],
    )
    
  2. Set up the training loop:

    for i, lr in enumerate(learning_rates):
        # Log hyperparameters
        run[f"trials/{i}/params"] = parameters
        run[f"trials/{i}/params/lr"] = lr
    
        # Reset the weights so that each trial starts from a freshly initialized model
        for layer in model.modules():
            if isinstance(layer, nn.Linear):
                layer.reset_parameters()
    
        optimizer = optim.SGD(model.parameters(), lr=lr)
        for _ in trange(parameters["epochs"]):
            for x, y in trainloader:
                x, y = x.to(parameters["device"]), y.to(parameters["device"])
                optimizer.zero_grad()
                outputs = model(x)
                loss = criterion(outputs, y)
    
                _, preds = torch.max(outputs, 1)
                acc = (preds == y).float().mean()
    
                # Log metrics as Python floats
                run[f"trials/{i}/training/batch/loss"].append(loss.item())
                run[f"trials/{i}/training/batch/acc"].append(acc.item())
    
                loss.backward()
                optimizer.step()
    
  3. To stop the connection to Neptune and sync all data, call the stop() method:

    run.stop()
    
    Note for interactive sessions

    Always call stop() in interactive environments, such as a Python interpreter or Jupyter notebook. The connection to Neptune is not stopped when the cell has finished executing, but rather when the entire notebook stops.

    If you're running a script, the connection is stopped automatically when the script finishes executing. However, it's a best practice to call stop() when the connection is no longer needed.
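Step 1 only works if Neptune can find your credentials. If you rely on the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, a small pre-flight check can catch a missing variable before the sweep starts. The helper below is hypothetical, not part of the Neptune client, and its project-name pattern is only a loose approximation of the workspace-name/project-name form. It isn't needed if you pass the credentials as arguments or use the anonymous token.

```python
import os
import re
from typing import List, Mapping, Optional

# Loose approximation of "workspace-name/project-name"; not Neptune's own validation rule.
PROJECT_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+$")


def check_neptune_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems with the Neptune credential variables (empty if none)."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("NEPTUNE_API_TOKEN"):
        problems.append("NEPTUNE_API_TOKEN is not set")
    project = env.get("NEPTUNE_PROJECT", "")
    if not project:
        problems.append("NEPTUNE_PROJECT is not set")
    elif not PROJECT_PATTERN.match(project):
        problems.append(
            f"NEPTUNE_PROJECT={project!r} is not of the form workspace-name/project-name"
        )
    return problems


# Print a warning for each problem found in the current environment
for problem in check_neptune_env():
    print("Warning:", problem)
```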

Analyzing results in Neptune

When browsing the metadata of the sweep-level run, you can see a namespace called trials. It contains one namespace per trial (trials/0, trials/1, and so on), each holding the params and training metrics logged for that trial.
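Once the sweep finishes, a typical next step is to pick the learning rate whose trial ended with the lowest loss. The sketch below assumes you've already collected each trial's batch-loss values into plain Python lists, for example by reading the trials/{i}/training/batch/loss series back from the run. The helper name and the choice of averaging the last few batches are assumptions made for illustration.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def best_trial(
    losses_per_trial: Sequence[Sequence[float]],
    learning_rates: Sequence[float],
    last_n: int = 3,
) -> Tuple[int, float, float]:
    """Return (trial index, learning rate, score) for the trial with the lowest
    mean loss over its last `last_n` logged batches.

    losses_per_trial must be indexed the same way as learning_rates.
    """
    if len(losses_per_trial) != len(learning_rates):
        raise ValueError("need one loss series per learning rate")
    # Averaging the final batches smooths out per-batch noise.
    scores: List[float] = [mean(losses[-last_n:]) for losses in losses_per_trial]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, learning_rates[best], scores[best]
```

Comparing only the final loss value would work too, but with 8 batches per epoch a single batch can be noisy, which is why this sketch averages a short tail.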


Logging each trial to a separate run

In this approach, we'll create a separate Neptune run for each trial, so that each run only contains the metadata of a single trial.

After setting up the training script as described above, add the following training loop:

for i, lr in enumerate(learning_rates):
    # Create a new run for this trial
    run = neptune.init_run(
        name=f"trial-{i}",
        tags=["trial-level"],  # indicates that the run only contains results from a single trial
    )

    # Log hyperparameters
    run["params"] = parameters
    run["params/lr"] = lr

    # Reset the weights so that each trial starts from a freshly initialized model
    for layer in model.modules():
        if isinstance(layer, nn.Linear):
            layer.reset_parameters()

    # Each trial needs its own optimizer with the trial's learning rate
    optimizer = optim.SGD(model.parameters(), lr=lr)
    for _ in trange(parameters["epochs"]):
        for x, y in trainloader:
            x, y = x.to(parameters["device"]), y.to(parameters["device"])
            optimizer.zero_grad()
            outputs = model(x)
            loss = criterion(outputs, y)

            _, preds = torch.max(outputs, 1)
            acc = (preds == y).float().mean()

            # Log metrics as Python floats
            run["training/batch/loss"].append(loss.item())
            run["training/batch/acc"].append(acc.item())

            loss.backward()
            optimizer.step()

    # Important: stop each run inside the loop
    run.stop()
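The loop above stops each run at the end of its iteration, but if a trial raises an exception (for example, a CUDA out-of-memory error), run.stop() is skipped and the run stays open. One way to guarantee the call, sketched here as a hypothetical helper rather than a Neptune feature, is to wrap run creation in a context manager built on try/finally. The run factory is passed in so that the helper doesn't depend on the Neptune client directly.

```python
from contextlib import contextmanager


@contextmanager
def trial_run(init_run, index, **init_kwargs):
    """Create a per-trial run and always stop it, even if the trial raises.

    init_run is the run factory, for example neptune.init_run; any extra
    keyword arguments (such as project or api_token) are forwarded to it.
    """
    run = init_run(name=f"trial-{index}", tags=["trial-level"], **init_kwargs)
    try:
        yield run
    finally:
        # Runs on normal exit and when an exception propagates out of the block
        run.stop()


# Usage inside the training loop (assumes neptune is imported as in step 1):
# for i, lr in enumerate(learning_rates):
#     with trial_run(neptune.init_run, i) as run:
#         run["params/lr"] = lr
#         ...  # train and log as in the loop above
```

The exception still propagates after the run is stopped, so a failing trial is visible rather than silently skipped. If you'd rather continue the sweep, catch the error around the `with` block.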

Analyzing results in Neptune

Each trial is logged as its own run, named trial-0, trial-1, and so on. Click a run to browse its metadata.

You can see that the run only contains metadata (params and training metrics) from a single trial.

