
Tracking hyperparameter optimization jobs with Neptune#


When optimizing or tuning hyperparameters, you can use Neptune to track the metadata from the study as well as each trial.

In this guide, we'll show you how to configure Neptune for your HPO job in two ways:

  • By logging the metadata from all trials to the same Neptune run
  • By creating a separate Neptune run for each trial

Integration tip

Neptune integrates directly with Optuna, a hyperparameter optimization framework.

For a detailed guide, see Optuna integration guide.
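
For instance, with the neptune-optuna package installed (pip install neptune-optuna), logging a study can look roughly like this (a minimal sketch; the objective below is a placeholder, not part of this guide):

import neptune
import neptune.integrations.optuna as optuna_utils
import optuna

def objective(trial):
    # Placeholder objective: return your validation metric here
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    return lr

run = neptune.init_run(tags=["optuna-study"])
neptune_callback = optuna_utils.NeptuneCallback(run)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=10, callbacks=[neptune_callback])

run.stop()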

See example in Neptune | Code examples

Before you start#

  • Sign up at neptune.ai/register.
  • Create a project for storing your metadata.
  • Install Neptune:

    pip install neptune
    
    conda install -c conda-forge neptune
    
    Installing through Anaconda Navigator

    To find neptune, you may need to update your channels and index.

    1. In the Navigator, select Environments.
    2. In the package view, click Channels.
    3. Click Add..., enter conda-forge, and click Update channels.
    4. In the package view, click Update index... and wait until the update is complete. This can take several minutes.
    5. You should now be able to search for neptune.

    Note: The displayed version may be outdated. The latest version of the package will be installed.

    Note: Bioconda hosts a "neptune" package that is not the neptune.ai client library. Make sure to specify the conda-forge channel when installing Neptune.

    Passing your Neptune credentials

    Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...6Lc"
    

    To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get your API token.

    export NEPTUNE_PROJECT="ml-team/classification"
    

    Your full project name has the form workspace-name/project-name. You can copy it from the project settings: Click the menu in the top-right → Edit project details.

    On Windows, navigate to Settings → Edit the system environment variables, or enter the following in Command Prompt: setx SOME_NEPTUNE_VARIABLE "some-value"


    Although it's not recommended (especially for the API token), you can also pass your credentials in code when initializing Neptune:

    run = neptune.init_run(
        project="ml-team/classification",  # your full project name here
        api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
    )
    

    For more help, see Set Neptune credentials.

To follow the guide, you'll additionally need to have torch, torchvision, and tqdm installed:

pip install -U torch torchvision tqdm
conda install -c conda-forge pytorch torchvision tqdm

Setting up the training script#

In this example, we'll set up a model training script with PyTorch.

  1. Import the needed libraries:

    import neptune
    import numpy as np
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torchvision import datasets, transforms
    from tqdm.auto import trange
    from functools import reduce
    from neptune.utils import stringify_unsupported
    
  2. Define hyperparameters and the search space (to sweep several hyperparameters at once, see the sketch after these steps):

    parameters = {
        "batch_size": 128,
        "epochs": 1,
        "input_size": (3, 32, 32),
        "n_classes": 10,
        "dataset_size": 1000,
        "model_filename": "basemodel",
        "device": torch.device(
            "cuda:0" if torch.cuda.is_available() else "cpu"
        ),
    }
    
    input_size = reduce(lambda x, y: x * y, parameters["input_size"])
    
    learning_rates = [1e-4, 1e-3, 1e-2]  # learning rate choices
    
  3. Set up the model:

    class BaseModel(nn.Module):
        def __init__(self, input_size, hidden_dim, n_classes):
            super(BaseModel, self).__init__()
            self.main = nn.Sequential(
                nn.Linear(input_size, hidden_dim * 2),
                nn.ReLU(),
                nn.Linear(hidden_dim * 2, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim // 2),
                nn.ReLU(),
                nn.Linear(hidden_dim // 2, n_classes),
            )
            self.input_size = input_size
    
        def forward(self, input):
            x = input.view(-1, self.input_size)
            return self.main(x)
    
    model = BaseModel(
        input_size,
        input_size,
        parameters["n_classes"],
    ).to(parameters["device"])
    criterion = nn.CrossEntropyLoss()
    
  4. Set up datasets:

    data_tfms = {
        "train": transforms.Compose(
            [
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                transforms.Normalize(
                    [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
                ),
            ]
        )
    }
    
    trainset = datasets.FakeData(
        size=parameters["dataset_size"],
        image_size=parameters["input_size"],
        num_classes=parameters["n_classes"],
        transform=data_tfms["train"],
    )
    trainloader = torch.utils.data.DataLoader(
        trainset,
        batch_size=parameters["batch_size"],
        shuffle=True,
        num_workers=0,
    )
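
This guide sweeps a single hyperparameter, the learning rate. If you wanted to sweep several hyperparameters at once, you could build the trial list with itertools.product (a hypothetical extension of the search space above; the batch_sizes values are illustrative):

from itertools import product

learning_rates = [1e-4, 1e-3, 1e-2]
batch_sizes = [64, 128]  # illustrative second search dimension

# Each trial is one (learning rate, batch size) combination
search_space = list(product(learning_rates, batch_sizes))

for i, (lr, batch_size) in enumerate(search_space):
    ...  # train and log as shown in the following sections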
    

Next, set up the training loop depending on your approach:

Logging all trials to the same run#

In this approach, we'll create a global Neptune run for logging metadata (such as metrics) across the trials.

  1. Initialize a Neptune run:

    run = neptune.init_run(
        tags=["sweep-level"], # (1)!
    )
    
    1. To identify a run that contains metadata from multiple trials.

    If you haven't saved your credentials as environment variables, you can pass them as arguments when initializing Neptune:

    run = neptune.init_run(
        project="workspace-name/project-name",
        api_token="YourNeptuneApiTokenHere",
        tags=["sweep-level"],
    )
    
    How do I save my credentials as environment variables?

    Set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

    Linux/macOS:

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
    export NEPTUNE_PROJECT="ml-team/classification"

    Windows (Command Prompt):

    setx NEPTUNE_API_TOKEN "h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
    setx NEPTUNE_PROJECT "ml-team/classification"

    You can also navigate to Settings → Edit the system environment variables and add the variables there.

    Jupyter notebook:

    %env NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...ifQ=="
    %env NEPTUNE_PROJECT="ml-team/classification"
    

    To find your credentials:

    • API token: In the bottom-left corner of the Neptune app, expand your user menu and select Get your API token. If you need the token of a service account, go to the workspace or project settings and enter the Service accounts settings.
    • Project name: Your full project name has the form workspace-name/project-name. You can copy it from the project menu (Edit project details).

    If you're working in Google Colab, you can set your credentials with the os and getpass libraries:

    import os
    from getpass import getpass
    os.environ["NEPTUNE_API_TOKEN"] = getpass("Enter your Neptune API token: ")
    os.environ["NEPTUNE_PROJECT"] = "workspace-name/project-name"
    
    Haven't registered yet?

    You can also try Neptune anonymously by passing the following credentials:

    run = neptune.init_run(
        api_token=neptune.ANONYMOUS_API_TOKEN,
        project="common/pytorch-integration",
        tags=["sweep-level"],
    )
    
  2. Set up the training loop (a sketch for logging a sweep-level summary follows these steps):

    for (i, lr) in enumerate(learning_rates):
        # Log hyperparameters
        run[f"trials/{i}/params"] = stringify_unsupported(parameters)
        run[f"trials/{i}/params/lr"] = lr
    
        optimizer = optim.SGD(model.parameters(), lr=lr)
        for _ in trange(parameters["epochs"]):
            for (x, y) in trainloader:
    
                x, y = x.to(parameters["device"]), y.to(parameters["device"])
                optimizer.zero_grad()
                outputs = model(x)
                loss = criterion(outputs, y)
    
                _, preds = torch.max(outputs, 1)
                acc = (torch.sum(preds == y.data)) / len(x)
    
                # Log metrics
                run[f"trials/{i}/training/batch/loss"].append(loss)
                run[f"trials/{i}/training/batch/acc"].append(acc)
    
                loss.backward()
                optimizer.step()
    
  3. To stop the connection to Neptune and sync all data, call the stop() method:

    run.stop()
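
Optionally, you can record a sweep-level summary before stopping the run, so that the best trial is easy to find. A minimal sketch (the best/* fields and the tracking logic are illustrative, not part of the guide):

best_acc, best_lr = None, None

for (i, lr) in enumerate(learning_rates):
    # ... run the training loop from step 2 here ...
    last_acc = float(acc)  # accuracy of the trial's final batch
    if best_acc is None or last_acc > best_acc:
        best_acc, best_lr = last_acc, lr

# Log the summary at the sweep level before stopping the run
run["best/acc"] = best_acc
run["best/lr"] = best_lr
run.stop()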
    

Analyzing results in Neptune#

When browsing the metadata of the sweep-level run, you can see a namespace called trials. It contains the metadata (params and training metrics) logged for each trial.
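
You can also query the logged metadata through the Python API. For example, reopen the run in read-only mode and fetch a metric series as a pandas DataFrame (a sketch; the with_id value is a hypothetical run ID, replace it with your run's actual ID from the app):

import neptune

# Reopen the sweep-level run without logging anything new
run = neptune.init_run(with_id="PROJ-123", mode="read-only")

lr = run["trials/0/params/lr"].fetch()
loss_df = run["trials/0/training/batch/loss"].fetch_values()

run.stop()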

See example in Neptune 

Logging each trial to a separate run#

In this approach, we'll create local Neptune runs that log metadata from each trial separately.

After setting up the training script, add the following training loop:

for (i, lr) in enumerate(learning_rates):
    # Create a new run
    run = neptune.init_run(
        name=f"trial-{i}",
        tags=["trial-level"], # (1)!
    )

    # Log hyperparameters
    run["params"] = stringify_unsupported(parameters)
    run["params/lr"] = lr

    # Define the optimizer for this trial's learning rate
    optimizer = optim.SGD(model.parameters(), lr=lr)

    for _ in trange(parameters["epochs"]):
        for (x, y) in trainloader:

            x, y = x.to(parameters["device"]), y.to(parameters["device"])
            optimizer.zero_grad()
            outputs = model(x)
            loss = criterion(outputs, y)

            _, preds = torch.max(outputs, 1)
            acc = (torch.sum(preds == y.data)) / len(x)

            # Log metrics
            run["training/batch/loss"].append(loss)
            run["training/batch/acc"].append(acc)

            loss.backward()
            optimizer.step()

    # Important - stop each run inside the loop
    run.stop()
  1. To indicate that the run only contains results from a single trial.
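
Because each trial lives in its own run, it can help to tag all runs from one sweep with a shared identifier, so you can filter and group them in the app. A minimal sketch (the sweep-id field name and the use of uuid are illustrative, not part of the guide):

import uuid

sweep_id = str(uuid.uuid4())  # one shared identifier per sweep

for (i, lr) in enumerate(learning_rates):
    run = neptune.init_run(
        name=f"trial-{i}",
        tags=["trial-level", sweep_id],  # the shared tag groups the trials
    )
    run["sweep-id"] = sweep_id
    ...  # log and train as above
    run.stop()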

Analyzing results in Neptune#

Click a trial-level run to browse its metadata.

You can see that the run contains metadata (params and training metrics) from a single trial only.
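
To compare the trial-level runs programmatically, you could fetch them into a single table (a sketch, assuming the runs were tagged trial-level as above):

import neptune

project = neptune.init_project(
    project="workspace-name/project-name",  # your full project name here
    mode="read-only",
)

# Fetch all trial-level runs as a pandas DataFrame
runs_df = project.fetch_runs_table(tag="trial-level").to_pandas()
project.stop()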

See example in Neptune 
