Tracking and organizing model training runs#

Need a more detailed walkthrough that starts from installation? The Neptune tutorial has you covered.

This example walks you through the ways you can track and organize runs in Neptune:

  • Keep track of code, data, environment, and parameters.
  • Log results, like evaluation metrics and model files.
  • Work with tags.
  • Create a custom view of the runs table and save it for later.

Before you start#

  • Sign up at neptune.ai/register.
  • Create a project for storing your metadata.
  • Install Neptune:

    pip install neptune
    
    conda install -c conda-forge neptune
    
    Installing through Anaconda Navigator

    To find neptune, you may need to update your channels and index.

    1. In the Navigator, select Environments.
    2. In the package view, click Channels.
    3. Click Add..., enter conda-forge, and click Update channels.
    4. In the package view, click Update index... and wait until the update is complete. This can take several minutes.
    5. You should now be able to search for neptune.

    Note: The displayed version may be outdated. The latest version of the package will be installed.

    Note: Bioconda also hosts a "neptune" package that is not the neptune.ai client library. Make sure to specify the conda-forge channel when installing the neptune.ai client.

    Passing your Neptune credentials

    Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...6Lc"
    

    To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.

    export NEPTUNE_PROJECT="ml-team/classification"
    

    Your full project name has the form workspace-name/project-name. You can copy it from the project settings: Click the menu in the top-right → Edit project details.

    On Windows, navigate to Settings → Edit the system environment variables, or enter the following in Command Prompt: setx SOME_NEPTUNE_VARIABLE "some-value"


    Although it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune:

    run = neptune.init_run(
        project="ml-team/classification",  # your full project name here
        api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
    )
    

    For more help, see Set Neptune credentials.

  • Have the scikit-learn and joblib Python libraries installed.

    What if I don't use scikit-learn?

    No worries, we're just using it for demonstration purposes. You can use any framework you like; Neptune has integrations with various popular frameworks. For details, see the Integrations tab.

Create a basic training script#

As an example, we'll use a script that trains a scikit-learn model on the wine dataset.

Create a file train.py and copy the script below.

train.py
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from joblib import dump

data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.4, random_state=1234
)

params = {
    "n_estimators": 10,
    "max_depth": 3,
    "min_samples_leaf": 1,
    "min_samples_split": 2,
    "max_features": 3,
}

clf = RandomForestClassifier(**params)
clf.fit(X_train, y_train)
y_train_pred = clf.predict_proba(X_train)
y_test_pred = clf.predict_proba(X_test)

train_f1 = f1_score(y_train, y_train_pred.argmax(axis=1), average="macro")
test_f1 = f1_score(y_test, y_test_pred.argmax(axis=1), average="macro")
print(f"Train f1:{train_f1} | Test f1:{test_f1}")

dump(clf, "model.pkl")

In your terminal, run the script to ensure that it works properly:

python train.py

Connect Neptune to your code#

At the top of your script, add the following:

import neptune

run = neptune.init_run() # (1)!
  1. We recommend saving your API token and project name as environment variables.

    If needed, you can pass them as arguments when initializing Neptune:

    neptune.init_run(
        project="workspace-name/project-name",
        api_token="YourNeptuneApiToken",
    )
    
Haven't registered yet?

No problem. You can try Neptune anonymously by logging to a public project with a shared API token:

run = neptune.init_run(api_token=neptune.ANONYMOUS_API_TOKEN, project="common/quickstarts")

This creates a new run in Neptune, to which you can log various types of metadata.

Let's track the parameters, code, and environment by assigning them to the run:

run["parameters"] = params

Neptune captures the contents of the entry-point script by default, but you can specify additional files to snapshot when you initialize the run. This can be helpful if you forget to commit your code changes with Git.

To specify what source code to track, pass a list of file paths or wildcard patterns to the source_files argument:

run = neptune.init_run(source_files=["*.py", "requirements.txt"]) # (1)!
  1. When using pattern expansion, such as **/*.py, make sure that your expression does not capture files you don't intend to upload. For example, using * as a pattern will upload all files and directories from the current working directory (cwd).

If you have Git initialized in the execution path of your project, Neptune extracts some information from the .git directory.

Putting it all together, your script should now look like this:

train.py
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from joblib import dump
import neptune

run = neptune.init_run(source_files=["*.py", "requirements.txt"])

data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.4, random_state=1234
)

params = {
    "n_estimators": 10,
    "max_depth": 3,
    "min_samples_leaf": 1,
    "min_samples_split": 2,
    "max_features": 3,
}

run["parameters"] = params

clf = RandomForestClassifier(**params)
clf.fit(X_train, y_train)
y_train_pred = clf.predict_proba(X_train)
y_test_pred = clf.predict_proba(X_test)

train_f1 = f1_score(y_train, y_train_pred.argmax(axis=1), average="macro")
test_f1 = f1_score(y_test, y_test_pred.argmax(axis=1), average="macro")
print(f"Train f1:{train_f1} | Test f1:{test_f1}")

dump(clf, "model.pkl")

Add tags to organize things#

Tags can help you find runs later, especially when you're trying out a lot of ideas. You can filter by tag both when viewing the runs table and when querying runs through the API (see the example below).

Tags are stored in the tags field of the system namespace "sys".

To tag the run through the code, add a set of strings to the "sys/tags" field with the add() method:

run["sys/tags"].add(["run-organization", "me"])

Tip

You can also add and manage tags through the Neptune app.
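
To query runs by tag programmatically, you can fetch the runs table through the client library. A minimal sketch, assuming your credentials are set as environment variables and your full project name is ml-team/classification (as in the examples above):

import neptune

# Connect to the project in read-only mode and fetch runs tagged "run-organization"
project = neptune.init_project(project="ml-team/classification", mode="read-only")
runs_df = project.fetch_runs_table(tag="run-organization").to_pandas()
print(runs_df["sys/id"])
project.stop()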

Log train and evaluation metrics#

Log the scores you want to track by assigning them to fields in the run:

run["train/f1"] = train_f1
run["test/f1"] = test_f1

You can log a series of values to the same field with the append() method:

loss = ...
run["train/loss"].append(loss)

In the code above, each append() call appends a value to the series stored in train/loss. You can view the resulting series as a chart in the Neptune app.
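
For instance, in an iterative training setup you would typically call append() once per epoch. A minimal sketch (the loop and the loss values are placeholders, not part of the script above):

for epoch in range(10):
    loss = 1.0 / (epoch + 1)  # placeholder; compute your real loss here
    run["train/loss"].append(loss)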

Upload files#

You can upload any file, such as a model file, with the upload() method:

run["model"].upload("my_model.pkl")

Track a few runs with different parameters#

Let's execute some runs with different model configurations. You can repeat the steps below by hand, or script the sweep as shown after the list.

  1. Change some parameters in the params dictionary, for example:

    params = {
        "n_estimators": 30,
        "max_depth": 5,
        "min_samples_leaf": 1,
        "min_samples_split": 2,
        "max_features": 3,
    }
    
  2. To stop the connection to Neptune and sync all data, call the stop() method:

    run.stop()
    
  3. Execute the script:

    python train.py
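
Rather than editing and re-running the script by hand, you can also drive the whole comparison from a single script that creates one run per configuration in a loop. A minimal sketch reusing the training code from above (the three configurations and the "sweep" tag are just examples):

import neptune
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.4, random_state=1234
)

# Parameters not listed here keep their scikit-learn defaults
configurations = [
    {"n_estimators": 10, "max_depth": 3},
    {"n_estimators": 30, "max_depth": 5},
    {"n_estimators": 50, "max_depth": 7},
]

for params in configurations:
    run = neptune.init_run()  # credentials are read from the environment variables
    run["parameters"] = params
    run["sys/tags"].add(["run-organization", "sweep"])

    clf = RandomForestClassifier(**params)
    clf.fit(X_train, y_train)
    run["test/f1"] = f1_score(y_test, clf.predict(X_test), average="macro")

    run.stop()  # sync and close this run before starting the next one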
    
If Neptune can't find your project name or API token

As a best practice, you should save your Neptune API token and project name as environment variables:

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"

Alternatively, you can pass the information when using a function that takes api_token and project as arguments:

run = neptune.init_run( # (1)!
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8",  # your token here
    project="ml-team/classification",  # your full project name here
)
  1. Also works for init_model(), init_model_version(), init_project(), and integrations that create Neptune runs under the hood, such as NeptuneLogger or NeptuneCallback.

  2. API token: In the bottom-left corner, expand the user menu and select Get my API token.

  3. Project name: You can copy the path from the project details ( Edit project details).

If you haven't registered, you can log anonymously to a public project:

api_token=neptune.ANONYMOUS_API_TOKEN
project="common/quickstarts"

Make sure not to publish sensitive data through your code!

Organize results in the Neptune app#

Click the run link that appears in the console output, or open your project in the Neptune app.

Sample output

[neptune] [info ] Neptune initialized. Open in the app: https://app.neptune.ai/workspace/project/e/RUN-1

In the above example, the run ID is RUN-1.

See that everything was logged#

To check that everything was logged correctly, navigate to the following sections:

  • All metadata: Review all logged metadata.
  • monitoring namespace: See hardware utilization charts.
  • parameters namespace: See your parameters.
  • sys namespace: View metadata about your run.
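
You can also read these fields back programmatically by reopening the run in read-only mode. A minimal sketch, assuming the run ID is RUN-1 as in the sample output above:

import neptune

run = neptune.init_run(with_id="RUN-1", mode="read-only")
print(run["sys/tags"].fetch())    # the tags you added
print(run["parameters"].fetch())  # the logged parameters
run.stop()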

Filter runs by tag#

Navigate back to the runs table.

In the input box above the table, filter the runs by the run-organization tag with the following query: "Tags" + "one of" + "run-organization".

Choose parameter and metric columns to display#

Neptune suggests columns for fields whose values differ between the selected runs.

To add a column manually:

  1. Click Add column above the table.
  2. Select a field from the list, or start typing to match available fields. For example, f1.

Customize column appearance#

To customize the runs table view even further, you can click the settings icon on any column to set a custom name and color.

Save the custom view#

To save the current view of the runs table for later, click Save as new above the query input box. The saved view retains column customization and row filtering.

To share the view with collaborators, you can copy the link, send it via email, or export the view as CSV.

Tip

Create and save multiple views of the runs table for different use cases or run groups.
