Tracking and organizing model training runs#
Need a more detailed walkthrough that starts from installation? The Neptune tutorial has you covered.
This example walks you through the ways you can track and organize runs in Neptune:
- Keep track of code, data, environment, and parameters.
- Log results, like evaluation metrics and model files.
- Work with tags.
- Create a custom view of the experiments table and save it for later.
Before you start#
- Sign up at neptune.ai/register.
- Create a project for storing your metadata.
- Install Neptune:
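For example, with pip (a minimal sketch; adapt it to your package manager and environment):
pip install neptune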
Passing your Neptune credentials
Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.
To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.
Your full project name has the form workspace-name/project-name. You can copy it from the project settings: Click the menu in the top-right → Details & privacy.
On Windows, navigate to Settings → Edit the system environment variables, or enter the following in Command Prompt:
setx SOME_NEPTUNE_VARIABLE "some-value"
Although it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune:
run = neptune.init_run(
    project="ml-team/classification",  # your full project name here
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
)
For more help, see Set Neptune credentials.
- Have the joblib and scikit-learn Python libraries installed.
What if I don't use scikit-learn?
No worries, we're just using it for demonstration purposes. You can use any framework you like, and Neptune has integrations with various popular frameworks. For details, see the Integrations tab.
Create a basic training script#
As an example, we'll use a script that trains a scikit-learn model on the wine dataset.
Create a file train.py and copy the script below.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from joblib import dump
data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
data.data, data.target, test_size=0.4, random_state=1234
)
params = {
"n_estimators": 10,
"max_depth": 3,
"min_samples_leaf": 1,
"min_samples_split": 2,
"max_features": 3,
}
clf = RandomForestClassifier(**params)
clf.fit(X_train, y_train)
y_train_pred = clf.predict_proba(X_train)
y_test_pred = clf.predict_proba(X_test)
train_f1 = f1_score(y_train, y_train_pred.argmax(axis=1), average="macro")
test_f1 = f1_score(y_test, y_test_pred.argmax(axis=1), average="macro")
print(f"Train f1:{train_f1} | Test f1:{test_f1}")
dump(clf, "model.pkl")
In your terminal program, run the script to ensure that it works properly.
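For example, assuming python points to the environment where the libraries are installed:
python train.py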
Connect Neptune to your code#
At the top of your script, add the following:
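A minimal sketch, assuming your credentials are set as environment variables (see the notes below):
import neptune

run = neptune.init_run()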
- We recommend saving your API token and project name as environment variables. If needed, you can pass them as arguments when initializing Neptune:
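A sketch with placeholder values:
run = neptune.init_run(
    project="workspace-name/project-name",  # your full project name here
    api_token="Your Neptune API token",  # your API token here
)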
Haven't registered yet?
No problem. You can try Neptune anonymously by logging to a public project with a shared API token:
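A sketch of anonymous logging. The ANONYMOUS_API_TOKEN constant is provided by the neptune package; the public project name below is an assumption, so replace it with the one shown in the app:
import neptune

run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/quickstarts",  # example public project; replace as needed
)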
This creates a new run in Neptune, to which you can log various types of metadata.
Let's track the parameters, code, and environment by assigning them to the run:
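For example, the params dictionary defined in the script can be assigned to a field of the run (the same line appears in the full script below); the code and environment snapshot is configured through init_run, as described next:
run["parameters"] = params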
Neptune captures the contents of the entry-point script by default, but you can specify additional files to snapshot when you initialize the run. This can be helpful if you forget to commit your code changes with Git.
To specify what source code to track, pass a list of files or a regular expression to the source_files argument:
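For example, to snapshot every Python file in the working directory plus the requirements file (this is the call used in the full script below):
run = neptune.init_run(source_files=["*.py", "requirements.txt"])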
- When using pattern expansion, such as **/*.py, make sure that your expression does not capture files you don't intend to upload. For example, using * as a pattern will upload all files and directories from the current working directory (cwd).
If you have Git initialized in the execution path of your project, Neptune extracts some information from the .git directory.
Putting it all together, your script should now look like this:
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from joblib import dump
import neptune
run = neptune.init_run(source_files=["*.py", "requirements.txt"])
data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
data.data, data.target, test_size=0.4, random_state=1234
)
params = {
"n_estimators": 10,
"max_depth": 3,
"min_samples_leaf": 1,
"min_samples_split": 2,
"max_features": 3,
}
run["parameters"] = params
clf = RandomForestClassifier(**params)
clf.fit(X_train, y_train)
y_train_pred = clf.predict_proba(X_train)
y_test_pred = clf.predict_proba(X_test)
train_f1 = f1_score(y_train, y_train_pred.argmax(axis=1), average="macro")
test_f1 = f1_score(y_test, y_test_pred.argmax(axis=1), average="macro")
print(f"Train f1:{train_f1} | Test f1:{test_f1}")
dump(clf, "model.pkl")
Add tags to organize things#
Tags can help you find runs later, especially if you try a lot of ideas. You can filter by tag when viewing the experiments table and querying runs through the API.
Tags are stored in the tags field of the system namespace, "sys".
To tag the run through the code, add a set of strings to the "sys/tags" field with the add() method:
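For example (the run-organization tag is used for filtering later in this walkthrough; the second tag is purely illustrative):
run["sys/tags"].add(["run-organization", "me"])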
Tip
You can also add and manage tags through the Neptune app.
Log train and evaluation metrics#
Log the scores you want to track by assigning them to fields in the run:
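For example, using the scores computed in the training script (the field names are up to you):
run["train/f1"] = train_f1
run["test/f1"] = test_f1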
You can log a series of values to the same field with the append() method:
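A sketch with placeholder loss values, logging to the train/loss field discussed below:
# Illustrative loop; replace the placeholder loss with your actual training loss
for epoch in range(10):
    loss = 1.0 / (epoch + 1)
    run["train/loss"].append(loss)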
In the code above, each append() call appends a value to the series stored in train/loss. You can view the resulting series as a chart in the Neptune app.
Batching tip
To optimize batching when creating multiple series fields in a single statement, iterate through the fields in the outer loop and the values in the inner loop:
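A sketch of the recommended iteration order, with placeholder metric values:
# Placeholder values for illustration
metrics = {
    "train/loss": [0.8, 0.5, 0.3],
    "train/acc": [0.6, 0.7, 0.9],
}
# Fields in the outer loop, values in the inner loop
for field, values in metrics.items():
    for value in values:
        run[field].append(value)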
Upload files#
You can upload any file, such as a model file, with the upload() method:
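For example, to upload the model file saved by the script (the model field name is just an example):
run["model"].upload("model.pkl")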
Track a few runs with different parameters#
Let's execute some runs with different model configurations.
- Change some parameters in the params dictionary (see the sketch after this list).
- To stop the connection to Neptune and sync all data, call the stop() method at the end of the script.
- Execute the script.
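A minimal sketch of these steps, based on the train.py script above (the new parameter values are arbitrary examples):
# In train.py, try a different configuration, for example:
params = {
    "n_estimators": 20,  # changed from 10
    "max_depth": 5,  # changed from 3
    "min_samples_leaf": 1,
    "min_samples_split": 2,
    "max_features": 3,
}

# At the very end of the script, stop the run to sync all data:
run.stop()
Then execute the script again:
python train.py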
If Neptune can't find your project name or API token
As a best practice, you should save your Neptune API token and project name as environment variables:
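On Linux or macOS, for example (on Windows, use setx in Command Prompt as shown earlier):
export NEPTUNE_API_TOKEN="Your Neptune API token"
export NEPTUNE_PROJECT="workspace-name/project-name"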
Alternatively, you can pass the information when using a function that takes api_token and project as arguments:
run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8",  # your API token
    project="ml-team/classification",  # your full project name
)
- To find your API token: In the bottom-left corner, expand the user menu and select Get my API token.
- To find your project name: Copy the path from the project settings (top-right menu → Details & privacy).
If you haven't registered, you can log anonymously to a public project:
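As in the earlier sketch (the public project name is an assumption; replace it with the one shown in the app):
run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/quickstarts",
)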
Make sure not to publish sensitive data through your code!
Organize results in the Neptune app#
Click the run link that appears in the console output, or open your project in the Neptune app.
Sample output
[neptune] [info ] Neptune initialized. Open in the app:
https://app.neptune.ai/workspace/project/e/RUN-1
See that everything was logged#
To check that everything was logged correctly, navigate to the following sections:
- All metadata: Review all logged metadata.
- monitoring namespace: See hardware utilization charts.
- parameters namespace: See your parameters.
- sys namespace: View metadata about your run.
Filter runs by tag#
Navigate back to the experiments table.
In the input box above the table, filter the runs by the run-organization tag with the following query: "Tags" + "one of" + "run-organization".
Choose parameter and metric columns to display#
Neptune generally suggests columns if the fields have values that differ between the selected runs.
To add a column manually:
- Click Add column above the table.
- Select a field from the list, or start typing to match available fields. For example, f1.
Customize column appearance#
To customize the experiments table view even further, you can click the settings icon on any column to set a custom name and color.
Save the custom view#
To save the current view of the experiments table for later, click Save as new above the query input box. The saved view retains column customization and row filtering.
To share the view with collaborators, you can copy the link, send it via email, or export the view as CSV.
Tip
Create and save multiple views of the experiments table for different use cases or run groups.