Organize ML experiments


This guide will show you how to:
  • Keep track of code, data, environment, and parameters
  • Log results like evaluation metrics and model files
  • Find runs in the dashboard with tags
  • Organize runs in a dashboard view and save it for later

Before you start

You can run this how-to on Google Colab with zero setup. Just click the Run in Google Colab link at the top of the page.

Step 1: Create a basic training script

As an example, I’ll use a script that trains a scikit-learn model on the wine dataset.
You don’t have to use scikit-learn to track your training runs with Neptune. I’m using it as an easy-to-follow example; links to integrations with other ML frameworks and useful articles appear throughout the text.

Create a file and copy the script below.
from joblib import dump
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.4, random_state=1234
)

params = {
    "n_estimators": 10,
    "max_depth": 3,
    "min_samples_leaf": 1,
    "min_samples_split": 2,
    "max_features": 3,
}

clf = RandomForestClassifier(**params)
clf.fit(X_train, y_train)

y_train_pred = clf.predict_proba(X_train)
y_test_pred = clf.predict_proba(X_test)

train_f1 = f1_score(y_train, y_train_pred.argmax(axis=1), average="macro")
test_f1 = f1_score(y_test, y_test_pred.argmax(axis=1), average="macro")
print(f"Train f1: {train_f1} | Test f1: {test_f1}")

dump(clf, "model.pkl")

Run the training script to make sure that it works correctly.


Step 2: Connect Neptune to your script

At the top of your script, add:

import as neptune

run = neptune.init(
    project="my_workspace/my_project",
    api_token="YOUR_API_TOKEN",
)

This creates a new “run” in Neptune to which you can log metadata.
You need to tell Neptune who you are and where you want to log things. To do that you specify:
  • project="my_workspace/my_project": your workspace name and project name,
  • api_token="YOUR_API_TOKEN": your Neptune API token.
If you configured your Neptune API token correctly, as described here, you can skip the api_token argument.
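One way to configure the token is to export it as an environment variable before running the script, so that you don’t have to put it in your code. Neptune reads the NEPTUNE_API_TOKEN variable; the value below is a placeholder:

```shell
# Set the Neptune API token as an environment variable so that
# neptune.init() can pick it up without an explicit api_token argument.
export NEPTUNE_API_TOKEN="YOUR_API_TOKEN"
```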

Step 3: Add parameter, code, and environment tracking

Add parameters tracking
run["parameters"] = params
You can add code and environment tracking at run creation
run = neptune.init(source_files=["*.py", "requirements.txt"])
You can log source code to Neptune with every run. It can save you if you forget to commit your code changes to git.
To do it, pass a list of file names or wildcard patterns to the source_files argument.
If you start the run from a directory that is part of a git repo, Neptune will automatically find the .git directory and log some information from it:
  • status of the repo, i.e. whether it has uncommitted changes (dirty flag),
  • commit information (id, message, author, date),
  • branch name,
  • remote address,
  • git checkout command with the commit.
Putting it all together, your neptune.init call should look like this:

import as neptune

run = neptune.init(
    project="my_workspace/my_project",
    api_token="YOUR_API_TOKEN",
    source_files=["*.py", "requirements.txt"],
)

Step 4: Add tags to organize things

Runs can be viewed as dictionary-like structures - namespaces - that you can define in your code. You can apply hierarchical structure to your metadata that will be reflected in the UI as well. Thanks to this you can easily organize your metadata in a way you feel is most convenient.
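To illustrate the idea with plain Python (this is not Neptune code, just a sketch of the concept): assigning to a slash-separated path like "train/f1" can be thought of as filling a nested dictionary, which is roughly how the UI groups your metadata:

```python
# Illustration only: expand "a/b/c"-style keys into nested dicts,
# mimicking how namespaces group logged metadata in the UI.
def to_nested(flat):
    nested = {}
    for path, value in flat.items():
        node = nested
        *parents, leaf = path.split("/")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested

metadata = {"train/f1": 0.95, "test/f1": 0.89, "parameters/max_depth": 3}
print(to_nested(metadata))
# {'train': {'f1': 0.95}, 'test': {'f1': 0.89}, 'parameters': {'max_depth': 3}}
```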
There is one special namespace: the system namespace, denoted sys. You can use it to add a name and tags to the run.
Pass a list of strings to the 'sys/tags' namespace:
run["sys/tags"].add(["run-organization", "me"]) # organize things
It will help you find runs later, especially if you try a lot of ideas.

Step 5: Add logging of train and evaluation metrics

run["train/f1"] = train_f1
run["test/f1"] = test_f1
Log all the scores you care about in the same way. You can have as many as you like.
You can log a series of values to the same metric with the .log method:
acc = ...
run["train/accuracy"].log(acc)
When you do that, a chart will be created automatically.
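In a training loop the pattern typically looks like the sketch below. Neptune itself is not imported here; FakeRun is a hypothetical stand-in that only mimics the run["..."].log(...) call shape, to show how repeated calls build up a series of values:

```python
# Hypothetical stand-in for a Neptune run: repeated .log() calls on the
# same key accumulate a series of values (rendered as a chart in the UI).
class _Series:
    def __init__(self):
        self.values = []

    def log(self, value):
        self.values.append(value)

class FakeRun(dict):
    def __missing__(self, key):
        series = self[key] = _Series()
        return series

run = FakeRun()
for epoch in range(5):
    acc = 0.8 + 0.02 * epoch  # placeholder metric value per epoch
    run["train/accuracy"].log(acc)

print(len(run["train/accuracy"].values))  # 5 values -> one chart series
```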

Step 6: Add logging of model files

Log your model with the .upload method. Just pass the path to the file you want to log to Neptune:
run["model"].upload("model.pkl")

Step 7: Make a few runs with different parameters

Let’s execute some runs with different model configurations.

Change parameters in the params dictionary

params = {
    "n_estimators": 12,
    "max_depth": 5,
    "min_samples_leaf": 2,
    "min_samples_split": 3,
    "max_features": 5,
}

Execute a run


Step 8: Stop logging

Once you are done logging, stop tracking the run with the stop() method:
run.stop()
This is needed only when logging from a notebook environment. When logging from a script, Neptune automatically stops tracking once the script has completed execution.

Step 9: Go to Neptune UI

Click on one of the links created when you run the script or go directly to the app.
If you are logging things to the public project common/quickstarts you can just follow this link.

Step 10: See that everything got logged

Go to one of the runs you made and see that you logged things correctly:
  • Click on the run link or one of the rows in the runs table in the UI,
  • Go to the Parameters section to see your parameters,
  • Go to the Monitoring section to see hardware utilization charts,
  • Go to the All metadata section to review all logged metadata.

Step 11: Filter runs by tag

Go to the runs space and filter by the run-organization tag
Neptune should filter all those runs for you.

Step 12: Choose parameter and metric columns you want to see

Use the Add column button to choose the columns for the runs table:
  • Click on Add column,
  • Type the name of the metadata of interest, for example test/f1,
  • Click on test/f1 to add it.
You can also use the suggested columns, which show you the columns with values that differ between selected runs. Just click on the "+" to add one to your runs table.

Step 13: Save the view of the runs table

You can save the current view of the runs table for later:
  • Click on the Save as new button.
Both the columns and the filtering on rows will be saved as a view.
Create and save multiple views of the runs table for different use cases or run groups.

What’s next?