Skip to content

Monitoring model training live#

Open in Colab

Need a more detailed walkthrough that starts from installation? The Neptune tutorial has you covered.

This example walks you through basic monitoring of your model training process:

  • Looking at learning curves for loss and accuracy
  • Monitoring hardware consumption during training across CPU, GPU (NVIDIA only), and memory

Before you start#

  • Sign up at neptune.ai/register.
  • Create a project for storing your metadata.
  • Install Neptune:

    pip install neptune
    
    Passing your Neptune credentials

    Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...6Lc"
    

    To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.

    export NEPTUNE_PROJECT="ml-team/classification"
    

    Your full project name has the form workspace-name/project-name. You can copy it from the project settings: Click the menu in the top-right → Details & privacy.

    On Windows, navigate to SettingsEdit the system environment variables, or enter the following in Command Prompt: setx SOME_NEPTUNE_VARIABLE 'some-value'


    While it's not recommended especially for the API token, you can also pass your credentials in the code when initializing Neptune.

    run = neptune.init_run(
        project="ml-team/classification",  # your full project name here
        api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
    )
    

    For more help, see Set Neptune credentials.

  • Have TensorFlow 2.X with Keras installed.

What if I don't use Keras?

No worries, we're just using it for demonstration purposes. You can use any framework you like, and Neptune has intregrations with various popular frameworks. For details, see the Integrations tab.

Create a basic training script#

Create a file train.py and copy the script below.

train.py
from tensorflow import keras

params = {
    "epoch_nr": 100,
    "batch_size": 256,
    "lr": 0.005,
    "momentum": 0.4,
    "use_nesterov": True,
    "unit_nr": 256,
    "dropout": 0.05,
}

mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

model = keras.models.Sequential(
    [
        keras.layers.Flatten(),
        keras.layers.Dense(
            params["unit_nr"],
            activation=keras.activations.relu,
        ),
        keras.layers.Dropout(params["dropout"]),
        keras.layers.Dense(10, activation=keras.activations.softmax),
    ]
)

optimizer = keras.optimizers.SGD(
    learning_rate=params["lr"],
    momentum=params["momentum"],
    nesterov=params["use_nesterov"],
)

model.compile(
    optimizer=optimizer,
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

model.fit(
    x_train,
    y_train,
    epochs=params["epoch_nr"],
    batch_size=params["batch_size"],
)

In your terminal program, run the script to ensure that it works properly.

python train.py

Connect Neptune to your code#

At the top of your script, add the following:

import neptune

run = neptune.init_run() # (1)!
  1. We recommend saving your API token and project name as environment variables.

    If needed, you can pass them as arguments when initializing Neptune:

    neptune.init_run(
        project="workspace-name/project-name",
        api_token="YourNeptuneApiToken",
    )
    
Haven't registered yet?

No problem. You can try Neptune anonymously by logging to a public project with a shared API token:

run = neptune.init_run(api_token=neptune.ANONYMOUS_API_TOKEN, project="common/quickstarts")

This creates a new run in Neptune, to which you can log various types of metadata.

We'll keep the run active for the duration of the training, so we can monitor the metrics both during and after the training.

Add logging for metrics#

Many frameworks, like Keras, let you create a callback that is executed inside of the training loop.

If you have one, you can also use your own training loop.

In this example, we'll create a simple Neptune callback and pass it to the model.fit() method:

class NeptuneMonitor(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        for metric_name, metric_value in logs.items():
            run[f"train/{metric_name}"].append(metric_value)

model.fit(
    x_train,
    y_train,
    epochs=params["epoch_nr"],
    batch_size=params["batch_size"],
    callbacks=[NeptuneMonitor()],
)

Note

If you're interested in using Neptune with Keras, you don't need to implement the callback yourself. See the Keras integration guide for a full tutorial.

To log a series of values – like loss or other metrics – you use the append() method. Each append() call adds a value to the series, so it makes sense to use inside a loop.

for i in range(epochs):
    ...
    run["train/loss"].append(loss)
    run["train/acc"].append(accuracy)

Execute the script to start the training:

python train.py
If Neptune can't find your project name or API token

As a best practice, you should save your Neptune API token and project name as environment variables:

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"

Alternatively, you can pass the information when using a function that takes api_token and project as arguments:

run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8", # (1)!
    project="ml-team/classification", # (2)!
)
  1. In the bottom-left corner, expand the user menu and select Get my API token.
  2. You can copy the path from the project details ( Details & privacy).

If you haven't registered, you can log anonymously to a public project:

api_token=neptune.ANONYMOUS_API_TOKEN
project="common/quickstarts"

Make sure not to publish sensitive data through your code!

Click the run link that appears in the console output, or open your project in the Neptune app.

Sample output

[neptune] [info ] Neptune initialized. Open in the app: https://app.neptune.ai/workspace/project/e/RUN-1

Stop the run when done

Once you are done logging, you should stop the connection to the Neptune run. When logging from a Jupyter notebook or other interactive environments, you need to do this manually:

run.stop()

If you're running a script, the connection is stopped automatically when the script finishes executing. In interactive sessions, however, the connection to Neptune is only stopped when the kernel stops.

Monitor the results in Neptune#

  • Select Charts to view the training metrics live.
  • Select Monitoring to view system metrics, like hardware consumption and console logs (stderr and stdout).

See results in Neptune  Code examples