Monitoring model training live#
Need a more detailed walkthrough that starts from installation? The Neptune tutorial has you covered.
This example walks you through basic monitoring of your model-training process:
- Looking at learning curves for loss and accuracy
- Monitoring hardware consumption during training across CPU, GPU, and memory
Before you start#
- Set up Neptune.
- Have TensorFlow 2.X with Keras installed.
What if I don't use Keras?
No worries, we're just using it for demonstration purposes. You can use any framework you like, and Neptune has integrations with various popular frameworks. For details, see the Integrations tab.
Create a basic training script#
Create a file named train.py and copy the script below into it.
from tensorflow import keras

params = {
    "epoch_nr": 100,
    "batch_size": 256,
    "lr": 0.005,
    "momentum": 0.4,
    "use_nesterov": True,
    "unit_nr": 256,
    "dropout": 0.05,
}

mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = keras.models.Sequential(
    [
        keras.layers.Flatten(),
        keras.layers.Dense(
            params["unit_nr"],
            activation=keras.activations.relu,
        ),
        keras.layers.Dropout(params["dropout"]),
        keras.layers.Dense(10, activation=keras.activations.softmax),
    ]
)

optimizer = keras.optimizers.SGD(
    learning_rate=params["lr"],
    momentum=params["momentum"],
    nesterov=params["use_nesterov"],
)

model.compile(
    optimizer=optimizer,
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

model.fit(
    x_train,
    y_train,
    epochs=params["epoch_nr"],
    batch_size=params["batch_size"],
)
In your terminal program, run the script to ensure that it works properly.
Connect Neptune to your code#
At the top of your script, import Neptune and start a run with neptune.init_run().
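A minimal sketch of this step, assuming the `neptune` package is installed and that your credentials are stored in the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables. The helper names `missing_neptune_settings` and `start_run` are illustrative and not part of Neptune; the only library call is `neptune.init_run()`.

```python
import os


def missing_neptune_settings(env=os.environ):
    """Return the names of the Neptune environment variables that are unset."""
    required = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")
    return [name for name in required if not env.get(name)]


def start_run():
    """Create a Neptune run, failing early if the credentials are missing."""
    missing = missing_neptune_settings()
    if missing:
        raise RuntimeError(
            f"Set these environment variables first: {', '.join(missing)}"
        )
    # Imported here so the check above works even without the package.
    import neptune

    # With no arguments, the token and project are read from the environment.
    return neptune.init_run()
```

With both variables set, calling `run = start_run()` at the top of train.py gives you the `run` object used in the rest of this example.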
- We recommend saving your API token and project name as environment variables. If needed, you can pass them as arguments when initializing Neptune:
neptune.init_run(project="workspace-name/project-name", api_token="Your Neptune API token here")
Haven't registered yet?
No problem. You can try Neptune anonymously by logging to a public project with a shared API token.
Calling neptune.init_run() creates a new run in Neptune, to which you can log various types of metadata.
We'll keep the run active for the duration of the training, so we can monitor the metrics both during and after the training.
Add logging for metrics#
Many frameworks, like Keras, let you create a callback that is executed inside of the training loop.
If you wrote your own training loop, you can log from it instead.
In this example, we'll create a simple Neptune callback and pass it to the model.fit() method:
class NeptuneMonitor(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Keras may pass logs=None, so fall back to an empty dict.
        for metric_name, metric_value in (logs or {}).items():
            # Append each metric to its own series, such as train/loss.
            run[f"train/{metric_name}"].append(metric_value)

model.fit(
    x_train,
    y_train,
    epochs=params["epoch_nr"],
    batch_size=params["batch_size"],
    callbacks=[NeptuneMonitor()],
)
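If you log from a hand-written training loop instead of a callback, the same `append` pattern applies. The sketch below uses a small in-memory stand-in for the run object so that it runs without a Neptune account. `InMemoryRun` and the simulated loss are hypothetical and only mimic the `run[path].append(value)` call used above.

```python
from collections import defaultdict


class InMemoryRun:
    """Hypothetical stand-in for a Neptune run: supports run[path].append(value)."""

    def __init__(self):
        self.series = defaultdict(list)

    def __getitem__(self, path):
        # A plain list already has append(), which is all this sketch needs.
        return self.series[path]


def train(run, epochs):
    """Toy loop: the loss is simulated so the example runs anywhere."""
    for epoch in range(epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        run["train/loss"].append(loss)  # one value per epoch


run = InMemoryRun()
train(run, epochs=3)
print(run.series["train/loss"])
```

With a real Neptune run in place of `InMemoryRun`, each appended value would become a point on the train/loss chart.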
Note
If you're interested in using Neptune with Keras, you don't need to implement the callback yourself. See Working with Keras for the full integration guide.
Execute the script to start the training, for example by running python train.py in your terminal.
If Neptune can't find your project name or API token
As a best practice, you should save your Neptune API token and project name as environment variables:
export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"
You can, however, also pass them as arguments when initializing Neptune:
run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8",  # your token here
    project="ml-team/classification",  # your full project name here
)
- Find and copy your API token by clicking your avatar and selecting Get my API token.
- Find and copy your project name in the project Settings → Properties.
If you haven't registered, you can also log anonymously to a public project (make sure not to publish sensitive data through your code!).
To access a run directly, you can click the run link that appears in the console output.
If you already have the project open in the app, you can click on a run in the runs table to open it.
Stop the run when done
Once you are done logging, you should stop the Neptune run. You need to do this manually, by calling run.stop(), when logging from a Jupyter notebook or other interactive environment.
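A sketch of stopping the run explicitly with `run.stop()`, written so that the run is stopped even if training raises an exception. The helper `train_and_stop` and the `DummyRun` stand-in are hypothetical and only there so the example runs without a Neptune account. With a real run, you would pass the object returned by `neptune.init_run()`.

```python
def train_and_stop(run, train_fn):
    """Call train_fn(), then stop the run even if training raises."""
    try:
        return train_fn()
    finally:
        run.stop()  # on a real Neptune run, this ends the connection


class DummyRun:
    """Hypothetical stand-in that only records whether stop() was called."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


dummy = DummyRun()
result = train_and_stop(dummy, lambda: "done")
```

In a notebook, the equivalent is to call `run.stop()` in the last cell once training has finished.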
If you're running a script, the connection is stopped automatically when the script finishes executing. In notebooks, however, the connection to Neptune is not stopped when the cell has finished executing, but rather when the entire notebook stops.
Monitor the results in Neptune#
In the run view:
- Select Charts to view the training metrics live.
- Select Monitoring to view system metrics, like hardware consumption and console logs (stderr and stdout).