Correlate metrics and logs
When a metric indicates a problem, the logs can help you investigate the root cause. With Neptune, you can build custom views to analyze the correlation between metrics and logs from the same time range or at the same step.
This tutorial explains how to:
- Capture logs with Neptune
- View logs in the Neptune app
- Correlate metrics to logs and hardware usage
Before you start
Configure your Neptune API token and project. For details, see Get started.
Log metrics
To log numerical series to Neptune, use the log_metrics() function. For details, see Metrics.
Capture logs
You can log your own custom messages to Neptune as StringSeries attributes. Each message is timestamped and associated with a step value, which makes it easier to track progress during training. For example, you can log the following information:
- Error and warning messages
- Progress updates, configuration changes
- Custom debugging information
To send custom messages or logs to Neptune, log a series of strings with the log_string_series() function.
from random import random

from neptune_scale import Run


def hello_neptune():
    run = Run(
        api_token="eyJhcGlfYWRkcmVz...In0=",  # not needed if using environment variable
        project="workspace-name/project-name",  # not needed if using environment variable
        experiment_name="tutorial-metrics-and-logs",
    )

    run.log_string_series(
        data={"status": "Starting training"},
        step=0,
    )

    num_steps = 20
    offset = random() / 5

    for step in range(1, num_steps):
        # Your training loop
        acc = 1 - 2**-step - random() / (step + 1) - offset
        loss = 2**-step + random() / (step + 1) + offset

        if step % (num_steps // 2) == 0:  # Add a simulated error
            run.log_string_series(
                data={"status": f"Step = {step}, Loss = NaN"},
                step=step,
            )
        elif step % 1 == 0:
            run.log_string_series(
                data={"status": f"Step = {step}, All metrics logged"},
                step=step,
            )

        # Log metrics as usual
        run.log_metrics(
            data={"accuracy": acc, "loss": loss},
            step=step,
        )

    run.log_string_series(
        data={"status": "Training complete!"},
        step=step,
    )

    run.close()


if __name__ == "__main__":
    hello_neptune()
For details, see Log a series of strings.
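The example above hard-codes its simulated error message. In a real training loop, you would more likely derive the message from the metric values. The following is a minimal sketch, assuming a hypothetical helper named build_status (not part of the Neptune API) that returns the data dictionary expected by log_string_series():

```python
import math


def build_status(step: int, loss: float) -> dict[str, str]:
    """Build the `data` payload for run.log_string_series().

    Hypothetical helper for illustration; not part of the Neptune API.
    """
    if math.isnan(loss) or math.isinf(loss):
        # Flag non-finite losses so they stand out among the logged messages
        return {"status": f"Step = {step}, Loss = {loss} (non-finite)"}
    return {"status": f"Step = {step}, All metrics logged"}
```

Inside the training loop, you could then call run.log_string_series(data=build_status(step, loss), step=step).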
Other options
In addition to logging custom messages, Neptune offers the following options to track runtime information:
- Neptune logs the standard streams stderr and stdout automatically.
- You can capture logs with the Python Logger. For details, see Log Python Logger output.
- You can monitor hardware usage and log the results to Neptune with the neptune_hardware_monitoring.py utility script.
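Neptune provides its own integration for the Python Logger, described in the page linked above. Purely to illustrate the idea, the following sketch forwards log records to log_string_series() through a custom logging.Handler. The class names StringSeriesHandler and StubRun and the attribute path logs/python are assumptions made for this example, and StubRun stands in for a real Run so that the code runs offline:

```python
import logging


class StringSeriesHandler(logging.Handler):
    """Forward log records to a run-like object (hypothetical helper)."""

    def __init__(self, run, attribute="logs/python"):
        super().__init__()
        self.run = run
        self.attribute = attribute  # assumed attribute path, choose your own
        self.step = 0  # update this from your training loop

    def emit(self, record):
        # Same call shape as in the tutorial: data dictionary plus step
        self.run.log_string_series(
            data={self.attribute: self.format(record)},
            step=self.step,
        )


class StubRun:
    """Stand-in for neptune_scale.Run that records calls in memory."""

    def __init__(self):
        self.calls = []

    def log_string_series(self, data, step):
        self.calls.append((step, data))


stub = StubRun()
handler = StringSeriesHandler(stub)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

logger = logging.getLogger("training-sketch")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example output out of the root logger
logger.addHandler(handler)

handler.step = 5
logger.warning("Loss is NaN")
```

To try this against Neptune, you would pass a real Run object instead of the stub. How Neptune treats several messages logged at the same step is not covered by this sketch.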
Build visualizations
To build custom visualizations in the Neptune app:
- Create a dashboard or report. For a detailed comparison between them, see Gather and share insights.
- To visualize the logged metadata, use widgets:
  - For metrics that you want to analyze, create chart widgets.
  - For the custom messages, create a logs widget.
- Align the training charts to the logs.
  - To use relative time:
    - For chart widgets, set the X-axis series to relative time.
    - For logs widgets, set the time scale to relative time.
  - To use steps:
    - For chart widgets, set the X-axis to step.
    - For logs widgets, enable the Display steps option.
- From the runs table, select a run whose metadata you want to see and compare.
  Note that to view the contents of the logs widget, you must select only one run at a time. For details, see Select runs to compare.
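The alignment step above uses either relative time or steps as the shared axis. As a rough illustration of what relative time means, the following sketch converts absolute timestamps into seconds elapsed since a common origin. It assumes that the origin is the start of the run. The exact reference point that the Neptune app uses is not stated here, and the function name and sample timestamps are hypothetical:

```python
from datetime import datetime, timedelta


def to_relative_seconds(timestamps, origin):
    """Convert absolute timestamps to seconds elapsed since `origin`."""
    return [(ts - origin).total_seconds() for ts in timestamps]


# Hypothetical sample data: a run that starts at noon
run_start = datetime(2024, 1, 1, 12, 0, 0)
metric_times = [run_start + timedelta(seconds=s) for s in (0, 30, 60)]
log_times = [run_start + timedelta(seconds=s) for s in (0, 60)]

# Both series now share one axis, so points can be compared directly
metric_axis = to_relative_seconds(metric_times, run_start)
log_axis = to_relative_seconds(log_times, run_start)
```

Using one origin for both series is what makes a message at 60 seconds line up with the metric point at 60 seconds.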
Analyze the correlation
Once you configure your dashboard or report, you can analyze the correlation between the metrics and logs.
For example, you can debug spikes in training charts by correlating them to the error messages logged per step or in the same time range. You can also analyze whether these effects are shared across other metrics and determine whether you should abandon or fork a training run.
To zoom in on a chart, click and drag over its area. The zoom applies to all charts that share the same X-axis series in your current view.
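The same step-based correlation can be reasoned about outside the app. The following is a minimal sketch that works on plain dictionaries keyed by step. It does not fetch anything from Neptune, and the function names, the threshold, and the sample values are hypothetical:

```python
def find_spikes(values_by_step, threshold):
    """Return steps where the value rises by more than `threshold`
    compared to the previous logged step."""
    steps = sorted(values_by_step)
    return [
        cur
        for prev, cur in zip(steps, steps[1:])
        if values_by_step[cur] - values_by_step[prev] > threshold
    ]


def messages_at(steps, messages_by_step):
    """Pick the messages logged at the given steps, if any."""
    return {s: messages_by_step[s] for s in steps if s in messages_by_step}


# Hypothetical sample data shaped like the tutorial's "loss" and "status" series
loss = {1: 0.9, 2: 0.5, 3: 0.3, 4: 1.4, 5: 0.2}
status = {
    3: "Step = 3, All metrics logged",
    4: "Step = 4, Loss = NaN",
    5: "Step = 5, All metrics logged",
}

spikes = find_spikes(loss, threshold=0.5)
suspects = messages_at(spikes, status)
```

Here the only jump larger than the threshold happens at step 4, so suspects holds just the message logged at that step.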