Reproducing a run#

Introduction#

When building ML models for research or production, it's important to be able to reproduce a run to validate its results and performance. With Neptune, you can reproduce any run by retrieving its metadata, such as hyperparameters and datasets, and logging it into a newly created run.

In this guide, we'll show you how to:

Reopen an old run in order to fetch the metadata needed to reproduce it.
Use the fetched metadata to parametrize a new run with the same training loop.

To demonstrate this process, we've prepared an example run with some metadata already logged.

The metadata structure of the example run is as follows:

All metadata

config
|-- dataset
    |-- ...
|-- params
    |-- ...
training
|-- ...


Before you start#

Assumptions

You have Neptune installed and your Neptune credentials are saved as environment variables.

For details, see Install Neptune.
You have an existing run in a Neptune project that you have access to.

Fetching metadata from an existing run#

Get the ID of an old run#

To query metadata from the run we want to reproduce, we need its Neptune ID. The Neptune ID is a unique identifier of an object, for example CLS-26.

You can grab the ID manually from the web app. In the experiments table, it's displayed in the leftmost column.

You can also obtain the ID programmatically from the system namespace using the following code:

import neptune

# Fetch inactive runs as a table
with neptune.init_project(mode="read-only") as project:
    runs_table_df = project.fetch_runs_table(state="inactive").to_pandas()

# Extract the ID of the last successful run
old_run_id = runs_table_df[runs_table_df["sys/failed"] == False]["sys/id"].values[0]

print(old_run_id)
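The one-liner above raises an IndexError when no inactive run has succeeded. A minimal sketch of a more explicit selection step follows. It assumes the table rows have been converted to dictionaries (for example with runs_table_df.to_dict("records")) and that they are ordered newest first, which is an assumption about the table's sort order. The helper name pick_last_successful and the sample rows are hypothetical.

```python
def pick_last_successful(rows):
    """Return the "sys/id" of the first row whose "sys/failed" flag is falsy.

    Assumes rows are ordered newest first, so the first match is the most
    recent successful run.
    """
    for row in rows:
        if not row["sys/failed"]:
            return row["sys/id"]
    raise LookupError("no successful inactive run found")


# Hypothetical rows standing in for runs_table_df.to_dict("records")
rows = [
    {"sys/id": "CLS-27", "sys/failed": True},
    {"sys/id": "CLS-26", "sys/failed": False},
    {"sys/id": "CLS-25", "sys/failed": False},
]

old_run_id = pick_last_successful(rows)
print(old_run_id)
```

Raising a descriptive error here makes it obvious that there is nothing to reproduce, rather than failing later with an index error.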

Resume the old run#

Using the ID obtained in the previous step, we can reopen the existing run.

old_run = neptune.init_run(
    with_id=old_run_id,
    mode="read-only",
)

Read-only mode

We're not logging new data, so we can resume the run in read-only mode.

You can do this whenever you're initializing an existing Neptune object that you only want to query metadata from.

For details, see Connection modes: Read-only mode

Fetch the metadata from the old run#

To rerun the training, we first need the hyperparameters and the dataset location that the old run used. With them, we can instantiate the model and dataset objects with the same configuration. Use the fetch() method to retrieve the relevant data.

# Fetch hyperparameters
old_run_params = old_run["config/params"].fetch()

# Fetch dataset path
dataset_path = old_run["config/dataset/path"].fetch()
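As an illustration of turning the fetched hyperparameters into a training configuration, here is a minimal sketch. It assumes the fetched value behaves like a plain dictionary. The parameter names (lr, batch_size, epochs), the TrainConfig class, and the sample values are hypothetical and not taken from the example run.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical hyperparameters; replace with the keys your run logged.
    lr: float
    batch_size: int
    epochs: int


def config_from_params(params):
    """Build a TrainConfig from a dict of fetched hyperparameters.

    Extra keys are ignored; missing keys raise a KeyError that lists them.
    """
    names = {f.name for f in fields(TrainConfig)}
    missing = names - set(params)
    if missing:
        raise KeyError(f"missing hyperparameters: {sorted(missing)}")
    return TrainConfig(**{name: params[name] for name in names})


# Hypothetical stand-in for the value of old_run_params
fetched = {"lr": 0.001, "batch_size": 64, "epochs": 10, "optimizer": "adam"}
config = config_from_params(fetched)
print(config)
```

Failing early on missing keys helps catch the case where the old run logged a different set of parameters than the current training code expects.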

Creating a new run and logging the metadata to it#

We can now create a run that will use the metadata fetched in the previous step.

new_run = neptune.init_run(
    tags=["reproduce", "new-run"],
)

Adding the "reproduce" and "new-run" tags is optional, but will make it easier to find the run in the future.

Log hyperparameters and dataset details to the new run#

Now that a new run exists, we can start logging the metadata from the old run to it.

new_run["config/params"] = old_run_params
new_run["config/dataset/path"] = dataset_path
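If several fields need to be carried over, the two assignments above can be generalized. The sketch below relies only on the item access and fetch() calls already used in this guide. The copy_fields helper is hypothetical, and StubRun is an offline stand-in so the snippet runs without a Neptune connection; in practice you would pass the real old_run and new_run objects.

```python
def copy_fields(old_run, new_run, paths):
    """Fetch each path from old_run and assign it to the same path in new_run."""
    copied = {}
    for path in paths:
        value = old_run[path].fetch()
        new_run[path] = value
        copied[path] = value
    return copied


class _StubField:
    """Hypothetical field wrapper that mimics the fetch() call."""

    def __init__(self, value):
        self._value = value

    def fetch(self):
        return self._value


class StubRun(dict):
    """Hypothetical offline stand-in for a run; not part of Neptune."""

    def __getitem__(self, path):
        return _StubField(super().__getitem__(path))


# Hypothetical values standing in for the metadata of the old run
old_run = StubRun({"config/params": {"lr": 0.001}, "config/dataset/path": "data/train"})
new_run = StubRun()

copied = copy_fields(old_run, new_run, ["config/params", "config/dataset/path"])
print(sorted(copied))
```

Returning the copied values makes it easy to log or inspect exactly what was transferred between the runs.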

Stop logging#

When you're finished logging the metadata to the new run, remember to stop tracking both runs using the stop() method.

old_run.stop()
new_run.stop()
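If the training loop raises an exception, the stop() calls above are never reached. One possible way to guard against that is a small context manager that always calls stop() on exit. This is a sketch, not part of the Neptune API: stop_when_done is a hypothetical helper, and StubRun is an offline stand-in that only records whether stop() was called.

```python
from contextlib import contextmanager


@contextmanager
def stop_when_done(*runs):
    """Yield the given runs and call stop() on each one on exit, even after an error."""
    try:
        yield runs
    finally:
        for run in runs:
            run.stop()


class StubRun:
    """Hypothetical stand-in that records whether stop() was called."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


old_run, new_run = StubRun(), StubRun()

try:
    with stop_when_done(old_run, new_run):
        raise RuntimeError("simulated training failure")
except RuntimeError:
    pass

print(old_run.stopped, new_run.stopped)
```

Note that this simple version stops the runs in order and does not continue if one stop() call itself raises.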

You can also see the full code examples on our GitHub examples repo.