
Querying metadata via the Neptune API#

With the Neptune API, you can query your logged metadata and fetch it to your local machine.

You can download metadata logged in your project in bulk, or extract data from individual objects.

This page outlines some common ways to access typical metadata. For the full list of ways to interact with metadata fields, see Field types reference.

Downloading runs table as DataFrame#

To download the runs table of a project as a pandas DataFrame, initialize the project with init_project() and use the fetch_runs_table() method:

import neptune.new as neptune

project = neptune.init_project(
    name="workspace-name/project-name",  # (1)
    mode="read-only",
)

# Fetch the runs table, limited to runs created by "sophia" and tagged "cycleLR"
runs_table_df = project.fetch_runs_table(owner="sophia", tag="cycleLR").to_pandas()

runs_table_df.head()
  1. The full project name. For example, "ml-team/classification". To copy it, navigate to the project settings → Properties.

Info

The same applies to models stored in the model registry. Fetch them with fetch_models_table().

The returned runs_table_df is a pandas DataFrame, where

  • each row represents a run.
  • each column represents a metadata field, such as metrics, text logs, and parameters.

To include only certain fields as columns in the table, you can specify the names of namespaces or fields with the columns parameter:

# Fetch table of runs, including only the "f1_score" and "sys/running_time"
# fields as columns
>>> filtered_runs_table = project.fetch_runs_table(
...     columns=["f1_score", "sys/running_time"]
... )
>>> filtered_runs_df = filtered_runs_table.to_pandas()
>>> print(filtered_runs_df)
    sys/id  sys/running_time  f1_score
0    CLS-8             5.436      0.95
1    CLS-7            12.342      0.92
2    CLS-6           318.538      0.80
3    CLS-5             9.560      0.80
...
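
Once the table is a DataFrame, ordinary pandas or plain Python tooling applies. As a minimal sketch, the helper below picks the run with the highest value in a given column. It works on a list of row dictionaries, which pandas produces with `filtered_runs_df.to_dict("records")`. The helper name `best_run` is illustrative and not part of the Neptune API, and the sample rows are copied from the output above.

```python
import math


def _is_missing(value):
    """True for None and for NaN, which pandas uses for empty cells."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def best_run(rows, metric):
    """Return the row with the highest `metric`, or None if no row has a value."""
    scored = [row for row in rows if not _is_missing(row.get(metric))]
    if not scored:
        return None
    return max(scored, key=lambda row: row[metric])


# Sample rows mirroring the table printed above. With a real table, use:
#   rows = filtered_runs_df.to_dict("records")
rows = [
    {"sys/id": "CLS-8", "sys/running_time": 5.436, "f1_score": 0.95},
    {"sys/id": "CLS-7", "sys/running_time": 12.342, "f1_score": 0.92},
    {"sys/id": "CLS-6", "sys/running_time": 318.538, "f1_score": 0.80},
    {"sys/id": "CLS-5", "sys/running_time": 9.560, "f1_score": 0.80},
]

best = best_run(rows, "f1_score")
print(best["sys/id"])  # CLS-8
```

The resulting ID can then be used to initialize that single run and query it in more detail.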


Querying metadata from object#

To query metadata from a given run or other object, you fetch or download the metadata in much the same way as you logged it.

The methods correspond in the following way:

Logging method            | Querying method                               | Field type
= or assign()             | fetch()                                       | Atom
add()                     | fetch()                                       | Tags (StringSet)
log()                     | fetch_last(), fetch_values()                  | Series
log(File())               | download_last(), download()                   | FileSeries
track_files()             | fetch_hash(), fetch_files_list(), download()  | Artifact
upload(), upload_files()  | download()                                    | File, FileSet

For the full list, see API reference → Field types.
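
The correspondence above can be sketched as a small dispatcher that picks the querying method from the field type. The function name `query_field` and the `kind` labels are hypothetical conveniences for this sketch. Only the method names (`fetch()`, `fetch_last()`, `fetch_values()`, `download()`) come from the table. The function accepts any run-like object that supports `run[path]`, so it does not import Neptune itself.

```python
def query_field(run, path, kind, destination=None):
    """Query run[path] with the method that matches its field type.

    `kind` is a label invented for this sketch:
    "atom", "tags", "series_last", "series_all", or "file".
    """
    field = run[path]
    if kind in ("atom", "tags"):
        return field.fetch()
    if kind == "series_last":
        return field.fetch_last()
    if kind == "series_all":
        return field.fetch_values()
    if kind == "file":
        # download() writes to disk, so there is nothing to return
        if destination is None:
            field.download()
        else:
            field.download(destination)
        return None
    raise ValueError(f"Unknown field kind: {kind!r}")


# Usage with a run initialized as described in the next section:
#   f1_score = query_field(run, "f1_score", "atom")
#   final_loss = query_field(run, "train/loss", "series_last")
```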

Initializing existing object#

Start by initializing Neptune with the ID of the object:

import neptune.new as neptune

run = neptune.init_run(with_id="CLS-8")
How do I find the ID?

The Neptune ID is a unique identifier for the object. It's displayed in the Details view and in the leftmost column of the table views.

The ID is stored in the system namespace. You can obtain it with object["sys/id"].fetch(). For example:

>>> run = neptune.init_run(project="ml-team/classification")
>>> run["sys/id"].fetch()
'CLS-26'

Querying single value#

Use the fetch() method to query any single-valued metadata of the object:

params = run["parameters"].fetch()
batch_size = run["parameters/batch_size"].fetch()
f1_score = run["f1_score"].fetch()

The same method works for fields in the system namespace:

username = run["sys/owner"].fetch()  # string
last_updated = run["sys/modification_time"].fetch()  # datetime
my_run_id = run["sys/id"].fetch()  # run ID
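
Fetching a whole namespace, such as `run["parameters"]`, returns its fields together. The sketch below assumes the result is a nested dictionary and flattens it into slash-separated paths, matching how single fields are addressed (for example, `parameters/batch_size`). The helper `flatten_namespace` and the sample values are illustrative only.

```python
def flatten_namespace(values, prefix=""):
    """Flatten a nested dict into {"a/b/c": value} pairs."""
    flat = {}
    for key, value in values.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_namespace(value, path))
        else:
            flat[path] = value
    return flat


# Made-up stand-in for the result of run["parameters"].fetch()
params = {"batch_size": 64, "optimizer": {"name": "Adam", "lr": 0.001}}

flat = flatten_namespace(params, prefix="parameters")
for path, value in flat.items():
    print(path, value)
# parameters/batch_size 64
# parameters/optimizer/name Adam
# parameters/optimizer/lr 0.001
```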

Querying tags#

You can access the tags of an object by using fetch() on the sys/tags field:

run_tags = run["sys/tags"].fetch()

if "exploration" in run_tags:
    print_analysis()
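
Because tags are a string set, set operations are a natural fit for checks that involve several tags at once. This is a minimal sketch, assuming `run["sys/tags"].fetch()` returns a Python set of strings. The helper `missing_tags` and the sample tags are hypothetical.

```python
def missing_tags(run_tags, required):
    """Return the required tags absent from run_tags, sorted for stable output."""
    return sorted(set(required) - set(run_tags))


# Sample value standing in for run["sys/tags"].fetch()
run_tags = {"exploration", "cycleLR"}

print(missing_tags(run_tags, ["exploration", "baseline"]))  # ['baseline']
```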

Querying values from series field#

For value series logged with the log() method, you can query either the last value or the full list of values.

# Accessing last value of FloatSeries
final_loss = run["train/loss"].fetch_last()

# Accessing last value of StringSeries
last_stderr_line = run["monitoring/stderr"].fetch_last()

Retrieve all the values and their indexes as a pandas DataFrame with fetch_values():

loss_df = run["train/loss"].fetch_values()

# Don't include timestamp
loss_df = run["train/loss"].fetch_values(include_timestamp=False)
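
A common follow-up is finding the step at which a metric reached its lowest value. The sketch below works on plain `(step, value)` pairs. It assumes the DataFrame returned by `fetch_values()` has `step` and `value` columns, in which case the pairs can be built with `zip(loss_df["step"], loss_df["value"])`. The helper name and the sample numbers are made up for illustration.

```python
def lowest_point(pairs):
    """Return the (step, value) pair with the smallest value, or None if empty."""
    pairs = list(pairs)
    if not pairs:
        return None
    return min(pairs, key=lambda pair: pair[1])


# Made-up loss curve. With real data (column names are an assumption):
#   pairs = zip(loss_df["step"], loss_df["value"])
pairs = [(0, 0.91), (1, 0.47), (2, 0.32), (3, 0.35)]

step, value = lowest_point(pairs)
print(step, value)  # 2 0.32
```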

Similarly, from FileSeries fields, you can download the last file to the disk with download_last() or download all files with download():

# Download last file in the FileSeries
run["train/epoch/histogram"].download_last()

# Download all files in the FileSeries
run["train/epoch/histogram"].download()

Downloading files#

Use download() to query uploaded files:

# Download example_image to the current directory
run["data/example_image"].download()

# Download model to the specified directory
run["trained_model"].download(destination_path)
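
When downloading to a specific directory, it can help to create the directory first. The sketch below wraps that step in a hypothetical helper, `download_into`, which accepts any field object exposing `download(destination)` as used above. Whether Neptune creates missing directories on its own is not assumed here.

```python
import os


def download_into(field, directory):
    """Create `directory` if needed, then download the field's file(s) into it."""
    os.makedirs(directory, exist_ok=True)
    field.download(directory)
    return directory


# Usage with a run initialized as shown earlier ("./models" is an example path):
#   download_into(run["trained_model"], "./models")
```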

You can also download files from artifact fields:

# Download all files from the artifact, optionally specifying a download destination
run["datasets/train"].download(destination="./datasets")

For details, see Tracking artifacts: Querying artifact metadata.