How to query metadata via the Neptune API#

You can download the metadata of your runs and models in bulk, or extract data from individual objects.

This page outlines some common ways to access typical metadata. For the full list of ways to interact with individual metadata fields, see Field types reference.

Downloading runs table as DataFrame#

To download the runs table of a project as a pandas DataFrame, initialize the project with init_project() and use the fetch_runs_table() method:

import neptune

project = neptune.init_project(
    project="workspace-name/project-name",  # full project name, for example "ml-team/classification"
    mode="read-only",
)

runs_table_df = project.fetch_runs_table().to_pandas()

To find the full project name in the Neptune app, click How to create a new run. You can copy the project argument from the modal that opens.

The returned runs_table_df is a pandas DataFrame, where

  • each row represents a run.
  • each column represents a metadata field, such as metrics, text logs, and parameters.

Filtering runs#

You can pass a raw NQL string to the query argument.

import neptune

project = neptune.init_project()  # project name is read from the NEPTUNE_PROJECT environment variable
project.fetch_runs_table(
    query='(`sys/tags`:stringSet CONTAINS "some-tag") AND (`f1`:float >= 0.85)',
)

This way, the runs can be filtered by almost any field and condition.

Learn more

For the full guide, see Neptune Query Language (NQL).
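When a query has several conditions, it can be easier to assemble the NQL string from parts than to write it by hand. The following is a minimal sketch of that idea. The helper names `nql_clause` and `nql_all` are hypothetical and not part of the Neptune API, and the sketch only handles string and numeric values without escaping embedded quotes.

```python
def nql_clause(field: str, type_: str, operator: str, value) -> str:
    """Format one NQL condition, such as (`f1`:float >= 0.85)."""
    # Strings are wrapped in double quotes; numbers are written as-is.
    rendered = f'"{value}"' if isinstance(value, str) else str(value)
    return f"(`{field}`:{type_} {operator} {rendered})"


def nql_all(*clauses: str) -> str:
    """Join clauses with AND, so a run must satisfy every one of them."""
    return " AND ".join(clauses)


query = nql_all(
    nql_clause("sys/tags", "stringSet", "CONTAINS", "some-tag"),
    nql_clause("f1", "float", ">=", 0.85),
)
print(query)
# (`sys/tags`:stringSet CONTAINS "some-tag") AND (`f1`:float >= 0.85)
```

The resulting string is the same one used in the example above and can be passed as the query argument of fetch_runs_table().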

To filter only by owner, ID, state, or tags, use the dedicated parameters below. Only runs that satisfy all of the given criteria are fetched.

runs_table_df = project.fetch_runs_table(
    id=["NLU-1", "NLU-2"],  # one or more specific run IDs
    owner="jackie",  # name of the user or service account that created the runs
    state="inactive",  # "active" or "inactive"
    tag=["exploration", "optuna"],  # runs with particular tags
).to_pandas()

Note: These can't be combined with the query parameter.
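If the filter arguments are assembled dynamically, a small client-side check can catch the invalid combination before any request is made. The wrapper below is only a sketch: `fetch_runs_table_checked` is a hypothetical helper, not a Neptune function, and it assumes `project` is an object exposing fetch_runs_table() as shown above.

```python
from typing import Any

# The dedicated filter parameters described above.
DEDICATED_FILTERS = ("id", "owner", "state", "tag")


def fetch_runs_table_checked(project: Any, **kwargs: Any) -> Any:
    """Forward to project.fetch_runs_table(), rejecting query + dedicated filters."""
    used = [name for name in DEDICATED_FILTERS if kwargs.get(name) is not None]
    if kwargs.get("query") is not None and used:
        raise ValueError(f"'query' can't be combined with: {', '.join(used)}")
    return project.fetch_runs_table(**kwargs)


# Hypothetical usage with an initialized project:
# table = fetch_runs_table_checked(project, owner="jackie", tag=["optuna"])
```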

Choosing columns to return#

To include only certain fields as columns in the table, use the columns parameter.

Set the sorting column with the sort_by parameter.

>>> runs_df = project.fetch_runs_table(
...     columns=["sys/running_time", "f1_score"],
...     sort_by="f1_score",
... ).to_pandas()
Fetching table...
>>> print(runs_df)
    sys/id  sys/running_time  f1_score
0    CLS-8             5.436      0.95
1    CLS-7            12.342      0.92
2    CLS-6           318.538      0.80
3    CLS-5             9.560      0.80
...

Limiting number of returned runs#

Use the limit parameter to restrict the number of returned entries.

project.fetch_runs_table(limit=100)

Using custom progress bar#

To visualize the download progress with a custom callback, define a class that inherits from ProgressBarCallback and pass the class itself (the type, not an instance) to the progress_bar argument.

Example callback definition, using click:

from types import TracebackType
from typing import Any, Optional, Type

from neptune.typing import ProgressBarCallback


class ClickProgressBar(ProgressBarCallback):
    def __init__(self, *, description: Optional[str] = None, **_: Any) -> None:
        super().__init__()
        from click import progressbar

        ...
        self._progress_bar = progressbar(iterable=None, length=1, label=description)
        ...

    def update(self, *, by: int, total: Optional[int] = None) -> None:
        if total:
            self._progress_bar.length = total
        self._progress_bar.update(by)
        ...

    def __enter__(self) -> "ClickProgressBar":
        self._progress_bar.__enter__()
        return self
        ...

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> None:
        self._progress_bar.__exit__(exc_type, exc_val, exc_tb)

Using the custom callback:

import neptune

project = neptune.init_project(project="ml-team/nlu", mode="read-only")
project.fetch_runs_table(progress_bar=ClickProgressBar)

Example#

Below is a more complex example, with

  1. NQL search query to filter the returned runs
  2. custom columns and sorting for the resulting table
  3. limited number of returned entries

The sample code would fetch only runs with an average accuracy of more than 0.7, and construct a table with the 5 most recently pinged runs. (Ping time refers to the last interaction with the Neptune client library; that is, when something was last logged by the API.)

>>> runs_table_df = project.fetch_runs_table(
...     query="average(accuracy:floatSeries) > 0.7",
...     columns=["sys/modification_time", "sys/ping_time"],
...     sort_by="sys/ping_time",
...     limit=5,
... ).to_pandas()
Fetching table...
>>> print(runs_table_df)
   sys/id     sys/modification_time             sys/ping_time
0  NLU-22  2024-01-12T16:52:14.083Z  2024-01-12T16:52:14.083Z
1  NLU-23  2024-01-03T08:49:06.008Z  2024-01-03T08:49:06.008Z
2  NLU-21  2023-12-22T10:34:09.065Z  2023-12-22T10:34:09.065Z
3  NLU-20  2023-12-22T10:28:56.065Z  2023-12-20T14:59:00.122Z
4  NLU-19   2023-12-20T14:56:54.21Z   2023-12-20T14:56:54.21Z

Querying metadata from particular run#

To query metadata from a Run, Model, or Project object, use a fetching method corresponding to the logging method and data type.

| Logging method | Querying method | Field type to use on |
| --- | --- | --- |
| = or assign() | fetch() | Any single-value field (except File) |
| add() | fetch() | Tags (StringSet) |
| append() (prev. log()) | fetch_last(), fetch_values() | FloatSeries, StringSeries |
| append(File()) | download_last(), download() | FileSeries |
| track_files() | fetch_hash(), fetch_files_list(), download() | Artifact |
| upload(), upload_files() | download() | File, FileSet |

For the full list, see Field types in the API reference.
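The pairing of field types and querying methods can also be written as a lookup, which is handy when a script handles fields of different types. This is an illustrative sketch only: `QUERY_METHOD` and `query_field` are hypothetical names, the field type is passed in by the caller as a plain string, and for series fields the sketch picks the "all values" method rather than fetch_last() or download_last().

```python
from typing import Any

# Field type -> name of the method that retrieves its content.
# Types not listed here are treated as single-value fields and use fetch().
QUERY_METHOD = {
    "StringSet": "fetch",
    "FloatSeries": "fetch_values",
    "StringSeries": "fetch_values",
    "FileSeries": "download",
    "Artifact": "fetch_files_list",
    "File": "download",
    "FileSet": "download",
}


def query_field(obj: Any, path: str, field_type: str) -> Any:
    """Call the querying method that matches field_type on obj[path]."""
    method_name = QUERY_METHOD.get(field_type, "fetch")
    return getattr(obj[path], method_name)()


# Hypothetical usage with an initialized run:
# loss_df = query_field(run, "train/loss", "FloatSeries")
```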

Initializing existing object#

Start by initializing Neptune with the ID of the object:

import neptune

run = neptune.init_run(with_id="CLS-8", mode="read-only")

How do I find the ID?

The Neptune ID is a unique identifier for the run. In the table view, it's displayed in the leftmost column.

The ID is stored in the system namespace (sys/id).

If the run is active, you can obtain its ID with run["sys/id"].fetch(). For example:

>>> run = neptune.init_run(project="ml-team/classification")
>>> run["sys/id"].fetch()
'CLS-26'

Setting the mode to "read-only" ensures that no data is added or changed.

Querying single value#

Use the fetch() method to query any single-valued metadata of the object:

params = run["parameters"].fetch()
batch_size = run["parameters/batch_size"].fetch()
f1_score = run["f1_score"].fetch()

The same method works for fields in the system namespace:

username = run["sys/owner"].fetch()  # string
last_updated = run["sys/modification_time"].fetch()  # datetime
my_run_id = run["sys/id"].fetch()  # run ID
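To collect several single-value fields at once, the fetch() calls can be wrapped in a small helper. The sketch below assumes only that `run[path].fetch()` behaves as shown above; `fetch_fields` is a hypothetical name, not part of the Neptune API.

```python
from typing import Any, Dict, Iterable


def fetch_fields(run: Any, paths: Iterable[str]) -> Dict[str, Any]:
    """Fetch several single-value fields and return them keyed by field path."""
    return {path: run[path].fetch() for path in paths}


# Hypothetical usage with an initialized run:
# summary = fetch_fields(run, ["sys/id", "sys/owner", "f1_score"])
```

Each path results in its own fetch() call, so this is a convenience for readability rather than a way to reduce requests.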

Fetching artifact metadata#

See Download artifact metadata.

Querying tags#

You can access the tags of an object by using fetch() on the sys/tags field:

run_tags = run["sys/tags"].fetch()

if "exploration" in run_tags:
    print_analysis()

Querying values from series field#

For value series logged with the append() (previously log()) method, you can query either the last value or the full list of values.

# Accessing last value of FloatSeries
final_loss = run["train/loss"].fetch_last()

# Accessing last value of StringSeries
last_stderr_line = run["monitoring/6519428b/stderr"].fetch_last()

Retrieve all the values and their indexes as a pandas DataFrame with fetch_values():

loss_df = run["train/loss"].fetch_values()

# Don't include timestamp
loss_df = run["train/loss"].fetch_values(include_timestamp=False)

Similarly, from FileSeries fields, you can download the last file to the disk with download_last() or download all files with download():

# Download last file in the FileSeries
run["train/epoch/histogram"].download_last()

# Download all files in the FileSeries
run["train/epoch/histogram"].download()

Downloading files#

Use download() to query uploaded files:

# Download example_image to the current directory
run["data/example_image"].download()

# Download model to the specified directory
run["trained_model"].download(destination_path)

You can also download files from artifact fields:

# Download all files from the artifact, optionally specifying a download destination
run["datasets/train"].download(destination="./datasets")

For details, see Download artifact metadata.
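When downloading into a specific directory, it can help to make sure the directory exists first. The helper below is a sketch under that assumption: `download_into` is a hypothetical name, and it relies only on the download(destination=...) call shown above. Creating the directory up front is a precaution and may not be required by the client.

```python
import os
from typing import Any


def download_into(run: Any, path: str, directory: str) -> str:
    """Create directory if needed, then download run[path] into it."""
    os.makedirs(directory, exist_ok=True)
    run[path].download(destination=directory)
    return directory


# Hypothetical usage with an initialized run:
# download_into(run, "datasets/train", "./datasets")
```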

Querying nested namespaces or fields#

To query namespaces or fields that are nested under another namespace, you can access the object structure with get_structure().

Example

To get the monitoring sub-namespaces, which are generated automatically by Neptune for each process:

run.get_structure()["monitoring"].keys()
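To list every field path rather than only the top-level keys, the nested structure can be walked recursively. The sketch below assumes that get_structure() returns nested dictionaries in which namespaces are dicts and fields are non-dict leaves; `list_field_paths` is a hypothetical helper, and a plain dictionary stands in for the real structure.

```python
from typing import Any, Dict, List


def list_field_paths(structure: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return the full path of every leaf in a nested namespace dictionary."""
    paths: List[str] = []
    for name, node in structure.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(node, dict):
            # A dict is a namespace: descend into it.
            paths.extend(list_field_paths(node, path))
        else:
            # Anything else is treated as a field.
            paths.append(path)
    return paths


# Plain dict standing in for run.get_structure()
example = {
    "sys": {"id": "<field>", "tags": "<field>"},
    "train": {"loss": "<field>"},
    "f1_score": "<field>",
}
print(list_field_paths(example))
# ['sys/id', 'sys/tags', 'train/loss', 'f1_score']
```

With a real run, the call would be list_field_paths(run.get_structure()).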