Project#

Representation of a Neptune project.

If you want to create a new Neptune project, see management.create_project().

Initialization#

Initialize with the init_project() function or the class constructor.

Using the init_project() function:

import neptune

project = neptune.init_project()

Using the Project class constructor:

from neptune import Project

project = Project()
If Neptune can't find your project name or API token

As a best practice, you should save your Neptune API token and project name as environment variables:

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"

Alternatively, you can pass the information when using a function that takes api_token and project as arguments:

run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8",  # (1)
    project="ml-team/classification",  # (2)
)
  1. In the bottom-left corner, expand the user menu and select Get my API token.
  2. You can copy the path from the project details (Details & privacy).

If you haven't registered, you can log anonymously to a public project:

api_token=neptune.ANONYMOUS_API_TOKEN
project="common/quickstarts"

Make sure not to publish sensitive data through your code!

You can use the project object to:

  • Retrieve information about runs within the project.
  • Store and retrieve metadata on a project level, such as information about datasets, links to documentation, and key project metrics.

This metadata is displayed in the Project metadata section of the Neptune app.

Parameters

Name      Type Default     Description
project str, optional None Name of a project in the form workspace-name/project-name. If None, the value of the NEPTUNE_PROJECT environment variable is used.
api_token str, optional None Your Neptune API token (or a service account's API token). If None, the value of the NEPTUNE_API_TOKEN environment variable is used.

⚠ To keep your token secure, avoid placing it in source code. Instead, save it as an environment variable.

mode str, optional async Connection mode in which logging is performed. Possible values are async, sync, read-only, and debug.

If you leave it out, the value of the NEPTUNE_MODE environment variable is used. If that's not set, the default async is used.

flush_period float, optional 5 (seconds) In asynchronous (default) connection mode, how often Neptune should trigger disk flushing.
proxies dict, optional None Argument passed to HTTP calls made via the Requests library. For details on proxies, see the Requests documentation.
async_lag_callback NeptuneObjectCallback, optional None Custom callback function which is called if the lag between a queued operation and its synchronization with the server exceeds the duration defined by async_lag_threshold. The callback should take a Project object as the argument and can contain any custom code, such as calling stop() on the object.

Note: Instead of using this argument, you can use Neptune's default callback by setting the NEPTUNE_ENABLE_DEFAULT_ASYNC_LAG_CALLBACK environment variable to TRUE.

async_lag_threshold float, optional 1800.0 (seconds) Duration between the queueing and synchronization of an operation. If a lag callback (default callback enabled via environment variable or custom callback passed to the async_lag_callback argument) is enabled, the callback is called when this duration is exceeded.
async_no_progress_callback NeptuneObjectCallback, optional None Custom callback function which is called if there has been no synchronization progress whatsoever for the duration defined by async_no_progress_threshold. The callback should take a Project object as the argument and can contain any custom code, such as calling stop() on the object.

Note: Instead of using this argument, you can use Neptune's default callback by setting the NEPTUNE_ENABLE_DEFAULT_ASYNC_NO_PROGRESS_CALLBACK environment variable to TRUE.

async_no_progress_threshold float, optional 300.0 (seconds) Maximum allowed time with no synchronization progress. If a no-progress callback (default callback enabled via environment variable or custom callback passed to the async_no_progress_callback argument) is enabled, the callback is called when this duration is exceeded.

Returns

Project object that can be used to interact with the project as a whole, like logging or fetching project-level metadata.

Example

Connect to a project in read-only mode and specify the project name
from neptune import Project

project = Project(
    project="ml-team/classification",
    mode="read-only",
)
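
The lag and no-progress callbacks receive the initialized Neptune object as their only argument, so they can react to it directly, for example by stopping it. Below is a minimal sketch of a custom lag callback; the threshold value and the printed message are illustrative, not defaults:

import neptune

def on_sync_lag(neptune_object):
    # Called when a queued operation lags behind its synchronization
    # with the server for longer than async_lag_threshold
    print("Neptune synchronization is lagging; stopping the object.")
    neptune_object.stop()

project = neptune.init_project(
    project="ml-team/classification",
    async_lag_callback=on_sync_lag,
    async_lag_threshold=600.0,  # seconds (illustrative value)
)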

Field lookup: []#

You can access the fields of a project object through a dict-like field lookup: project[field_path].

This way, you can:

  • store metadata (displayed in the Project metadata section):

    project["general/brief"] = URL_TO_PROJECT_BRIEF
    project["general/data_analysis"].upload("data_analysis.ipynb")
    
    project["dataset/v0.1"].track_files("s3://datasets/images")
    project.wait()
    project["dataset/latest"] = project["dataset/v0.1"].fetch()
    
  • fetch already logged metadata – for example, to have a single source of truth when starting a new run:

    run = neptune.init_run()
    run["dataset"] = project["dataset/latest"].fetch()
    project["dataset/latest"].fetch()
    

Returns

The returned type depends on the field type and whether a field is stored under the given path.

  • Field exists: the returned type matches the type of the field.
  • Field does not exist: a Handler object is returned.
  • Path is a namespace that contains a field (for example, the path "train" when the field "train/acc" exists): a namespace handler object is returned.

Example

Note on collaboration

The project object follows the same logic as other Neptune objects: If you assign a new value to an existing field, the new value overwrites the previous one.

In a given project, you always initialize and work with the same project object, so take care not to accidentally overwrite each other's entries if your team is collaborating on project metadata.

Tip: Recall that the append() method appends the logged value to a series. It works for text strings as well as numerical values.

import neptune

project = neptune.init_project(project="ml-team/classification")

# Create a new string field
project["general"] = "Project deadline: 1849-06-30"

# Overwrite the value of the existing string field
project["general"] = "Project deadline: 2049-06-30"

# Create a new StringSeries field
project["train/logs"].append("ML model building, day 1:")

# Continue logging to the existing series field
project["train/logs"].append("A model-building project is born")

If you access a namespace handler, you can interact with it like any other Neptune object.

info_namespace = project["info"]
info_namespace["deadline"] = "2049-06-30"  # Stores "2049-06-30" under the field info/deadline

Assignment: =#

Convenience alias for assign().


assign()#

Assign values to multiple fields from a dictionary. You can use this method to store multiple pieces of metadata with a single command.

Parameters

Name Type Default Description
value dict None A dictionary with values to assign, where keys (str) become the paths of the fields. The dictionary can be nested, in which case the path will be a combination of all keys.
wait Boolean, optional False By default, logging calls and other Neptune operations are periodically synchronized with the server in the background. If True, Neptune first waits to complete any queued operations, then executes the call and continues script execution. See Connection modes.

Example

import neptune

project = neptune.init_project(project="ml-team/classification")

# Assign multiple fields from a dictionary
general_info = {"brief": URL_TO_PROJECT_BRIEF, "deadline": "2049-06-30"}
project["general"] = general_info

You can also explicitly log parameters one by one:

project["general/brief"] = URL_TO_PROJECT_BRIEF
project["general/deadline"] = "2049-06-30"

Dictionaries can be nested:

general_info = {"brief": {"url": URL_TO_PROJECT_BRIEF}}
project["general"] = general_info

The above logs the URL under the path general/brief/url.
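
If the values need to be on the Neptune servers before your script continues, you can pass wait=True to assign() (a short sketch):

project.assign(
    {"general": {"brief": URL_TO_PROJECT_BRIEF, "deadline": "2049-06-30"}},
    wait=True,  # Block until the queued operations reach the Neptune servers
)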


del#

Completely removes the field or namespace and all associated metadata stored under the path.

See also: pop().

Examples

Assuming there is a field datasets/v0.4, the following will delete it:

import neptune

project = neptune.init_project(project="ml-team/classification")
del project["datasets/v0.4"]

You can also delete the whole namespace:

del project["datasets"]

exists()#

Checks if there is a field or namespace under the specified path.

Info

This method checks the local representation of the project. If a field was created by another process, it may not appear in the local representation; if your own metadata hasn't reached the Neptune servers yet, it may appear locally but not be possible to fetch. In either case, you can:

  • Call sync() on the project object to synchronize the local representation with the server.
  • Call wait() on the project object to wait for all logging calls to finish.

Parameters

Name Type Default Description
path str - Path to check for the existence of a field or namespace

Examples

import neptune

project = neptune.init_project(project="ml-team/classification")

# If an old dataset exists, remove it
if project.exists("dataset/v0.4"):
    del project["dataset/v0.4"]

Info

When working in asynchronous (default) mode, the metadata you track may not be immediately available to fetch from the server, even if it appears in the local representation.

To work around this, you can call wait() on the project object.

import neptune

project = neptune.init_project(project="ml-team/classification")
project["general/brief"] = URL_TO_PROJECT_BRIEF

# The path exists in the local representation
if project.exists("general/brief"):
    # However, the tracking call may have not reached Neptune servers yet
    project["general/brief"].fetch()  # Error: the field does not exist

project.wait()
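
To make sure the value is available on the server before fetching, call wait() first (a minimal sketch):

project["general/brief"] = URL_TO_PROJECT_BRIEF
project.wait()  # Wait for the queued operations to reach the Neptune servers

if project.exists("general/brief"):
    brief = project["general/brief"].fetch()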

fetch()#

Fetches the values of all single-value fields (that are not of type File) as a dictionary.

The result preserves the hierarchical structure of the project metadata.

Returns

dict containing the values of all non-File single-value fields.

Example

Fetch all the project metrics (assuming they have been logged to a field metrics):

import neptune

project = neptune.init_project(project="ml-team/classification")
project_metrics = project["metrics"].fetch()
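
You can also call fetch() on the project object itself to get all non-File single-value fields at once. A sketch, assuming the general/brief field from the earlier examples exists:

all_metadata = project.fetch()
print(all_metadata["general"]["brief"])  # Nested namespaces become nested dictionaries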

fetch_models_table()#

Deprecated

This function is deprecated.

For model management using runs, see Log model metadata.

Retrieve the models of the project.

Parameters

Name Type Default Description
query str, optional None NQL query string. Example: "model_size:float >= 100MB".
trashed bool, optional False Whether to retrieve trashed models. If True, only trashed models are retrieved. If False, only non-trashed models are retrieved. If None or left empty, all model objects are retrieved, including trashed ones.
columns list[str], optional None

Names of columns to include in the table, as a list of field names. The Neptune ID ("sys/id") is included automatically.

If None, all the columns of the models table are included (up to a maximum of 10 000).

For series fields, only the last appended value is returned.

limit int, optional None How many entries to return at most. If None, all entries are returned.
sort_by str, optional "sys/creation_time" Name of the field to sort the results by. The field must represent a simple type (string, float, datetime, integer, or Boolean).
ascending bool, optional False Whether to sort the entries in ascending order of the sorting column values.
progress_bar bool or Type[ProgressBarCallback], optional None Set to False to disable the download progress bar, or pass a type of ProgressBarCallback to use your own progress bar. If set to None or True, the default tqdm-based progress bar will be used.

Returns

An interim Table object containing Model objects.

Use to_pandas() to convert it to a pandas DataFrame.

Examples

Initialize project named ml-team/nlu
>>> import neptune
>>> project = neptune.init_project(project="ml-team/nlu", mode="read-only")
[neptune] [info   ] Neptune initialized...
Fetch metadata of all stored models as pandas DataFrame
>>> models_table_df = project.fetch_models_table().to_pandas()
Fetching table...
                 sys/creation_time      sys/id            sys/modification_time ...
0 2022-08-26 05:06:19.693000+00:00  NLU-FOREST 2022-08-26 05:06:20.944000+00:00 ...
1 2022-08-25 08:15:13.678000+00:00    NLU-TREE 2022-08-25 08:16:23.179000+00:00 ...
...
Include only "info/size_units" and "info/size_limit" fields as columns
>>> filtered_models_df = project.fetch_models_table(
...     columns=["info/size_units", "info/size_limit"]
... ).to_pandas()
Fetching table...
       sys/id   info/size_units   info/size_limit
0  NLU-FOREST                MB              50.0
1  NLU-TREE                  MB             100.0
...

Info

In the above example, info/size_units and info/size_limit are custom metadata fields. They would have been logged to the model manually, while the model object was active.

This would be one way to do it:

model = neptune.init_model(key="FOREST")
model["info"] = {"size_units": "MB", "size_limit": 50.0}
Sort model objects by size (space taken up in Neptune)
>>> models_table_df = project.fetch_models_table(sort_by="sys/size").to_pandas()
Fetching table...
          sys/creation_time       sys/id  ... sys/trashed  info/size_limit
0  2023-10-19T10:51:20.828Z     NLU-TREE  ...       False            100.0
1  2023-08-24T13:39:06.945Z   NLU-FOREST  ...       False             50.0
...
>>> # Extract the ID and size of the largest model object
... largest_model_id = models_table_df["sys/id"].values[0]
... largest_model_size = models_table_df["sys/size"].values[0]
... print(f"Model taking up the most space: {largest_model_id} ({largest_model_size} kb)")
Model taking up the most space: NLU-TREE (1378.0 kb)
Fetch 10 oldest model objects
>>> models_table_df = project.fetch_models_table(
...     sort_by="sys/creation_time",
...     ascending=True,
...     limit=10,
... ).to_pandas()
Fetching table...
          sys/creation_time       sys/id  ... sys/trashed  info/size_limit
0  2023-08-24T13:39:06.945Z   NLU-FOREST  ...       False             50.0
1  2023-10-19T10:51:20.828Z     NLU-TREE  ...       False            100.0
...
9  2023-11-03T18:21:07.455Z     NLU-PRED  ...       False             50.0
>>> # Extract the ID of the oldest model object
... oldest_model_id = models_table_df["sys/id"].values[0]
>>> print("Oldest model object:", oldest_model_id)
Oldest model object: NLU-FOREST

To filter the models by a custom field and condition, pass an NQL string to the query argument:

Fetch large models with VGG backbone
models_df = project.fetch_models_table(
    query="(`model_size`:float > 100MB) AND (`backbone`:string = VGG)",
    columns=["sys/modification_time", "model_size"],
    sort_by="model_size",
).to_pandas()

For the syntax and examples, see the Neptune Query Language (NQL) reference.


fetch_runs_table()#

Retrieve runs matching the specified criteria.

All parameters are optional. Each of them specifies a single criterion. Only runs matching all of the criteria will be returned.

Parameters

Name Type Default Description
query str, optional None NQL query string. Example: "f1:float > 0.70".

Exclusive with the id, state, owner, and tag parameters.

id str or list[str], optional None Neptune ID of a run, or a list of multiple IDs. Example: "NLU-1" or ["NLU-1", "NLU-2"].

Matching any element of the list is sufficient to pass the criterion.

state str or list[str], optional None State or list of states. Possible values: "inactive", "active". "Active" means that at least one process is connected to the run.

Matching any element of the list is sufficient to pass the criterion.

For details, see sys ≫ State

owner str or list[str], optional None Username of the run owner, or list of multiple owners. Example: "josh" or ["frederic", "josh"]. The owner is the user who created the run.

Matching any element of the list is sufficient to pass the criterion.

tag str or list[str], optional None A tag or list of tags. Example: "lightGBM" or ["pytorch", "cycleLR"].

Note: Only runs that have all specified tags will pass this criterion.

columns list[str], optional None

Names of columns to include in the table, as a list of field names. The Neptune ID ("sys/id") is included automatically.

If None, all the columns of the experiments table are included (up to a maximum of 10 000).

For series fields, only the last appended value is returned.

trashed bool, optional False Whether to retrieve trashed runs. If True, only trashed runs are retrieved. If False, only non-trashed runs are retrieved. If None or left empty, all run objects are retrieved, including trashed ones.
limit int, optional None How many entries to return at most. If None, all entries are returned.
sort_by str, optional "sys/creation_time" Name of the field to sort the results by. The field must represent a simple type (string, float, datetime, integer, or Boolean).
ascending bool, optional False Whether to sort the entries in ascending order of the sorting column values.
progress_bar bool or Type[ProgressBarCallback], optional None Set to False to disable the download progress bar, or pass a type of ProgressBarCallback to use your own progress bar. If set to None or True, the default tqdm-based progress bar will be used.

Returns

An interim Table object containing run objects. Use to_pandas() to convert it to a pandas DataFrame.

Examples

Initialize project named ml-team/nlu
>>> import neptune
>>> project = neptune.init_project(project="ml-team/nlu", mode="read-only")
[neptune] [info   ] Neptune initialized. Open in the app: https://app.neptune.ai/ml-team/nlu
Fetch metadata of all runs as pandas DataFrame
>>> runs_table_df = project.fetch_runs_table().to_pandas()
Fetching table...: 100 [00:00, 222.56/s]
>>> print(runs_table_df)
                 sys/creation_time sys/description  sys/failed  ...
0 2022-08-26 07:28:42.673000+00:00                       False  ...
1 2022-08-26 07:18:41.321000+00:00                       False  ...
2 2022-08-26 07:07:20.338000+00:00                       False  ...
3 2022-08-26 05:36:39.615000+00:00                       False  ...
...
[22 rows x 262 columns]
Fetch 5 most recently updated runs and sort by ping time
>>> runs_table_df = project.fetch_runs_table(
...     columns=["sys/modification_time", "sys/ping_time"],
...     sort_by="sys/ping_time",
...     limit=5,
... ).to_pandas()
Fetching table...:  23%|██████████████                 | 5/22 [00:00<00:01, 14.66/s]
>>> print(runs_table_df)
   sys/id     sys/modification_time             sys/ping_time
0  NLU-22  2024-01-12T16:52:14.083Z  2024-01-12T16:52:14.083Z
1  NLU-23  2024-01-03T08:49:06.008Z  2024-01-03T08:49:06.008Z
2  NLU-21  2023-12-22T10:34:09.065Z  2023-12-22T10:34:09.065Z
3  NLU-20  2023-12-22T10:28:56.065Z  2023-12-20T14:59:00.122Z
4  NLU-19   2023-12-20T14:56:54.21Z   2023-12-20T14:56:54.21Z
Extract most recently updated run and when the logging occurred
>>> last_updated_run_id = runs_table_df["sys/id"].values[0]
>>> last_updated_run_time = runs_table_df["sys/ping_time"].values[0]
>>> print(f"Most recently updated: {last_updated_run_id} (last entry logged at {last_updated_run_time})")
Most recently updated: NLU-22 (last entry logged at 2024-01-12T16:52:14.083Z)

To filter the runs by a custom field and condition, pass an NQL string to the query argument:

Fetch runs with names containing "blobfish" and f1 score of at least 0.70
runs_table_df = project.fetch_runs_table(
    query='(`scores/f1`:float >= 0.7) AND (`sys/name`:string CONTAINS "blobfish")',
    columns=["sys/modification_time", "sys/name", "scores/f1"],
    sort_by="scores/f1",
).to_pandas()

For the syntax and examples, see the Neptune Query Language (NQL) reference.

You can also filter the runs by ID, state, owner, or tag using the dedicated parameters. Note that these can't be combined with the query parameter.

Fetch specific runs by ID
runs_table_df = project.fetch_runs_table(id=["NLU-1", "NLU-2"]).to_pandas()
Fetch only inactive runs
runs_table_df = project.fetch_runs_table(state="inactive").to_pandas()
Fetch only runs created by CI service account
runs_table_df = project.fetch_runs_table(owner="my_ci_service@ml-team").to_pandas()
Fetch only runs that have both the tags Exploration and Optuna
runs_table_df = project.fetch_runs_table(tag=["Exploration", "Optuna"]).to_pandas()
Combine conditions (runs satisfying all criteria are fetched)
runs_table_df = project.fetch_runs_table(
    state="inactive", tag="Exploration"
).to_pandas()

To use a custom progress bar callback, define a class that inherits from ProgressBarCallback and pass it to the progress_bar argument (the type, not an instance).

Example callback definition, using click
from types import TracebackType
from typing import Any, Optional, Type

from neptune.typing import ProgressBarCallback


class ClickProgressBar(ProgressBarCallback):
    def __init__(self, *, description: Optional[str] = None, **_: Any) -> None:
        super().__init__()
        from click import progressbar

        ...
        self._progress_bar = progressbar(iterable=None, length=1, label=description)
        ...

    def update(self, *, by: int, total: Optional[int] = None) -> None:
        if total:
            self._progress_bar.length = total
        self._progress_bar.update(by)
        ...

    def __enter__(self) -> "ClickProgressBar":
        self._progress_bar.__enter__()
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> None:
        self._progress_bar.__exit__(exc_type, exc_val, exc_tb)
Using the custom callback
import neptune

project = neptune.init_project(project="ml-team/nlu", mode="read-only")
project.fetch_runs_table(progress_bar=ClickProgressBar)

get_structure()#

Returns the metadata structure of a project object in the form of a dictionary.

This method can be used to traverse the metadata structure programmatically when using Neptune in automated workflows.

See also: print_structure().

Warning

The returned object is a shallow copy of the internal structure. Any modifications to it may result in tracking malfunction.

Returns

dict with the project metadata structure.

Example

>>> import neptune
>>> project = neptune.init_project(project="ml-team/classification")
>>> project.get_structure()
{'general': {'brief': <neptune.attributes.atoms.string.String object at 0x000001C8EF7A5BD0>, 'deadline': <neptune.attributes.atoms.string.String object at 0x000001C8EF7A66B0>, ... }}
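
For programmatic traversal, you can walk the returned dictionary recursively. The sketch below collects the paths of all leaf fields; the helper function is illustrative, not part of the Neptune API:

def list_field_paths(structure, prefix=""):
    # Recursively collect the paths of all leaf fields in the structure
    paths = []
    for key, value in structure.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            paths.extend(list_field_paths(value, path))
        else:
            paths.append(path)
    return paths

print(list_field_paths(project.get_structure()))
# e.g. ['general/brief', 'general/deadline', 'sys/creation_time', ...]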

get_url()#

Returns a direct link to the project in Neptune. The same link is printed in the console once the project object has been initialized.

Returns

str with the URL of the project in Neptune.

Example

>>> import neptune
>>> project = neptune.init_project(project="ml-team/classification")
>>> project.get_url()
https://app.neptune.ai/ml-team/classification/

pop()#

Completely removes the field or namespace and all associated metadata stored under the path.

See also: del.

Parameters

Name Type Default Description
path str - Path of the field or namespace to be removed.
wait Boolean, optional False By default, logging calls and other Neptune operations are periodically synchronized with the server in the background. If True, Neptune first waits to complete any queued operations, then executes the call and continues script execution. See Connection modes.

Examples

Assuming there is a field datasets/v0.4, the following will delete it:

import neptune

project = neptune.init_project(project="ml-team/classification")
project.pop("datasets/v0.4")

You can invoke pop() directly on both fields and namespaces.

# The following line
project.pop("datasets/v0.4")
# is equivalent to this line
project["datasets/v0.4"].pop()
# and this line
project["datasets"].pop("v0.4")

You can also batch-delete the whole namespace:

project["datasets"].pop()

print_structure()#

Pretty-prints the structure of the project metadata. Paths are ordered lexicographically and the structure is colored.

See also: get_structure().

Example

>>> import neptune
>>> project = neptune.init_project(project="ml-team/classification")
>>> project.print_structure()
'general':
    'brief': String
    'deadline': String
'sys':
    'creation_time': Datetime
    'id': String
    'modification_time': Datetime
    'monitoring_time': Integer
    'name': String
    'ping_time': Datetime
    'running_time': Float
    'size': Float
    'state': RunState
    'visibility': String

stop()#

Stops the connection to Neptune and synchronizes all data.

When using context managers, Neptune automatically calls stop() when exiting the project context.

Warning

Always call stop() in interactive environments, such as a Python interpreter or Jupyter notebook. The connection to Neptune is not stopped when the cell has finished executing, but rather when the entire notebook stops.

If you're running a script, the connection is stopped automatically when the script finishes executing. However, it's a best practice to call stop() when the connection is no longer needed.

Parameters

Name Type Default Description
seconds int or float, optional None Wait for the specified time for all logging calls to finish before stopping the connection. If None, wait for all logging calls to finish.

Examples

If you're initializing the connection from a Python script, Neptune stops it automatically when the script finishes execution.

import neptune

project = neptune.init_project(project="ml-team/classification")

[...] # Your code

# stop() is automatically called at the end for every Neptune object

Using with statement and context manager:

for project_identifier in projects:
    with neptune.init_project(project=project_identifier) as project:
        [...] # Your code
        # stop() is automatically called
        # when code execution exits the with statement
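
In an interactive environment, such as a notebook, call stop() explicitly once you're done logging (a minimal sketch):

import neptune

project = neptune.init_project(project="ml-team/classification")
project["general/brief"] = URL_TO_PROJECT_BRIEF

project.stop()  # Synchronizes the remaining data and closes the connection
# To cap the wait time, pass the seconds argument: project.stop(seconds=60)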

sync()#

Synchronizes the local representation of the project with Neptune servers.

Parameters

Name Type Default Description
wait Boolean, optional False By default, logging calls and other Neptune operations are periodically synchronized with the server in the background. If True, Neptune first waits to complete any queued operations, then executes the call and continues script execution. See Connection modes.
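
Example

If another process has logged project metadata, you can refresh your local representation before checking for it (a sketch; the field path is taken from the earlier examples):

import neptune

project = neptune.init_project(project="ml-team/classification")

project.sync()  # Pull the current state of the project from the Neptune servers
if project.exists("dataset/latest"):
    print(project["dataset/latest"].fetch())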

wait()#

Wait for all the logging calls to finish.

Parameters

Name Type Default Description
disk_only Boolean, optional False If True, the process will wait only for the data to be saved locally from memory, but will not wait for it to reach Neptune servers.
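
Example

To wait only until the queued operations are saved to the local disk, rather than until they reach the Neptune servers (a sketch):

project["dataset/v0.1"].track_files("s3://datasets/images")
project.wait(disk_only=True)  # Data is flushed to disk; the upload continues in the background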

Table.to_pandas()#

The Table object is an interim object containing the metadata of fetched objects. To access the data, you need to convert it to a pandas DataFrame by invoking to_pandas().

Returns

Tabular data in the pandas.DataFrame format.

Example

Fetch project "jackie/named-entity-recognition":

import neptune

project = neptune.init_project(
    project="jackie/named-entity-recognition",
    mode="read-only",
)

Fetch all runs metadata as pandas DataFrame:

runs_table_df = project.fetch_runs_table().to_pandas()

Sort runs by creation time:

runs_table_df = runs_table_df.sort_values(by="sys/creation_time", ascending=False)

Extract the ID of the last run:

last_run_id = runs_table_df["sys/id"].values[0]

You can also filter the experiments table by state, owner, or tag.

Fetch only inactive runs
runs_table_df = project.fetch_runs_table(state="inactive").to_pandas()
Fetch only runs created by CI service
runs_table_df = project.fetch_runs_table(owner="my_company_ci_service").to_pandas()
Fetch only runs that have both Exploration and Optuna tags
runs_table_df = project.fetch_runs_table(tag=["Exploration", "Optuna"]).to_pandas()
Combine conditions (runs satisfying all conditions will be fetched)
runs_table_df = project.fetch_runs_table(
    state="inactive", tag="Exploration"
).to_pandas()

Fetch model versions as table:

import neptune

model = neptune.init_model(with_id="NER-PRETRAINED")
model_versions_df = model.fetch_model_versions_table().to_pandas()