Project#

Representation of a Neptune project.

Initialize with the init_project() function:

Initialization
import neptune

# Connect to the project "classification" in the workspace "ml-team"
project = neptune.init_project(project="ml-team/classification")

You can use the project object to:

  • Retrieve information about runs, models, and model versions within the project.
  • Store and retrieve metadata on a project level, such as information about datasets, links to documentation, and key project metrics.
Note on collaboration

The project object follows the same logic as other Neptune objects: If you assign a new value to an existing field, the new value overwrites the previous one.

In a given project, you always initialize and work with the same project object, so take care not to accidentally override each other's entries if your team is collaborating on project metadata.

Tip: Recall that the append() method appends the logged value to a series. It works for text strings as well as numerical values.
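As a hedged illustration of the note above, the following sketch appends team notes to a shared series field instead of assigning to a single-value field, so entries from different collaborators accumulate rather than overwrite each other. The helper name `log_team_note` and the path `"general/notes"` are assumptions made for this example; only the field lookup and `append()` come from the text above.

```python
def log_team_note(project, author, note, path="general/notes"):
    """Append a note to a shared series field and return the logged text.

    `project` is expected to be an object returned by neptune.init_project().
    The default path "general/notes" is an illustrative assumption.
    """
    entry = f"{author}: {note}"
    # append() adds a new entry to the series, so earlier entries are kept.
    # A plain assignment (project[path] = entry) would overwrite the field.
    project[path].append(entry)
    return entry
```

A call such as `log_team_note(project, "jackie", "Dataset v0.2 uploaded")` would then add one line to the series without touching what teammates logged earlier.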

Field lookup: []#

You can access the field of a project object through a dict-like field lookup: project[field_path].

This way, you can

  • store metadata:

    project["general/brief"] = URL_TO_PROJECT_BRIEF
    project["general/data_analysis"].upload("data_analysis.ipynb")
    
    project["dataset/v0.1"].track_files("s3://datasets/images")
    project.wait()
    project["dataset/latest"] = project["dataset/v0.1"].fetch()
    
  • fetch already tracked metadata – for example, to have the single source of truth when starting a new run:

    run = neptune.init_run()
    run["dataset"] = project["dataset/latest"].fetch()
    project["dataset/latest"].fetch()
    

Returns

The returned type depends on the field type and whether a field is stored under the given path.

Path Example Returns
Field exists - The returned type matches the type of the field
Field does not exist - Handler object
Path is namespace and has field Path: "train", where the field "train/acc" exists Namespace handler object

Example

import neptune

project = neptune.init_project(project="ml-team/classification")

# Create new String field
project["general"] = "Project deadline: 1849-06-30."

# Update the value of the field
project["general"] = "Project deadline: 2049-06-30."

# Error - it's no longer possible to store a File under a String field
project["general"].upload("project_info.txt")  # Error

# Create new Series fields
project["train/logs"].append("ML model building, day 1:")

# Continue logging to existing Series fields
project["train/logs"].append("A model-building project is born")

# If you access a namespace handler, you can interact with it like an object
info_ns = project["info"]
info_ns["deadline"] = "2049-06-30"  # Stores "2049-06-30" under "info/deadline"

Assignment: =#

Convenience alias for assign().


assign()#

Assign values to multiple fields from a dictionary. You can use this method to store multiple pieces of metadata with a single command.

Parameters

Name Type Default Description
value dict None A dictionary with values to assign, where keys (str) become the paths of the fields. The dictionary can be nested, in which case the path will be a combination of all keys.
wait Boolean, optional False By default, tracked metadata is sent to the server in the background. With this option set to True, Neptune first sends all data before executing the call. See Connection modes.

Example

import neptune

project = neptune.init_project(project="ml-team/classification")

# Assign multiple fields from a dictionary
general_info = {"brief": URL_TO_PROJECT_BRIEF, "deadline": "2049-06-30"}
project["general"] = general_info

# You can always explicitly log parameters one by one
project["general/brief"] = URL_TO_PROJECT_BRIEF
project["general/deadline"] = "2049-06-30"

# Dictionaries can be nested
general_info = {"brief": {"url": URL_TO_PROJECT_BRIEF}}
project["general"] = general_info
# This will log the url under path "general/brief/url"
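
To make the path-combination rule concrete, here is a minimal pure-Python sketch of how nested keys could map to field paths. It does not call Neptune; `flatten_to_paths` is a hypothetical helper written for illustration, and the URL is a placeholder.

```python
def flatten_to_paths(values, prefix=""):
    """Return a flat {path: value} dict built from a nested dictionary.

    Keys of nested dictionaries are joined with "/" to form the path.
    """
    flat = {}
    for key, value in values.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Descend into the nested dictionary, extending the path
            flat.update(flatten_to_paths(value, prefix=path))
        else:
            flat[path] = value
    return flat


general_info = {
    "brief": {"url": "https://example.com/brief"},  # placeholder URL
    "deadline": "2049-06-30",
}
print(flatten_to_paths(general_info, prefix="general"))
# {'general/brief/url': 'https://example.com/brief', 'general/deadline': '2049-06-30'}
```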

del#

Completely removes the field or namespace and all associated metadata stored under the path.

See also: pop().

Examples

import neptune

project = neptune.init_project(project="ml-team/classification")

# Delete the field with the path "datasets/v0.4"
del project["datasets/v0.4"]

# You can also delete the whole namespace
del project["datasets"]

exists()#

Checks if there is a field or namespace under the specified path.

Info

This method checks the local representation of the project. If the field was created by another process or the metadata has not reached the Neptune servers, it may not be possible to fetch. In this case you can:

  • Call sync() on the project object to synchronize the local representation with the server.
  • Call wait() on the project object to wait for all tracking calls to finish.

Parameters

Name Type Default Description
path str - Path to check for the existence of a field or namespace

Examples

import neptune

project = neptune.init_project(project="ml-team/classification")

# If an old dataset exists, remove it
if project.exists("dataset/v0.4"):
    del project["dataset/v0.4"]

Info

When working in asynchronous (default) mode, the metadata you track may not be immediately available to fetch from the server, even if it appears in the local representation.

To work around this, you can call wait() on the project object.

import neptune

project = neptune.init_project(project="ml-team/classification")

project["general/brief"] = URL_TO_PROJECT_BRIEF

# The path exists in the local representation
if project.exists("general/brief"):
    # However, the tracking call may not have reached Neptune servers yet
    project["general/brief"].fetch()  # Error: the field does not exist

project.wait()
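
One way to avoid the error shown above is to wait before fetching. The sketch below wraps that order of operations in a small helper; `fetch_after_wait` and its `default` parameter are assumptions made for this example, while `wait()`, `exists()`, and `fetch()` are the methods described on this page.

```python
def fetch_after_wait(project, path, default=None):
    """Wait for pending tracking calls, then fetch the field if it exists.

    `project` is expected to be an object returned by neptune.init_project().
    Returns `default` when nothing is stored under `path`.
    """
    # Block until all queued tracking calls have finished
    project.wait()
    if project.exists(path):
        return project[path].fetch()
    return default
```

If the field may have been created by another process, calling `sync()` on the project before the `exists()` check is likely needed as well.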

fetch()#

Fetches the values of all single-value fields (that are not of type File) as a dictionary.

The result preserves the hierarchical structure of the project metadata.

Returns

dict containing the values of all non-File single-value fields.

Example

import neptune

project = neptune.init_project(project="ml-team/classification")

# Fetch all the project metrics
project_metrics = project["metrics"].fetch()

fetch_models_table()#

Retrieve the models of the project, up to a maximum of 10 000.

Parameters

Name Type Default Description
columns list[str], optional None

Names of columns to include in the table, as a list of namespace or field names.

The Neptune ID ("sys/id") is included automatically.

If None, all the columns of the models table are included.

Returns

An interim Table object containing Model objects.

Use to_pandas() to convert it to a pandas DataFrame.

Example

>>> import neptune

# Initialize project "ml-team/classification"
>>> project = neptune.init_project(project="ml-team/classification", mode="read-only")
https://app.neptune.ai/ml-team/classification/
Remember to stop your project...

# Fetch list of all stored models as pandas DataFrame
>>> models_table_df = project.fetch_models_table().to_pandas()
>>> print(models_table_df)
                 sys/creation_time      sys/id            sys/modification_time ...
0 2022-08-26 05:06:19.693000+00:00  CLS-FOREST 2022-08-26 05:06:20.944000+00:00 ...
1 2022-08-25 08:15:13.678000+00:00    CLS-TREE 2022-08-25 08:16:23.179000+00:00 ...

# Fetch list of all models, including only the "sys/size" and
# "model/size_limit" fields as columns
>>> filtered_models_table = project.fetch_models_table(
...     columns=["sys/size", "model/size_limit"]
... )
>>> filtered_models_df = filtered_models_table.to_pandas()
>>> print(filtered_models_df)
       sys/id  sys/size  model/size_limit
0  CLS-FOREST     415.0              50.0
1  CLS-TREE       387.0              50.0
...

# Sort model objects by size
>>> models_table_df = models_table_df.sort_values(by="sys/size")

# Sort models by creation time
>>> models_table_df = models_table_df.sort_values(
...     by="sys/creation_time",
...     ascending=False,
... )

# Extract the ID of the last model
>>> last_model_id = models_table_df["sys/id"].values[0]

fetch_runs_table()#

Retrieve runs matching the specified criteria, up to a maximum of 10 000.

All parameters are optional. Each of them specifies a single criterion. Only runs matching all of the criteria will be returned.

Parameters

Name Type Default Description
id str or list[str], optional None Identifier of run, or list* of identifiers of multiple runs. Example: "CLS-1" or ["CLS-1", "CLS-2"].
state str or list[str], optional None State of the run, or list* of states. Possible values: "inactive", "active".
owner str or list[str], optional None Username of the run owner, or list* of multiple owners. Example: "josh" or ["frederic", "josh"]. The user who created the run is an owner.
tag str or list[str], optional None An experiment tag or list of tags. Example: "lightGBM" or ["pytorch", "cycleLR"].

Note: Only runs that have all specified tags will pass this criterion.

columns list[str], optional None

Names of columns to include in the table, as a list of namespace or field names.

The Neptune ID ("sys/id") is included automatically.

If None, all the columns of the runs table are included.

* Matching any element of the list is sufficient to pass the criterion.

Returns

An interim Table object containing run objects. Use to_pandas() to convert it to a pandas DataFrame.

Example

>>> import neptune

# Initialize project "ml-team/classification"
>>> project = neptune.init_project(
...     project="ml-team/classification",
...     mode="read-only",
... )
https://app.neptune.ai/ml-team/classification/
Remember to stop your project...

# Fetch metadata of all runs as pandas DataFrame
>>> runs_table_df = project.fetch_runs_table().to_pandas()

>>> print(runs_table_df)
                 sys/creation_time sys/description  sys/failed  ...
0 2022-08-26 07:28:42.673000+00:00                       False  ...
1 2022-08-26 07:18:41.321000+00:00                       False  ...
2 2022-08-26 07:07:20.338000+00:00                       False  ...
3 2022-08-26 05:36:39.615000+00:00                       False  ...

# Fetch list of all runs, including only the "f1" and "sys/running_time"
# fields as columns
>>> filtered_runs_table = project.fetch_runs_table(
...     columns=["f1", "sys/running_time"]
... )
>>> filtered_runs_df = filtered_runs_table.to_pandas()
>>> print(filtered_runs_df)
    sys/id  sys/running_time    f1
0    CLS-8             5.436  0.95
1    CLS-7            12.342  0.92
2    CLS-6           318.538  0.80
3    CLS-5             9.560  0.80
...

# Sort runs by creation time
>>> runs_table_df = runs_table_df.sort_values(
...     by="sys/creation_time",
...     ascending=False,
... )

# Extract the ID of the last run
>>> last_run_id = runs_table_df["sys/id"].values[0]
# You can also filter the runs table by state, owner or tag or a combination

# Fetch only inactive runs
>>> runs_table_df = project.fetch_runs_table(state="inactive").to_pandas()

# Fetch only runs created by CI service account
>>> runs_table_df = project.fetch_runs_table(
...     owner="my_company_ci_service@ml-team").to_pandas()

# Fetch only runs that have both the tags "Exploration" and "Optuna"
>>> runs_table_df = project.fetch_runs_table(
...     tag=["Exploration", "Optuna"]).to_pandas()

# You can combine conditions (runs satisfying all conditions are fetched)
>>> runs_table_df = project.fetch_runs_table(
...     state="inactive", tag="Exploration").to_pandas()
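
The matching rules from the parameter descriptions (any listed `id`, `state`, or `owner` is enough, but all listed tags are required) can be modeled in plain Python. This is only an illustration of the semantics, not Neptune's implementation; `matches_criteria` and the dictionary layout of a run are assumptions made for this example.

```python
def _as_list(value):
    """Wrap a single string in a list; pass other iterables through."""
    return [value] if isinstance(value, str) else list(value)


def matches_criteria(run, id=None, state=None, owner=None, tag=None):
    """Return True if `run` passes every criterion that is not None.

    `run` is a plain dict with the keys "id", "state", "owner", and "tags".
    The parameter names mirror fetch_runs_table(), so `id` shadows the
    built-in of the same name inside this function.
    """
    for key, wanted in (("id", id), ("state", state), ("owner", owner)):
        # Matching any element of the list is sufficient
        if wanted is not None and run[key] not in _as_list(wanted):
            return False
    # Tags are stricter: the run must carry all of the specified tags
    if tag is not None and not set(_as_list(tag)) <= set(run["tags"]):
        return False
    return True


run = {
    "id": "CLS-1",
    "state": "inactive",
    "owner": "josh",
    "tags": ["Exploration", "Optuna"],
}
print(matches_criteria(run, owner=["frederic", "josh"]))   # True
print(matches_criteria(run, tag=["Exploration", "pytorch"]))  # False
```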

get_structure()#

Returns the metadata structure of a project object in the form of a dictionary.

This method can be used to traverse the metadata structure programmatically when using Neptune in automated workflows.

See also: print_structure().

Warning

The returned object is a shallow copy of the internal structure. Any modifications to it may result in tracking malfunction.

Returns

dict with the project metadata structure.

Example

>>> import neptune
>>> project = neptune.init_project(project="ml-team/classification")
>>> project.get_structure()
{'general': {'brief': <neptune.attributes.atoms.string.String object at 0x000001C8EF7A5BD0>, 'deadline': <neptune.attributes.atoms.string.String object at 0x000001C8EF7A66B0>, ... }}
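
As a sketch of the programmatic traversal mentioned above, the helper below walks a structure dictionary and collects the full path of every field. It assumes that namespaces appear as nested dictionaries and fields as leaf objects, as in the output shown; `list_field_paths` is a hypothetical name. The function only reads the dictionary, in line with the warning about modifications.

```python
def list_field_paths(structure, prefix=""):
    """Return a sorted list of field paths found in a structure dictionary.

    Nested dictionaries are treated as namespaces; anything else is a field.
    """
    paths = []
    for key in sorted(structure):
        path = f"{prefix}/{key}" if prefix else key
        node = structure[key]
        if isinstance(node, dict):
            # Namespace: recurse and collect the paths below it
            paths.extend(list_field_paths(node, prefix=path))
        else:
            # Field object: record its full path
            paths.append(path)
    return paths
```

With a project object, this could be used as `list_field_paths(project.get_structure())`.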

get_url()#

Returns a direct link to the project in Neptune. The same link is printed in the console once the project object has been initialized.

Returns

str with the URL of the project in Neptune.

Example

>>> import neptune
>>> project = neptune.init_project(project="ml-team/classification")
>>> project.get_url()
https://app.neptune.ai/ml-team/classification/

pop()#

Completely removes the field or namespace and all associated metadata stored under the path.

See also: del.

Parameters

Name Type Default Description
path str - Path of the field or namespace to be removed.
wait Boolean, optional False By default, tracked metadata is sent to the server in the background. With this option set to True, Neptune first sends all data before executing the call. See Connection modes.

Examples

import neptune

project = neptune.init_project(project="ml-team/classification")

# Delete a field along with its data
project.pop("datasets/v0.4")

# You can invoke pop() directly on fields and namespaces

# The following line
project.pop("datasets/v0.4")
# is equivalent to this line
project["datasets/v0.4"].pop()
# and this line
project["datasets"].pop("v0.4")

# You can also batch-delete the whole namespace
project["datasets"].pop()
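
When cleaning up several paths at once, it may help to combine `exists()` with `pop()` so that missing paths are skipped. The following is a minimal sketch under that assumption; `remove_fields` is a hypothetical helper, and it passes `wait` through to `pop()` as described in the parameter table.

```python
def remove_fields(project, paths, wait=False):
    """Pop every path in `paths` that exists and return the removed paths.

    `project` is expected to be an object returned by neptune.init_project().
    """
    removed = []
    for path in paths:
        # exists() checks the local representation of the project
        if project.exists(path):
            project.pop(path, wait=wait)
            removed.append(path)
    return removed
```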

print_structure()#

Pretty-prints the structure of the project metadata. Paths are ordered lexicographically and the structure is colored.

See also: get_structure()

Example

>>> import neptune
>>> project = neptune.init_project(project="ml-team/classification")
>>> project.print_structure()
'general':
    'brief': String
    'deadline': String
'sys':
    'creation_time': Datetime
    'id': String
    'modification_time': Datetime
    'monitoring_time': Integer
    'name': String
    'ping_time': Datetime
    'running_time': Float
    'size': Float
    'state': RunState
    'visibility': String

stop()#

Stops the connection to Neptune and synchronizes all data.

When using context managers, Neptune automatically calls stop() when exiting the project context.

Warning

Always call stop() in interactive environments, such as a Python interpreter or Jupyter notebook. The connection to Neptune is not stopped when the cell has finished executing, but rather when the entire notebook stops.

If you're running a script, the connection is stopped automatically when the script finishes executing. However, it's a best practice to call stop() when the connection is no longer needed.

Parameters

Name Type Default Description
seconds int or float, optional None Wait for the specified time for all tracking calls to finish before stopping the connection. If None, wait for all tracking calls to finish.

Examples

If you initialize the connection from a Python script, Neptune stops it automatically when the script finishes executing.

import neptune

project = neptune.init_project(project="ml-team/classification")

[...] # Your code

# stop() is automatically called at the end for every Neptune object

Using with statement and context manager:

for project_identifier in projects:
    with neptune.init_project(project=project_identifier) as project:
        [...] # Your code
        # stop() is automatically called
        # when code execution exits the with statement
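
Where a with statement is inconvenient, for example in a notebook or when the project object is passed between functions, a try/finally block gives a similar guarantee. The sketch below is one possible pattern; `run_then_stop` and the `work` callable are assumptions made for this example, while `stop(seconds=...)` follows the parameter table above.

```python
def run_then_stop(project, work, seconds=None):
    """Call work(project) and always stop the project afterwards.

    `project` is expected to be an object returned by neptune.init_project().
    `seconds` is forwarded to stop(); None waits for all tracking calls.
    """
    try:
        return work(project)
    finally:
        # Runs whether work() returned normally or raised an exception
        project.stop(seconds=seconds)
```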

sync()#

Synchronizes the local representation of the project with Neptune servers.

Parameters

Name Type Default Description
wait Boolean, optional False By default, tracked metadata is sent to the server in the background. With this option set to True, Neptune first sends all data before executing the call. See Connection modes.

wait()#

Waits for all the tracking calls to finish.

Parameters

Name Type Default Description
disk_only Boolean, optional False If True, the process will wait only for the data to be saved locally from memory, but will not wait for it to reach Neptune servers.

Table.to_pandas()#

The Table object is an interim object containing the metadata of fetched objects. To access the data, you need to convert it to a pandas DataFrame by invoking to_pandas().

Returns

Tabular data in the pandas.DataFrame format.

Example

Fetch project "jackie/named-entity-recognition":

import neptune

project = neptune.init_project(
    project="jackie/named-entity-recognition",
    mode="read-only",
)

Fetch all runs metadata as pandas DataFrame:

runs_table_df = project.fetch_runs_table().to_pandas()

Sort runs by creation time:

runs_table_df = runs_table_df.sort_values(by="sys/creation_time", ascending=False)

Extract the ID of the last run:

last_run_id = runs_table_df["sys/id"].values[0]

You can also filter the runs table by state, owner, tag, or a combination of these:

Fetch only inactive runs:

runs_table_df = project.fetch_runs_table(state="inactive").to_pandas()

Fetch only runs created by a CI service account:

runs_table_df = project.fetch_runs_table(owner="my_company_ci_service").to_pandas()

Fetch only runs that have both the "Exploration" and "Optuna" tags:

runs_table_df = project.fetch_runs_table(tag=["Exploration", "Optuna"]).to_pandas()

You can combine conditions. Only runs satisfying all conditions are fetched:

runs_table_df = project.fetch_runs_table(
    state="inactive", tag="Exploration"
).to_pandas()

Fetching the metadata of model versions:

import neptune
model = neptune.init_model(with_id="CLS-PRE")

# Fetch list of all versions of the model as pandas DataFrame
model_versions_df = model.fetch_model_versions_table().to_pandas()