Project#
Representation of a Neptune project.
If you want to create a new Neptune project, see management.create_project().
Initialization#
Initialize with the init_project() function or the class constructor.
If Neptune can't find your project name or API token
As a best practice, you should save your Neptune API token and project name as environment variables:
export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"
You can, however, also pass them as arguments when initializing Neptune:
run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8",  # your token here
    project="ml-team/classification",  # your full project name here
)
This also works for init_model(), init_model_version(), and init_project().
Where to find these values in the Neptune app:
- API token: In the bottom-left corner, expand the user menu and select Get my API token.
- Project name: In the project's top-right menu, go to Edit project details.
If you haven't registered, you can also log anonymously to a public project (make sure not to publish sensitive data through your code!):
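A minimal sketch, assuming a public project that accepts anonymous logging (the project name below is only a placeholder):
import neptune

run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/some-public-project",  # placeholder: any public Neptune project
)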
You can use the project object to:
- Retrieve information about runs, models, and model versions within the project.
- Store and retrieve metadata on a project level, such as information about datasets, links to documentation, and key project metrics.
This metadata is displayed in the Project metadata section of the Neptune app.
Parameters
Name | Type | Default | Description |
---|---|---|---|
project | str, optional | None | Name of a project in the form workspace-name/project-name. If None, the value of the NEPTUNE_PROJECT environment variable is used. |
api_token | str, optional | None | Your Neptune API token (or a service account's API token). If None, the value of the NEPTUNE_API_TOKEN environment variable is used. |
mode | str, optional | async | Connection mode in which the logging will work. Possible values are async, sync, read-only, and debug. If you leave it out, the value of the NEPTUNE_MODE environment variable is used. |
flush_period | float, optional | 5 (seconds) | In asynchronous (default) connection mode, how often Neptune should trigger disk flushing. |
proxies | dict, optional | None | Argument passed to HTTP calls made via the Requests library. For details on proxies, see the Requests documentation. |
async_lag_callback | NeptuneObjectCallback, optional | None | Custom callback function which is called if the lag between a queued operation and its synchronization with the server exceeds the duration defined by async_lag_threshold. The callback should take a Project object as the argument and can contain any custom code, such as calling stop() on the object. Note: Instead of using this argument, you can use Neptune's default callback by setting the NEPTUNE_ENABLE_DEFAULT_ASYNC_LAG_CALLBACK environment variable to TRUE. |
async_lag_threshold | float, optional | 1800.0 (seconds) | Duration between the queueing and synchronization of an operation. If a lag callback (default callback enabled via environment variable or custom callback passed to the async_lag_callback argument) is enabled, the callback is called when this duration is exceeded. |
async_no_progress_callback | NeptuneObjectCallback, optional | None | Custom callback function which is called if there has been no synchronization progress whatsoever for the duration defined by async_no_progress_threshold. The callback should take a Project object as the argument and can contain any custom code, such as calling stop() on the object. Note: Instead of using this argument, you can use Neptune's default callback by setting the NEPTUNE_ENABLE_DEFAULT_ASYNC_NO_PROGRESS_CALLBACK environment variable to TRUE. |
async_no_progress_threshold | float, optional | 300.0 (seconds) | For how long there has been no synchronization progress. If a no-progress callback (default callback enabled via environment variable or custom callback passed to the async_no_progress_callback argument) is enabled, the callback is called when this duration is exceeded. |
Returns
Project object that can be used to interact with the project as a whole, like logging or fetching project-level metadata.
Example
from neptune import Project

project = Project(
    project="ml-team/classification",
    mode="read-only",
)
Field lookup: []#
You can access a field of a project object through a dict-like field lookup: project[field_path].
This way, you can:
- store metadata (displayed in the Project metadata section)
- fetch already logged metadata – for example, to have the single source of truth when starting a new run

Both operations are shown in the Example below.
Returns
The returned type depends on the field type and whether a field is stored under the given path.
Path | Example | Returns |
---|---|---|
Field exists | - | The returned type matches the type of the field |
Field does not exist | - | Handler object |
Path is namespace and has field | - | Namespace handler object |
Example
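A minimal sketch of storing and then fetching a project-level field (the info/deadline path is only an illustration):
# Store metadata under a field path
project["info/deadline"] = "2049-06-30"

# Fetch the stored value later, for example when starting a new run
deadline = project["info/deadline"].fetch()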
Note on collaboration
The project object follows the same logic as other Neptune objects: If you assign a new value to an existing field, the new value overwrites the previous one.
In a given project, you always initialize and work with the same project object, so take care not to accidentally overwrite each other's entries if your team is collaborating on project metadata.
Tip: Recall that the append() method appends the logged value to a series. It works for text strings as well as numerical values.
project["general"] = "Project deadline: 1849-06-30"
# Overwrite the value of the existing string field
project["general"] = "Project deadline: 2049-06-30"
project["train/logs"].append("ML model building, day 1:")
# Continue logging to existing series field
project["train/logs"].append("A model-building project is born")
If you access a namespace handler, you can interact with it like any other Neptune object.
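For instance, a short sketch (the info namespace is only an illustration):
info_ns = project["info"]  # namespace handler
info_ns["deadline"] = "2049-06-30"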
- Stores "2049-06-30" under the field info/deadline.
Assignment: =#
Convenience alias for assign().
assign()#
Assign values to multiple fields from a dictionary. You can use this method to store multiple pieces of metadata with a single command.
Parameters
Name | Type | Default | Description |
---|---|---|---|
value | dict | None | A dictionary with values to assign, where keys (str) become the paths of the fields. The dictionary can be nested, in which case the path will be a combination of all keys. |
wait | Boolean, optional | False | By default, logged metadata is sent to the server in the background. With this option set to True, Neptune first sends all data before executing the call. See Connection modes. |
Example
general_info = {"brief": URL_TO_PROJECT_BRIEF, "deadline": "2049-06-30"}
project["general"] = general_info
You can also explicitly log parameters one by one:
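For example, a sketch equivalent to the dictionary assignment above:
project["general/brief"] = URL_TO_PROJECT_BRIEF
project["general/deadline"] = "2049-06-30"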
Dictionaries can be nested:
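For instance, a sketch reusing the URL_TO_PROJECT_BRIEF placeholder:
project["general"] = {"brief": {"url": URL_TO_PROJECT_BRIEF}}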
The above logs the URL under the path general/brief/url.
del#
Completely removes the field or namespace and all associated metadata stored under the path.
See also: pop().
Examples
Assuming there is a field datasets/v0.4, the following will delete it:
import neptune
project = neptune.init_project(project="ml-team/classification")
del project["datasets/v0.4"]
You can also delete the whole namespace:
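For example, a sketch that removes the datasets namespace used in the snippet above:
del project["datasets"]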
exists()#
Checks if there is a field or namespace under the specified path.
Info
This method checks the local representation of the project. If the field was created by another process or the metadata has not reached the Neptune servers, it may not be possible to fetch. In this case you can:
- Call sync() on the project object to synchronize the local representation with the server.
- Call wait() on the project object to wait for all logging calls to finish.
Parameters
Name | Type | Default | Description |
---|---|---|---|
path | str | - | Path to check for the existence of a field or namespace |
Examples
import neptune

project = neptune.init_project(project="ml-team/classification")

# If an old dataset exists, remove it
if project.exists("dataset/v0.4"):
    del project["dataset/v0.4"]
Info
When working in asynchronous (default) mode, the metadata you track may not be immediately available to fetch from the server, even if it appears in the local representation.
To work around this, you can call wait() on the project object.
import neptune

project = neptune.init_project(project="ml-team/classification")
project["general/brief"] = URL_TO_PROJECT_BRIEF

# The path exists in the local representation
if project.exists("general/brief"):
    # However, the tracking call may have not reached Neptune servers yet
    project["general/brief"].fetch()  # Error: the field does not exist

project.wait()
fetch()#
Fetches the values of all single-value fields (that are not of type File) as a dictionary.
The result preserves the hierarchical structure of the project metadata.
Returns
dict containing the values of all non-File single-value fields.
Example
Fetch all the project metrics (assuming they have been logged to a field metrics):
import neptune
project = neptune.init_project(project="ml-team/classification")
project_metrics = project["metrics"].fetch()
fetch_models_table()#
Retrieve the models of the project, up to a maximum of 10 000.
Parameters
Name | Type | Default | Description |
---|---|---|---|
columns | list[str], optional | None | Names of columns to include in the table, as a list of namespace or field names. The Neptune ID (sys/id) is included automatically. If None, all the columns of the models table are included. |
Returns
An interim Table object containing Model objects. Use to_pandas() to convert it to a pandas DataFrame.
Example
>>> import neptune
>>> project = neptune.init_project(project="ml-team/nlu", mode="read-only")
https://app.neptune.ai/ml-team/nlu/
Remember to stop your project...
>>> models_table_df = project.fetch_models_table().to_pandas()
>>> print(models_table_df)
sys/creation_time sys/id sys/modification_time ...
0 2022-08-26 05:06:19.693000+00:00 NLU-FOREST 2022-08-26 05:06:20.944000+00:00 ...
1 2022-08-25 08:15:13.678000+00:00 NLU-TREE 2022-08-25 08:16:23.179000+00:00 ...
>>> filtered_models_table = project.fetch_models_table(
... columns=["sys/size", "model/size_limit"]
... )
>>> filtered_models_df = filtered_models_table.to_pandas()
>>> print(filtered_models_df)
sys/id sys/size model/size_limit
0 NLU-FOREST 415.0 50.0
1 NLU-TREE 387.0 50.0
...
To sort the models by creation time, newest first:
models_table_df = models_table_df.sort_values(
    by="sys/creation_time",
    ascending=False,
)
fetch_runs_table()#
Retrieve runs matching the specified criteria, up to a maximum of 10 000.
All parameters are optional. Each of them specifies a single criterion. Only runs matching all of the criteria will be returned.
Parameters
Name | Type | Default | Description |
---|---|---|---|
id | str or list[str], optional | None | Neptune ID of a run, or a list of multiple IDs. Example: "NLU-1" or ["NLU-1", "NLU-2"]. Matching any element of the list is sufficient to pass the criterion. |
state | str or list[str], optional | None | State or list of states. Possible values: "inactive", "active". "Active" means that at least one process is connected to the run. Matching any element of the list is sufficient to pass the criterion. |
owner | str or list[str], optional | None | Username of the run owner, or a list of multiple owners. Example: "josh" or ["frederic", "josh"]. The owner is the user who created the run. Matching any element of the list is sufficient to pass the criterion. |
tag | str or list[str], optional | None | A tag or list of tags. Example: "lightGBM" or ["pytorch", "cycleLR"]. Note: Only runs that have all specified tags will pass this criterion. |
columns | list[str], optional | None | Names of columns to include in the table, as a list of namespace or field names. The Neptune ID (sys/id) is included automatically. If None, all the columns of the runs table are included. |
Returns
An interim Table object containing run objects. Use to_pandas() to convert it to a pandas DataFrame.
Examples
>>> import neptune
>>> project = neptune.init_project(project="ml-team/nlu", mode="read-only")
https://app.neptune.ai/ml-team/nlu/
Remember to stop your project...
>>> runs_table_df = project.fetch_runs_table().to_pandas()
>>> print(runs_table_df)
sys/creation_time sys/description sys/failed ...
0 2022-08-26 07:28:42.673000+00:00 False ...
1 2022-08-26 07:18:41.321000+00:00 False ...
2 2022-08-26 07:07:20.338000+00:00 False ...
3 2022-08-26 05:36:39.615000+00:00 False ...
>>> filtered_runs_table = project.fetch_runs_table(
... columns=["f1", "sys/running_time"]
... )
>>> filtered_runs_df = filtered_runs_table.to_pandas()
>>> print(filtered_runs_df)
sys/id sys/running_time f1
0 NLU-8 5.436 0.95
1 NLU-7 12.342 0.92
2 NLU-6 318.538 0.80
3 NLU-5 9.560 0.80
...
To sort the runs by creation time, newest first:
runs_table_df = runs_table_df.sort_values(
    by="sys/creation_time",
    ascending=False,
)
You can also filter the runs by state, owner, or tag.
runs_table_df = project.fetch_runs_table(
    owner="my_company_ci_service@ml-team"
).to_pandas()

runs_table_df = project.fetch_runs_table(
    tag=["Exploration", "Optuna"]
).to_pandas()

runs_table_df = project.fetch_runs_table(
    state="inactive", tag="Exploration"
).to_pandas()
get_structure()#
Returns the metadata structure of a project object in the form of a dictionary.
This method can be used to traverse the metadata structure programmatically when using Neptune in automated workflows.
See also: print_structure().
Warning
The returned object is a shallow copy of the internal structure. Any modifications to it may result in tracking malfunction.
Returns
dict with the project metadata structure.
Example
>>> import neptune
>>> project = neptune.init_project(project="ml-team/classification")
>>> project.get_structure()
{'general': {'brief': <neptune.attributes.atoms.string.String object at 0x000001C8EF7A5BD0>, 'deadline': <neptune.attributes.atoms.string.String object at 0x000001C8EF7A66B0>, ... }}
get_url()#
Returns a direct link to the project in Neptune. The same link is printed in the console once the project object has been initialized.
Returns
str with the URL of the project in Neptune.
Example
>>> import neptune
>>> project = neptune.init_project(project="ml-team/classification")
>>> project.get_url()
https://app.neptune.ai/ml-team/classification/
pop()#
Completely removes the field or namespace and all associated metadata stored under the path.
See also: del.
Parameters
Name | Type | Default | Description |
---|---|---|---|
path | str | - | Path of the field or namespace to be removed. |
wait | Boolean, optional | False | By default, logged metadata is sent to the server in the background. With this option set to True, Neptune first sends all data before executing the call. See Connection modes. |
Examples
Assuming there is a field datasets/v0.4, the following will delete it:
import neptune
project = neptune.init_project(project="ml-team/classification")
project.pop("datasets/v0.4")
You can invoke pop() directly on both fields and namespaces.
# The following line
project.pop("datasets/v0.4")
# is equivalent to this line
project["datasets/v0.4"].pop()
# and this line
project["datasets"].pop("v0.4")
You can also batch-delete the whole namespace:
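For example, a sketch that removes the whole datasets namespace from the snippets above:
project.pop("datasets")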
print_structure()#
Pretty-prints the structure of the project metadata. Paths are ordered lexicographically and the structure is colored.
See also: get_structure().
Example
>>> import neptune
>>> project = neptune.init_project(project="ml-team/classification")
>>> project.print_structure()
'general':
'brief': String
'deadline': String
'sys':
'creation_time': Datetime
'id': String
'modification_time': Datetime
'monitoring_time': Integer
'name': String
'ping_time': Datetime
'running_time': Float
'size': Float
'state': RunState
'visibility': String
stop()#
Stops the connection to Neptune and synchronizes all data.
When using context managers, Neptune automatically calls stop() when exiting the project context.
Warning
Always call stop() in interactive environments, such as a Python interpreter or Jupyter notebook. The connection to Neptune is not stopped when the cell has finished executing, but rather when the entire notebook stops.
If you're running a script, the connection is stopped automatically when the script finishes executing. However, it's a best practice to call stop() when the connection is no longer needed.
Parameters
Name | Type | Default | Description |
---|---|---|---|
seconds | int or float, optional | None | Wait for the specified time for all logging calls to finish before stopping the connection. If None, wait for all logging calls to finish. |
Examples
If you're initializing the connection from a Python script, Neptune stops it automatically when the script finishes execution.
import neptune
project = neptune.init_project(project="ml-team/classification")
[...] # Your code
# stop() is automatically called at the end for every Neptune object
Using with statement and context manager:
for project_identifier in projects:
    with neptune.init_project(project=project_identifier) as project:
        [...] # Your code
        # stop() is automatically called
        # when code execution exits the with statement
sync()#
Synchronizes the local representation of the project with Neptune servers.
Parameters
Name | Type | Default | Description |
---|---|---|---|
wait | Boolean, optional | False | By default, logged metadata is sent to the server in the background. With this option set to True, Neptune first sends all data before executing the call. See Connection modes. |
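Example
A minimal usage sketch, reusing the ml-team/classification project from the examples above:
import neptune

project = neptune.init_project(project="ml-team/classification")

# Refresh the local representation with the state stored on the Neptune servers
project.sync()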
wait()#
Waits for all the logging calls to finish.
Parameters
Name | Type | Default | Description |
---|---|---|---|
disk_only | Boolean, optional | False | If True, the process will wait only for the data to be saved locally from memory, but will not wait for it to reach Neptune servers. |
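Example
A minimal usage sketch (the field path is only an illustration):
import neptune

project = neptune.init_project(project="ml-team/classification")
project["general/brief"] = URL_TO_PROJECT_BRIEF

# Block until the logging call above has been sent to the server
project.wait()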
Table.to_pandas()#
The Table object is an interim object containing the metadata of fetched objects. To access the data, you need to convert it to a pandas DataFrame by invoking to_pandas().
Returns
Tabular data in the pandas.DataFrame format.
Example
Fetch project "jackie/named-entity-recognition":
import neptune

project = neptune.init_project(
    project="jackie/named-entity-recognition",
    mode="read-only",
)
Fetch all runs metadata as pandas DataFrame:
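For example (a sketch; the DataFrame variable name is illustrative):
runs_table_df = project.fetch_runs_table().to_pandas()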
Sort runs by creation time:
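For example, newest first (a sketch continuing from the DataFrame above):
runs_table_df = runs_table_df.sort_values(by="sys/creation_time", ascending=False)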
Extract the ID of the last run:
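A sketch, assuming the descending sort above (last_run_id is an illustrative name):
last_run_id = runs_table_df["sys/id"].values[0]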
You can also filter the runs table by state, owner, or tag.
runs_table_df = project.fetch_runs_table(owner="my_company_ci_service").to_pandas()
runs_table_df = project.fetch_runs_table(tag=["Exploration", "Optuna"]).to_pandas()
runs_table_df = project.fetch_runs_table(
    state="inactive", tag="Exploration"
).to_pandas()
Fetch model versions as table:
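A sketch, assuming a model with the hypothetical ID NER-MOD exists in this project:
model = neptune.init_model(
    with_id="NER-MOD",  # hypothetical model ID
    project="jackie/named-entity-recognition",
    mode="read-only",
)
model_versions_df = model.fetch_model_versions_table().to_pandas()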