System ('sys') namespace

This documentation page is still under development, so some information may be missing.

Run identifier ('sys/id')

Each run has a string identifier that is unique within your workspace. It is composed of the project key and an ever-increasing (high-water-mark) counter. For example, the third run in the project Sandbox, which has the project key 'SAND', has the identifier 'SAND-3'. You can access the ID of your run programmatically:

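import neptune.new as neptune

# Connect to the project and create a run
run = neptune.init(project='my_workspace/sandbox')
# Fetch the identifier assigned to this run, e.g. 'SAND-3'
run_id = run['sys/id'].fetch()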

The run identifier can later be used to resume this run or to connect to it from multiple processes.
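As a minimal sketch of working with the identifier, the helpers below split an ID such as 'SAND-3' into its project key and counter, and reconnect to an existing run. Both function names are hypothetical. The resume helper assumes that .init() accepts the identifier through a `run` argument; check the signature of the client version you use, since the argument name may differ.

```python
def split_run_id(run_id):
    """Split an identifier such as 'SAND-3' into ('SAND', 3)."""
    # Split on the last hyphen so that the counter is always the final part.
    key, _, counter = run_id.rpartition("-")
    if not key or not counter.isdigit():
        raise ValueError(f"unexpected run identifier: {run_id!r}")
    return key, int(counter)


def resume_run(project, run_id):
    """Reconnect to an existing run (hypothetical helper).

    Assumption: init() takes the identifier via a `run` keyword argument.
    """
    # Imported lazily so split_run_id() works without Neptune installed.
    import neptune.new as neptune

    return neptune.init(project=project, run=run_id)
```

For example, resume_run('my_workspace/sandbox', 'SAND-3') would be expected to reconnect to the third run of the Sandbox project, under the assumption stated above.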

Status ('sys/state')

A run can be in one of two states: Inactive or Active. Active means that at least one process is connected to the run. This may be a process logging training metrics or monitoring performance, or a process fetching metadata for further analysis. Once there has been no activity for 2 minutes (typically after the script ends or you invoke .stop()), the run automatically transitions to the Inactive state.

Because any run can be paused and resumed multiple times, Neptune cannot know whether it has finished successfully. An Inactive state can mean that the run did not start, that training/monitoring was paused, or that training did in fact finish successfully. To filter your runs more efficiently, you can set a custom status for each run that matches your workflow, e.g. run['info/state'] = 'Success' or run['info/state'] = 'Queued'.
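One way to keep custom statuses consistent across scripts is to route every assignment through a small helper. The sketch below is illustrative only: the function name and the set of allowed values are assumptions, not part of Neptune. It works with any object that supports item assignment, such as a run.

```python
# Hypothetical set of statuses for a team workflow; adjust to your own needs.
ALLOWED_STATUSES = {"Queued", "Running", "Success", "Failed"}


def set_custom_status(run, status):
    """Write a validated custom status to the 'info/state' field."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    # Same assignment as in the text: run['info/state'] = 'Success'
    run["info/state"] = status
    return status
```

Rejecting unknown values up front avoids near-duplicates such as 'success' and 'Success', which would otherwise split your runs across separate filter values.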

Failed ('sys/failed')

The Failed field indicates whether the run failed. You can set it manually to mark a run as failed (or reset it to False when resuming a failed run).
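Setting the field manually could look like the sketch below. The helper name is hypothetical. It only performs the item assignment described above, so it works with any run-like object that supports it.

```python
def mark_failed(run, failed=True):
    """Set the 'sys/failed' field on a run-like object (hypothetical helper)."""
    # Coerce to bool so the field only ever holds True or False.
    run["sys/failed"] = bool(failed)
    return run


# Usage sketch, where `run` is a run you created or resumed:
#   mark_failed(run)          # mark the run as failed
#   mark_failed(run, False)   # reset the flag when resuming a failed run
```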

In addition, Neptune monitors your run and, if it crashes, automatically sets the Failed field to True. You can disable this behavior by setting the fail_on_exception parameter of .init() to False. In both cases, the traceback is captured and appended to 'monitoring/traceback'. If you provided a custom monitoring namespace in .init(), the path will be 'monitoring_namespace/traceback'.

If you perform a Remote Abort, the Failed field will also be set to True.
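The sketch below shows where the traceback ends up for a given monitoring namespace, and how automatic failure marking might be switched off. Both function names are hypothetical. The fail_on_exception parameter comes from the text above; passing the custom namespace through a keyword named `monitoring_namespace` is an assumption to verify against your client version.

```python
def traceback_path(monitoring_namespace="monitoring"):
    """Return the field path where the captured traceback is appended."""
    # Drop a trailing slash so the result never contains '//'.
    return f"{monitoring_namespace.rstrip('/')}/traceback"


def init_without_auto_fail(project, monitoring_namespace="monitoring"):
    """Start a run that is not marked as failed automatically on a crash.

    Assumption: init() accepts `monitoring_namespace` as a keyword argument.
    """
    # Imported lazily so traceback_path() works without Neptune installed.
    import neptune.new as neptune

    return neptune.init(
        project=project,
        fail_on_exception=False,
        monitoring_namespace=monitoring_namespace,
    )
```

With the default namespace, traceback_path() gives 'monitoring/traceback'. With a custom namespace such as 'my_monitoring', it gives 'my_monitoring/traceback'.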