App version: 3.20250901

Neptune API error handling in callbacks

The Run object exposes four parameters that serve as callbacks for various error or warning scenarios:

on_network_error_callback

Handles low-level network errors that occur during HTTP requests. These errors include:

  • Read, write, and connect timeouts
  • Malformed requests
  • Connection failures

Note: This callback is called only when the retry mechanism fails.
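As an illustration, a network error callback could simply record and log each failure that survives the retries. This is a minimal sketch, not the library's own handler: the `NetworkErrorCounter` class and the logger name are hypothetical, the `(exc, ts)` signature is taken from the examples later on this page, and `ts` is assumed to be a timestamp or `None`.

```python
import logging
from typing import Optional

logger = logging.getLogger("neptune_callbacks")  # hypothetical logger name


class NetworkErrorCounter:
    """Hypothetical callable that logs network errors and counts them."""

    def __init__(self) -> None:
        self.count = 0
        self.last_error: Optional[BaseException] = None

    def __call__(self, exc: BaseException, ts: Optional[float]) -> None:
        # By the time this runs, the retry mechanism has already given up.
        self.count += 1
        self.last_error = exc
        logger.warning("Network error #%d after retries failed: %r", self.count, exc)


on_network_error = NetworkErrorCounter()

# Simulated invocation; in practice the client would call this, not your code.
on_network_error(TimeoutError("read timed out"), None)
```

An instance like `on_network_error` could then be passed as `on_network_error_callback`, and the training code could inspect `on_network_error.count` to decide whether to keep going.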

on_warning_callback

Called in a few specific scenarios:

  • You're creating a run with an ID that already exists.
  • You're trying to fork a run that doesn't exist.
  • You're logging a point to a metric, and the point is identical to the latest point in that metric.

on_error_callback

Umbrella callback for various issues. Includes a mix of error classes:

  • API authorization errors. For example, you don't have permissions to write to a project.
  • Errors in the lifecycle of the local process synchronizing data to Neptune. For example, the process exited unexpectedly.
  • Semantic errors, such as:
    • You're trying to write to a run that doesn't exist.
    • You're trying to fork a run, but its parent doesn't exist.
    • You're trying to create a run, but the creation parameters are invalid.
    • You're trying to write a point to a metric with a non-increasing step or timestamp.

on_queue_full_callback (unused)

This additional parameter is currently unused.
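To keep the four parameters in one place, the callbacks can be collected in a dictionary keyed by parameter name. The sketch below assumes the parameters are accepted as keyword arguments when the run is created; `make_logging_callback` and the logger name are hypothetical helpers, not part of the Neptune API.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("neptune_callbacks")  # hypothetical logger name

Callback = Callable[[BaseException, Optional[float]], None]


def make_logging_callback(source: str, level: int) -> Callback:
    """Build a callback that logs the exception, tagged with its source."""

    def _callback(exc: BaseException, ts: Optional[float]) -> None:
        logger.log(level, "[%s] %r (ts=%s)", source, exc, ts)

    return _callback


callbacks: Dict[str, Callback] = {
    "on_network_error_callback": make_logging_callback("network", logging.ERROR),
    "on_warning_callback": make_logging_callback("warning", logging.WARNING),
    "on_error_callback": make_logging_callback("error", logging.ERROR),
    # Currently unused according to this page; set only for completeness.
    "on_queue_full_callback": make_logging_callback("queue_full", logging.WARNING),
}

# Assumed usage (not executed here): run = Run(..., **callbacks)
```

Tagging each message with its source makes it easier to tell from the logs which of the four callbacks fired.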

Override the default error handling

We recommend always providing error-handling callbacks explicitly. There are two main directions to optimize for:

  • A) Never stop the training process, even at the cost of some data not appearing in Neptune.

    In this case, we recommend setting all callbacks to something like:

    def _my_callback(exc: BaseException, ts: Optional[float]) -> None:
        logger.warning(f"Encountered {exc} error")
  • B) Ensure correct and complete data, even at the cost of stopping the training process.

    This scenario is more complex and requires handling exceptions on a case-by-case basis.

    Example

    def _my_callback(exc: BaseException, ts: Optional[float]) -> None:
        if isinstance(exc, NeptuneSynchronizationStopped):
            # The process synchronizing logged data to Neptune exited
            run.terminate()

        elif isinstance(exc, NeptuneFloatValueNanInfUnsupported):
            # We're trying to log NaN/Inf, which is currently not supported
            logger.warning("Failed to log NaN/Inf metric value")

        # ... (branches for other exception types)

        else:
            # Anything not handled above is treated as fatal
            run.terminate()
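One way to structure option B is a small factory that binds the run to the callback and takes the tolerated exception types as a parameter. This is only a sketch: the two exception classes are local stand-ins named after the ones in the example above so that the code runs without the Neptune client installed, and `FakeRun` and `make_strict_callback` are hypothetical.

```python
import logging
from typing import Callable, Optional, Tuple, Type

logger = logging.getLogger("neptune_callbacks")  # hypothetical logger name


# Local stand-ins; in real code these would be imported from the Neptune client.
class NeptuneSynchronizationStopped(Exception):
    pass


class NeptuneFloatValueNanInfUnsupported(Exception):
    pass


class FakeRun:
    """Stand-in for a run object; only records whether terminate() was called."""

    def __init__(self) -> None:
        self.terminated = False

    def terminate(self) -> None:
        self.terminated = True


def make_strict_callback(
    run: FakeRun,
    tolerated: Tuple[Type[BaseException], ...] = (NeptuneFloatValueNanInfUnsupported,),
) -> Callable[[BaseException, Optional[float]], None]:
    """Return a callback that logs tolerated errors and terminates the run otherwise."""

    def _callback(exc: BaseException, ts: Optional[float]) -> None:
        if isinstance(exc, tolerated):
            logger.warning("Ignoring tolerated error: %r", exc)
        else:
            logger.error("Unrecoverable error, terminating run: %r", exc)
            run.terminate()

    return _callback


run = FakeRun()
strict_callback = make_strict_callback(run)
strict_callback(NeptuneFloatValueNanInfUnsupported("NaN value"), None)  # logged only
```

Keeping the tolerated types in a tuple means the policy can be tightened or relaxed without editing the callback body.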

For the full set of possible exceptions, see the source code on GitHub.