Neptune API errors and warnings
When using the Neptune Python client library (Neptune API), you may encounter error and warning situations.
In case of failure, by default, Neptune drops the data with a warning. The training process isn't terminated, unless the NEPTUNE_LOG_FAILURE_ACTION environment variable is set to raise.
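As a minimal sketch, the variable can be set from Python instead of the shell. This assumes the client reads the variable when the Run is created, so the assignment is placed before any Neptune code runs; that ordering is an assumption, not something this page states.

```python
import os

# Assumption: this runs before the Neptune Run object is created, so that
# the client sees the value. "raise" is the value named in the text above.
os.environ["NEPTUNE_LOG_FAILURE_ACTION"] = "raise"

# Keep a copy for logging or debugging the effective setting.
failure_action = os.environ["NEPTUNE_LOG_FAILURE_ACTION"]
```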
Validation errors
When using the log_configs() or log_metrics() logging methods, the following validation errors can occur:
- TypeError when argument types are mismatched, such as passing a string instead of a float to log_metrics().
- ValueError in case of malformed arguments, such as paths that are empty or too long.
- NeptuneSeriesStepNonIncreasing indicates a failure in client-side validation when the steps for a given metric are not strictly increasing. For details, see log_metrics().
- If configured with the NEPTUNE_LOG_FAILURE_ACTION environment variable, NeptuneUnableToLogData is raised if the main process gets stuck.
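To make the three validation categories concrete, here is a standalone sketch of comparable checks. It is not Neptune's implementation: validate_metrics, StepNotIncreasing, and MAX_PATH_LENGTH are hypothetical names, and the length limit is an arbitrary placeholder because this page does not state the real one.

```python
from numbers import Real

MAX_PATH_LENGTH = 1024  # hypothetical limit, for illustration only


class StepNotIncreasing(Exception):
    """Local stand-in for NeptuneSeriesStepNonIncreasing."""


def validate_metrics(data, step, last_steps):
    """Validate one metrics payload; record the step per path on success."""
    if isinstance(step, bool) or not isinstance(step, Real):
        raise TypeError(f"step must be a number, got {type(step).__name__}")
    for path, value in data.items():
        if not isinstance(path, str):
            raise TypeError("metric paths must be strings")
        if not path or len(path) > MAX_PATH_LENGTH:
            raise ValueError(f"path is empty or too long: {path!r}")
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"{path}: expected a number, got {type(value).__name__}")
        # Steps must be strictly increasing for each metric path.
        if path in last_steps and step <= last_steps[path]:
            raise StepNotIncreasing(
                f"{path}: step {step} is not greater than {last_steps[path]}"
            )
    # Update the bookkeeping only after every entry has passed.
    for path in data:
        last_steps[path] = step
```

Running such a check before calling the logging methods can surface bad input early, for example in a unit test, rather than as a dropped data point during training.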
OS-level errors
Operating-system level errors are usually non-recoverable and leave the Run object in an undefined state. For example:
- Failure to enqueue logging operations in the logging methods.
- Failure to update a variable that's shared between processes and used to track in-flight operation status.
Closing a run can fail when an OS-level error occurs, such as failure to terminate a process or clean up resources.
- To synchronize locally stored data with Neptune servers, see neptune sync.
- To continue an existing run, see Resume a run.
Errors and solutions
This section lists common errors and their solutions.
HTTP response error: HTTPStatus.BAD_REQUEST
The logging API waits indefinitely and prints this warning if your custom run ID is invalid.
If uploading files to Neptune, you might also see the NeptuneFileUploadTemporaryError: A temporary error occurred during file upload error.
Ensure that the string passed to the run_id argument of the Run constructor doesn't contain a forward slash (/).
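If run IDs are built from values that may contain slashes, such as branch names or file paths, one option is to sanitize them before constructing the Run. The helper below is a hypothetical sketch: sanitize_run_id is not part of the Neptune API, and it handles only the forward slash mentioned above.

```python
def sanitize_run_id(raw_id: str, replacement: str = "-") -> str:
    """Return raw_id with every forward slash replaced."""
    if "/" in replacement:
        raise ValueError("replacement must not contain a forward slash")
    return raw_id.replace("/", replacement)


# Example: an ID derived from a branch name.
safe_id = sanitize_run_id("feature/new-loss/run-1")
```

The sanitized string is a different ID from the raw one, so the same helper has to be applied again when resuming the run later.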
NeptuneApiTokenNotProvided
Raised if the client library can't find a Neptune API token via explicit arguments or environment variables.
For configuration help, see API tokens.
NeptuneFailedToFetchClientConfig
This error may occur if your Neptune API token is invalid or expired.
NeptuneProjectNotProvided
Raised if the client library can't find a Neptune project via explicit arguments or environment variables.
For configuration help, see Projects.
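Both this error and NeptuneApiTokenNotProvided can be caught before training starts with a small pre-flight check. The sketch below assumes the environment variables are named NEPTUNE_API_TOKEN and NEPTUNE_PROJECT; this page does not name them, so verify against the linked configuration pages. missing_neptune_settings is a hypothetical helper.

```python
import os

# Assumed variable names; confirm them in the API tokens and Projects docs.
REQUIRED_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_settings(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A script could call this at startup and exit with a clear message if the returned list is non-empty, unless the token and project are passed to the Run as explicit arguments instead.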
NeptuneRunConflicting
If you're resuming a run that was forked off another run, you get the following error:
NeptuneRunConflicting:
Run with specified `run_id` already exists, but has a different `fork_run_id` parameter.
Although the output states that you need to synchronize the data manually, this is not necessary. The data is logged despite the errors.
As a workaround, to resume a fork run without errors, specify the fork parent and step when resuming the run:
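from neptune_scale import Run

run = Run(
    run_id="SomeExistingRunId",      # ID of the forked run to resume
    fork_run_id="OriginalParentId",  # parent ID, from sys/forking/parent
    fork_step=100,                   # fork step, from sys/forking/step
)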
- The parent ID is stored in the run's sys/forking/parent attribute.
- The fork step is stored in the run's sys/forking/step attribute.
NeptuneSynchronizationStopped or NeptuneConnectionLostError
These exceptions might be raised because of the SSL configuration.
For how to allow self-signed certificates, see Self-Signed Certificate (SSL) environment variables.
Out-of-Memory errors when launching JAX
Neptune uses the spawn method to launch multiprocessing workers. This can cause OOM errors when JAX is initialized during import.
Import JAX after the child process is created, either:
- as part of the worker function (preferred)
- after the Neptune Run is initialized
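The preferred option can be sketched as a deferred import. With the spawn start method, a child process re-imports the main module, so a top-level import of JAX would run again in the child; moving the import into the function body avoids that. The function names here are hypothetical, and the computation is a placeholder.

```python
import sys


def run_training_step(size: int) -> float:
    """Worker function that loads JAX only when it is called."""
    # Deferred import: nothing JAX-related runs when this module is
    # imported, so a spawned child that re-imports the module stays light.
    import jax.numpy as jnp

    return float(jnp.arange(size).sum())


def jax_is_loaded() -> bool:
    """Report whether JAX has been imported in the current process."""
    return "jax" in sys.modules
```

Calling jax_is_loaded() right after creating the Run is one way to confirm that no other import has pulled in JAX earlier than intended.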