Best practices#

By following these best practices, you can work with Neptune smoothly and securely.

Configuring credentials#

Best practice

Save your Neptune API token and project name as environment variables, rather than placing them in plain text in your model training script.

While it's possible to pass your API token and project name as arguments to the init functions, it's both more secure and more convenient to save them as environment variables in your system.

This way, Neptune picks up your credentials automatically, and you avoid storing them in the source code of your training scripts.
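
For example, with the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables set in your shell, initialization needs no credential arguments:

import neptune

# Credentials are read from the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT
# environment variables, so no secrets appear in the script
run = neptune.init_run()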

Field limits#

Best practice

  • Make sure you create no more than 9000 fields in a single Neptune object.
  • Avoid creating more unique fields than necessary.

To ensure optimal performance, the number of user-defined fields is limited to 9000. This limit applies to each Neptune object that can contain metadata fields (Run and Project).

For collections or sequences of data points, it's a good idea to make use of Neptune's field types for series or sets. This makes the metadata easier to search for and operate on.

Example: With the append() function, you can store sequences of values or files under a single field in the run structure. Rather than assigning 100 values to 100 separate fields (run["field1"] = 0, run["field2"] = 1, and so on), you can log all 100 values as a series under a single field:

# Log 100 values to a single series field instead of 100 separate fields
for i in range(100):
    run["field"].append(i)

Monitoring system metrics#

Best practice

  • Consider using a custom monitoring namespace name: monitoring/YourOptionalCustomPart.
  • In distributed runs, ensure that each process logs the metrics into its own monitoring namespace (see the sketch at the end of this section).

By default, Neptune tracks system metrics separately for each process:

run
|-- monitoring
    |-- <hash>
        |-- cpu
        |-- gpu
        |-- gpu_memory
        |-- gpu_power
        |-- hostname
        |-- memory
        |-- pid       # Process ID
        |-- stderr
        |-- stdout
        |-- tid       # Thread ID
    |-- <hash>
        |-- cpu
        |-- gpu
        |-- ...

We recommend overriding this default behavior. Otherwise, a unique set of fields is created for each run or process, which can make it difficult to compare the metrics between runs.

For best results, set a custom monitoring namespace name that reflects your setup:

import neptune

run = neptune.init_run(monitoring_namespace="monitoring")

# Optionally, include a custom sub-namespace:
# monitoring_namespace="monitoring/system_metrics"

You can also disable monitoring entirely:

run = neptune.init_run(
    capture_hardware_metrics=False,
    capture_stdout=False,
    capture_stderr=False,
    capture_traceback=False,
)

For details, see Log system metrics.
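
As a sketch of the distributed case, each process can derive its own monitoring namespace from its rank. The example below assumes a torchrun-style RANK environment variable; adapt it to whatever your launcher provides:

import os

import neptune

# Assumption: the launcher (for example, torchrun) sets RANK per process
rank = os.environ.get("RANK", "0")

run = neptune.init_run(monitoring_namespace=f"monitoring/rank_{rank}")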

Reserved namespaces#

Best practice

In the monitoring namespace, only create fields related to hardware consumption, console logs, and other system metrics.

monitoring#

  • Namespace location: run["monitoring"] by default

You can log custom metrics to the monitoring namespace, but it's intended specifically for system monitoring rather than model monitoring.

Examples of system monitoring include hardware consumption and standard streams (stdout and stderr). For details, see Logging system metrics.
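
For example, a custom system metric (the field name below is hypothetical) can be appended like any other series:

run["monitoring/gpu_temperature"].append(71.5)  # hypothetical custom metric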

sys#

  • Namespace location: run["sys"]

The system namespace is reserved. That is, you can't create new fields in this namespace.

However, you can manually edit certain field values, such as tags or description.

For details, see System namespace overview.
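
For instance, you can add a tag or set the description of an existing run:

run["sys/tags"].add("best-model")
run["sys/description"] = "Baseline with tuned learning rate"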

Optimizing logging calls#

Sending too many requests to the Neptune server too frequently can result in synchronization delays. You can avoid this by optimizing your logging calls, for example by logging at a lower frequency.
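
For example, rather than logging on every training step, you can log every tenth step. In this sketch, train_step() stands in for your own training function:

for step in range(1000):
    loss = train_step()  # hypothetical: one training iteration
    if step % 10 == 0:
        run["train/loss"].append(loss)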

Stopping runs#

Best practice

When you're done logging to a Neptune run, stop it with the stop() method in your code.

When a script finishes executing, Neptune automatically stops any active runs.

However, in interactive sessions (such as Jupyter notebooks) the connection to a run is stopped only when the kernel stops.

To avoid logging metadata for longer than intended, it's best to stop each Neptune object explicitly with the stop() method.

Stopping a run
import neptune

run = neptune.init_run()

run["namespace/field"] = "some metadata"
...
run.stop()

Other cases for using stop() include (but aren't limited to):

  • Creating multiple runs in a single script
  • Continuous training flows
  • Model registry flows

Enabling background monitoring in interactive sessions

To avoid undesired background logging, some monitoring options are disabled by default in interactive sessions.

You can turn them on with:

import neptune

run = neptune.init_run(
    capture_hardware_metrics=True,
    capture_stderr=True,
    capture_stdout=True,
)

Fetching metadata logged within the same script#

Best practice

If you fetch metadata that you logged earlier in the same script, call wait() before querying.

When working in asynchronous mode (default), the metadata you track is periodically synchronized with the servers in the background. Because of this, the data may not be immediately available to fetch from the server, even if it appears in the local representation.

To work around that, you can use the wait() method:

run["learning_rate"] = 0.5

# Wait for the data to be synchronized
run.wait()

# Fetching the metadata we just logged will now work
learning_rate = run["learning_rate"].fetch()

Working in complex setups#

The following sections cover best practices and considerations for distributed and parallel computing setups.

Pipelining libraries#

Neptune provides tracking and visualization for pipelining libraries such as Kubeflow. To keep the logged metadata cohesive, you generally just need to make sure that all pipeline steps log their data to the same Neptune run.

To access the same run object in multiple scripts, you have the following options:

  • Set a custom run ID

    • You can create a custom identifier for the run and use that to access the same run from multiple locations.
    • You can also export the custom run ID as an environment variable (NEPTUNE_CUSTOM_RUN_ID). This tells Neptune that scripts started with the same NEPTUNE_CUSTOM_RUN_ID value should be treated as one and the same run.
  • Pass the run object between functions – you can pass the Run object as a parameter to functions within the same script or imported from other scripts.

In addition, you can use custom namespaces to organize logged metadata into meaningful steps of your pipeline.
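
A minimal sketch of the custom run ID approach, using a hypothetical ID and one namespace per pipeline step:

import neptune

# Step 1: preprocessing script
run = neptune.init_run(custom_run_id="pipeline-2024-001")  # hypothetical ID
run["preprocessing/status"] = "done"
run.stop()

# Step 2: training script, resuming the same run via the shared ID
run = neptune.init_run(custom_run_id="pipeline-2024-001")
run["training/accuracy"] = 0.92
run.stop()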

Parallel computing#

The Neptune client library is thread-safe. This means that you can log data to your run from multiple threads within a single Python process.

Within one Python script, calls that log metadata from a model-training run are executed in "First In, First Out" (FIFO) order. However, to avoid potential race condition issues, we advise against modifying a variable from multiple threads.
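
As a minimal sketch, each thread below logs to its own series field, so no variable is modified from multiple threads:

import threading

import neptune

run = neptune.init_run()

def log_from_thread(thread_id):
    # Each thread appends to a field of its own
    for step in range(100):
        run[f"metrics/thread_{thread_id}/value"].append(step)

threads = [threading.Thread(target=log_from_thread, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

run.stop()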

Info

You can log metadata to several runs at the same time – for example, from a Jupyter notebook.

Each run has a separate FIFO queue and a separate synchronization thread.

Distributed computing#

You can log metadata from several processes that run on the same or different machines. Neptune is fully compatible with distributed computing frameworks, such as Apache Spark.

Monitoring#

The main consideration is how you handle monitoring of system metrics. To optimize monitoring for your training setup, you may want to override the default behavior.

To learn more, see Log system metrics.

Data model and synchronization#

Neptune provides some synchronization methods that help you handle more sophisticated workflows:

  • wait() – wait for all the queued logging calls to finish before executing the call. For an example, see Fetching metadata logged within the same script above.
  • sync() – synchronize the local representation of the run with the Neptune servers. See the sketch below.
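
For example, if another process may have modified the run, you can refresh the local representation before reading from it:

# Pull the latest state of the run from the Neptune servers
run.sync()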

Neptune is optimized for rapid metadata tracking and doesn't perform API calls to Neptune servers if not needed. As such, Neptune stores its own local representation of the run structure and assumes that no other process is modifying the run at the same time in a conflicting way.

Examples of such conflicting cases would be:

  • Removing or renaming variables or fields that are being used by another process.
  • Using the same field name with a different field type. Once a field is created, you can't change its type¹:

    run["predictions"].upload(preds)
    run["predictions"] = "some nice predictions"
    # Error: A string can't be assigned to a file field.
    
    run["accuracy"].append(0.95)
    run["accuracy"] = 0.99
    # Error: FloatSeries does not support the "=" operator. Use append() instead.
    

In the case of a conflict, logging functions raise an exception and some of the tracked data may not be stored on Neptune servers.

As with parallel computing, to avoid potential race conditions, we advise against modifying the same variable from multiple processes.

  1. To change the type of a field, you must overwrite it deliberately. For details, see Overwrite logged metadata.