Skip to content

Using Neptune in distributed computing#

You can track run metadata from several processes, running on the same or different machines.

Neptune is fully compatible with distributed computing frameworks, such as Apache Spark. Neptune also provides some synchronization methods that will help you handle more sophisticated workflows:

  • wait() – wait for all the metadata tracking calls to finish before executing the call. See below for an example.
  • sync() – synchronize the local representation of the run or other object with the Neptune servers.

Best practices#

Neptune is optimized for rapid metadata tracking and does not perform API calls to Neptune servers if not needed. As such, Neptune stores its own local representation of the run structure and assumes that no other process is modifying the run at the same time in a conflicting way.

Examples of such conflicting cases would be:

  • Using the same field name with a different field type, such as File versus StringSeries.
  • Removing or renaming variables or fields that are being used by another process.

In the case of a conflict, tracking methods will throw an exception and some of the tracked data may not be stored on Neptune servers.

In particular, you should respect the type of a field even within a simple script:

run["learning_rate"] = 0.5
run["learning_rate"] = "let's try a string instead"
# Error: Float variable cannot be assigned with a str.

run["accuracy"].log(0.95)
run["accuracy"] = 0.99
# Error: FloatSeries does not support the "=" operator. Use log() instead.

Like in parallel computing, to avoid potential race condition issues, we advise against modifying a variable from multiple processes.

Related

Fetching logged metadata from within the same script#

When working in asynchronous mode (default), the metadata you track is periodically synchronized with the servers in the background. Because of this, the data may not be immediately available to fetch from the server, even if it appears in the local representation.

To work around that, you can use the wait() method mentioned above.

Rule of thumb

If you fetch metadata that you logged from within the same script, use wait() on the object before querying:

run["learning_rate"] = 0.5

# Wait for the data to be synchronized
run.wait()

# Fetching the metadata we just logged will now work
learning_rate = run["learning_rate"].fetch()
Back to top