
Using Neptune with pipelining libraries

Neptune provides tracking and visualization for pipelines built with pipelining libraries such as Kubeflow.

In general, you only need to make sure that every step of the pipeline (in practice, each script) logs data to the same Neptune run.

To access the same run object in multiple steps, you have two options:

  • Passing the run object between files

    • You can pass the Run object as an argument to functions that you import from other scripts.
    • This method applies to other Neptune objects as well: Model, ModelVersion, and Project.
  • Setting a custom run ID

    • You can create a custom identifier for the run and use that to access the same run from multiple locations.
    • You can also export the custom run ID as the environment variable NEPTUNE_CUSTOM_RUN_ID. Neptune then treats all scripts started with the same NEPTUNE_CUSTOM_RUN_ID value as one and the same run.
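As a sketch of the environment-variable option, assume a hypothetical driver script that launches each pipeline step as a separate process. The driver generates one ID per pipeline execution and passes it to every step through NEPTUNE_CUSTOM_RUN_ID. The step names and the inline `STEP_CODE` are placeholders so the sketch runs without Neptune installed; in a real pipeline each step would be its own script that calls `neptune.init_run()` (assuming the neptune 1.x client), and Neptune would resolve all of them to the same run.

```python
from __future__ import annotations

import os
import subprocess
import sys
import uuid

# Hypothetical stand-in for each step script: it only prints the ID it received.
# A real step would instead run `import neptune; run = neptune.init_run()`.
STEP_CODE = "import os; print(os.environ['NEPTUNE_CUSTOM_RUN_ID'])"
STEPS = ("preparation", "training", "validation")


def run_pipeline() -> dict[str, str]:
    """Launch every step with one shared NEPTUNE_CUSTOM_RUN_ID; return the ID each step saw."""
    # One fresh ID per pipeline execution, so separate executions get separate runs
    custom_run_id = uuid.uuid4().hex
    # Copy the current environment and add the shared ID on top of it
    env = {**os.environ, "NEPTUNE_CUSTOM_RUN_ID": custom_run_id}

    seen = {}
    for step in STEPS:
        result = subprocess.run(
            [sys.executable, "-c", STEP_CODE],
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        seen[step] = result.stdout.strip()
    return seen


if __name__ == "__main__":
    for step, run_id in run_pipeline().items():
        print(f"{step}: {run_id}")
```

Generating the ID once in the driver, rather than inside each step, is what guarantees that all steps agree on it.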

You may also want to use namespaces to group the tracked metadata by pipeline step.

For example:

```python
# in the preparation script
run["preparation/input_dataset"].upload("test.csv")

# in the training script
run["training/accuracy"].log(0.96)

# in the validation script
run["validation/accuracy"].log(0.89)
```
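Combining the first option (passing the run object between functions) with namespaces might look like the following minimal sketch. The steps are kept in one file for brevity; in practice each step function would live in its own script and be imported. `StubRun` and `StubField` are hypothetical stand-ins that only record calls locally, so the sketch runs offline; they are not part of Neptune.

```python
class StubField:
    """Hypothetical stand-in for a Neptune field; it only records calls locally."""

    def __init__(self, store: dict, path: str):
        self._store = store
        self._path = path

    def log(self, value):
        # Append a value to a series, mirroring the run[...].log() calls on this page
        self._store.setdefault(self._path, []).append(value)

    def upload(self, file_path: str):
        # Record the file path only; nothing is read or sent anywhere
        self._store[self._path] = file_path


class StubRun:
    """Hypothetical stand-in for a Neptune Run, so the sketch runs offline."""

    def __init__(self):
        self.store: dict = {}

    def __getitem__(self, path: str) -> StubField:
        return StubField(self.store, path)


# Each pipeline step takes the shared run as a parameter and writes only
# under its own namespace (preparation/, training/, validation/).
def prepare(run) -> None:
    run["preparation/input_dataset"].upload("test.csv")


def train(run) -> None:
    run["training/accuracy"].log(0.96)


def validate(run) -> None:
    run["validation/accuracy"].log(0.89)


def run_all_steps(run) -> None:
    # The same run object is handed to every step, so all metadata lands in one run
    for step in (prepare, train, validate):
        step(run)


if __name__ == "__main__":
    # Assumption: with the neptune 1.x client installed, you would instead use
    #   import neptune
    #   run = neptune.init_run()
    # and call run.stop() after the last step.
    run = StubRun()
    run_all_steps(run)
    print(run.store)
```

Because every step writes under its own top-level namespace, the metadata of each stage stays grouped even though all of it belongs to a single run.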