
Working with Kedro: Setup#

Kedro is a popular open-source project that helps standardize ML workflows. The Kedro-Neptune plugin adds a powerful and flexible UI on top of your Kedro pipelines:

  • Browse, filter, and sort your model training runs.
  • Compare nodes and pipelines on metrics and visual node outputs.
  • Display pipeline metadata, including:
    • Learning curves for metrics, plots, and images.
    • Rich media, such as video and audio.
    • Interactive visualizations from Plotly, Altair, or Bokeh.

The Kedro-Neptune plugin supports distributed pipeline execution. It works in Kedro setups that use orchestrators, such as Airflow or Kubeflow.

Kedro pipeline metadata visualized in Neptune


Before you start#

Installing the Kedro-Neptune plugin#

On the command line or in a terminal app, such as Command Prompt, enter one of the following, depending on your package manager:

pip install kedro-neptune

or

conda install -c conda-forge kedro-neptune
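To check that the plugin was installed into the same Python environment that Kedro runs in, you can look up the installed distribution with the standard library. This is a minimal sketch, assuming Python 3.8 or later; installed_version() is a hypothetical helper name, not part of Kedro or the plugin.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it isn't installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution isn't installed in the active environment
        return None


# Prints a version string, or None if the plugin is missing from this environment
print(installed_version("kedro-neptune"))
```

If this prints None while pip or conda reported a successful install, the package most likely went into a different environment than the one you run kedro from.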

Setup and logging example#

  1. Create a Kedro project from the pandas-iris starter.

    1. To create a Kedro starter project, enter the following on the command line or in a terminal app:

      kedro new --starter=pandas-iris
      

      For detailed instructions, see the Kedro docs.

    2. Follow the instructions and choose a name for your Kedro project.

    3. Navigate to your new Kedro project directory.
    Example structure
    My-Kedro-Project    # Parent directory of the template
    ├── conf            # Project configuration files
    ├── data            # Local project data (not committed to version control)
    ├── docs            # Project documentation
    ├── logs            # Project output logs (not committed to version control)
    ├── notebooks       # Project-related Jupyter notebooks (can be used for experimental code before moving the code to src)
    ├── README.md       # Project README
    ├── setup.cfg       # Configuration options for 'pytest' when doing 'kedro test' and for the 'isort' utility when doing 'kedro lint'
    ├── src             # Project source code
        ├── KEDRO_PROJECT   # Your project's Python package
            ├── pipelines
                ├── data_science
                    ├── nodes.py
                    ├── pipelines.py
                    └── ...
    

    In this example, we'll use the nodes.py and pipelines.py files.

  2. Initialize the Kedro-Neptune plugin.

    1. In your Kedro project directory, enter the command:

      kedro neptune init
      
    2. You are prompted for your API token.

      • If you've saved it to the NEPTUNE_API_TOKEN environment variable (recommended), just press Enter without typing anything.
      • If you've set your Neptune API token to a different environment variable, enter that instead. For example, MY_SPECIAL_NEPTUNE_TOKEN_VARIABLE.
      • You can also copy and enter your Neptune API token directly (not recommended).
    3. You are prompted for a Neptune project name.

      • If you've saved it to the NEPTUNE_PROJECT environment variable (recommended), just press Enter without typing anything.
      • If you've set your Neptune project name to a different environment variable, enter that instead. For example, MY_SPECIAL_NEPTUNE_PROJECT_VARIABLE.
      • You can also enter your Neptune project name in the form workspace-name/project-name. If you're not sure about the full name, go to the project settings in the Neptune app.

    If everything was set up correctly, you should see the following:

    • The message kedro-neptune plugin successfully configured
    • Three new files in your Kedro project:
      1. YOUR_KEDRO_PROJECT/conf/local/credentials_neptune.yml (credentials file)
      2. YOUR_KEDRO_PROJECT/conf/base/neptune.yml (config file)
      3. YOUR_KEDRO_PROJECT/conf/base/neptune_catalog.yml (catalog file)
  3. Add Neptune logging to a Kedro node.

    1. Open a pipeline node: src/KEDRO_PROJECT/pipelines/data_science/nodes.py.
    2. In the nodes.py file, import the Neptune client:

      nodes.py
      import neptune.new as neptune
      
    3. Add the neptune_run argument of type neptune.new.handler.Handler to the report_accuracy() function:

      nodes.py
      def report_accuracy(
          predictions: np.ndarray,
          test_y: pd.DataFrame,
          neptune_run: neptune.handler.Handler,  # neptune is the alias for neptune.new
      ) -> None:
          ...
      

      Tip

      You can treat neptune_run like a normal Neptune run and log metadata to it as you normally would.

      You must use the special string neptune_run as the run handler in Kedro pipelines.

    4. Log metrics like accuracy to neptune_run:

      nodes.py
      def report_accuracy(
          predictions: np.ndarray,
          test_y: pd.DataFrame,
          neptune_run: neptune.handler.Handler,
      ) -> None:
          # test_y is one-hot encoded, so argmax recovers each row's class index
          target = np.argmax(test_y.to_numpy(), axis=1)
          accuracy = np.sum(predictions == target) / target.shape[0]
      
          # Log accuracy as a percentage under a namespace of your choice
          neptune_run["nodes/report/accuracy"] = accuracy * 100
      

      The nodes/report/accuracy structure is an example. You can define your own.

    5. Log images, such as a confusion matrix:

      nodes.py
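      import matplotlib.pyplot as plt
      from sklearn.metrics import ConfusionMatrixDisplay  # requires scikit-learn >= 1.0
      
      def report_accuracy(
          predictions: np.ndarray,
          test_y: pd.DataFrame,
          neptune_run: neptune.handler.Handler,
      ) -> None:
          target = np.argmax(test_y.to_numpy(), axis=1)
          accuracy = np.sum(predictions == target) / target.shape[0]
          neptune_run["nodes/report/accuracy"] = accuracy * 100
      
          # Draw the confusion matrix on a Matplotlib figure, then upload the figure
          fig, ax = plt.subplots()
          ConfusionMatrixDisplay.from_predictions(target, predictions, ax=ax)
          neptune_run["nodes/report/confusion_matrix"].upload(fig)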
      

      You can also log other types of metadata in a structure of your own choosing. For details, see What you can log and display.

  4. Add the Neptune run handler to the Kedro pipeline.

    1. Go to a pipeline definition: src/KEDRO_PROJECT/pipelines/data_science/pipelines.py
    2. Add the neptune_run handler as an input to the report node:

      pipelines.py
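      from kedro.pipeline import Pipeline, node
      
      from .nodes import report_accuracy
      
      
      def create_pipeline(**kwargs):
          return Pipeline(
              [
                  # ... the starter's other nodes ...
      
                  node(
                      report_accuracy,
                      # "neptune_run" is the special input that the plugin fills with the run handler
                      ["example_predictions", "example_test_y", "neptune_run"],
                      None,
                      name="report",
                  ),
              ]
          )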
      
  5. On the command line, execute your Kedro pipeline:

    kedro run
    

    To open the Neptune run in the app, click the Neptune link that appears in the console.

    Example link: https://app.neptune.ai/common/kedro-integration/e/KED-632

  6. Explore the results in the Neptune app.

    • In the All metadata section of the run view, click the kedro namespace to view the metadata from the Kedro pipelines:
      • kedro/catalog/parameters: pipeline and node parameters
      • kedro/run_params: execution parameters
      • kedro/catalog/datasets/example_iris_data: dataset metadata
      • kedro/nodes/report/accuracy: the metrics we logged
      • kedro/nodes/report/confusion_matrix: the confusion matrix we logged
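Because the accuracy step of report_accuracy() only writes to neptune_run through item assignment, you can check that logic locally, without a Neptune connection, by passing a plain dict in place of the run handler. This is a minimal sketch under that assumption: the dict stand-in, the sample labels, and the column names are made up for illustration, and the confusion matrix step is left out because a dict has no upload() method.

```python
import numpy as np
import pandas as pd


def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame, neptune_run) -> None:
    # Same accuracy logic as the node above; neptune_run is unannotated so a dict fits
    target = np.argmax(test_y.to_numpy(), axis=1)
    accuracy = np.sum(predictions == target) / target.shape[0]
    neptune_run["nodes/report/accuracy"] = accuracy * 100


# A plain dict supports the same item assignment that the node uses
fake_run = {}
one_hot_y = pd.DataFrame(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]],
    columns=["setosa", "versicolor", "virginica"],  # hypothetical one-hot columns
)
predictions = np.array([0, 1, 2, 0])  # 3 of the 4 predictions match the labels

report_accuracy(predictions, one_hot_y, fake_run)
print(fake_run)
```

The same idea works in a unit test: assert on the dict's contents instead of opening the run in the Neptune app.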


More options#

Creating a dashboard with Kedro pipeline metadata#

By creating a custom dashboard, you can combine all the metadata in a single view.


Comparing Kedro pipeline runs#

You can compare run metadata from your Kedro pipelines.

  1. Navigate to the Runs tab.
  2. In the left pane, select Compare runs.


Basic logging configuration#

You can configure where and how Kedro pipeline metadata is logged by editing the conf/base/neptune.yml file.

conf/base/neptune.yml
neptune:
#GLOBAL CONFIG
  project: $NEPTUNE_PROJECT
  base_namespace: kedro

#LOGGING
  upload_source_files:
  - '**/*.py'
  - 'conf/base/*.yml'

where

  • project is the name of the Neptune project where metadata is stored. If you haven't set the NEPTUNE_PROJECT environment variable, replace $NEPTUNE_PROJECT with the full name of your Neptune project (workspace-name/project-name).
  • base_namespace is the base namespace (folder) where your metadata will be logged. The default is kedro.
  • upload_source_files lists the files you want to upload to Neptune, as glob patterns.
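To preview which files the upload_source_files patterns would pick up, you can expand them yourself with Python's glob module. This sketch assumes the plugin resolves the patterns relative to the Kedro project root with standard glob semantics, which the configuration above doesn't state; preview_source_files() is a hypothetical helper.

```python
import glob
import os
from typing import List


def preview_source_files(project_root: str, patterns: List[str]) -> List[str]:
    """List files under project_root that match any of the given glob patterns."""
    matches = set()
    for pattern in patterns:
        # recursive=True lets "**" match zero or more nested directories
        for path in glob.glob(os.path.join(project_root, pattern), recursive=True):
            if os.path.isfile(path):
                matches.add(os.path.relpath(path, project_root))
    return sorted(matches)
```

For example, preview_source_files(".", ["**/*.py", "conf/base/*.yml"]) run from the project root lists the matching paths. By default, glob skips files and directories whose names start with a dot.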

Configuring Neptune API token#

You can configure how the Kedro-Neptune plugin will look for your Neptune API token in the conf/local/credentials_neptune.yml file.

conf/local/credentials_neptune.yml
neptune:
  api_token: eyJhcGlfYWRk123cmVqgpije5cyI6Imh0dHBzOi8v

You can:

  • leave it empty, in which case Neptune will look for your token in the $NEPTUNE_API_TOKEN environment variable.

    Important

    There must be a dollar sign ($) before the variable name.

  • pass a different environment variable, such as $MY_SPECIAL_NEPTUNE_API_TOKEN_VARIABLE.

  • pass your token as a string, such as eyJhcGlfYWRk123cmVqgpije5cyI6Imh0dHBzOi8v (not recommended).
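The sketch below models the three options listed above as a small Python function, which can help when reasoning about which token ends up being used. It is an illustration only: resolve_api_token() is hypothetical and is not taken from kedro-neptune's source, so the plugin's actual parsing may differ in details.

```python
import os
from typing import Mapping, Optional


def resolve_api_token(value: Optional[str], env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Illustrate the three credential options (hypothetical helper)."""
    if not value:
        # Option 1: empty value, so fall back to the standard environment variable
        return env.get("NEPTUNE_API_TOKEN")
    if value.startswith("$"):
        # Option 2: "$NAME" means "read the environment variable NAME"
        return env.get(value[1:])
    # Option 3: anything else is treated as the token itself
    return value
```

This also shows why the dollar sign matters: without it, a variable name such as MY_SPECIAL_NEPTUNE_API_TOKEN_VARIABLE would be treated as a literal token string.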

Logging files and datasets#

You can log files to Neptune with a special Kedro DataSet called kedro_neptune.NeptuneFileDataSet.

To log files, add the DataSet to your catalog.yml file:

conf/base/catalog.yml
example_csv_file:
  type: kedro_neptune.NeptuneFileDataSet
  filepath: data/01_raw/iris.csv

where filepath is the path to the file you'd like to log. (Do not change the type value.)

You can find and preview all the files logged with NeptuneFileDataSet in the kedro/catalog/files namespace of the run.

Logging an existing Kedro DataSet#

If you already have a Kedro DataSet that you'd also like to log to Neptune under the same name, add a second catalog entry with @neptune appended to the DataSet name:

conf/base/catalog.yml
example_iris_data:
  type: pandas.CSVDataSet
  filepath: data/01_raw/iris.csv

example_iris_data@neptune:
  type: kedro_neptune.NeptuneFileDataSet
  filepath: data/01_raw/iris.csv

Note

Rather than uploading the whole training dataset to Neptune, upload a small, informative part that you would like to display later.

You can also upload files directly through the Neptune API, with the upload() and upload_files() methods:

neptune_run["dataset/example_iris_data"].upload("data/01_raw/iris_sample.csv")
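The iris_sample.csv file in the upload example above is assumed to exist already. One way to produce a small, varied sample with the standard library is to draw random rows with a fixed seed. This matters when rows are grouped by class, as in the classic iris data, because the first rows alone would then all belong to one class. write_csv_sample() is a hypothetical helper, and it reads the whole source file into memory, so it suits small and medium files.

```python
import csv
import random


def write_csv_sample(src_path: str, dst_path: str, n_rows: int = 20, seed: int = 0) -> int:
    """Write the header plus a random sample of data rows from src_path to dst_path.

    Returns the number of data rows written.
    """
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        rows = list(reader)
    if header is None:
        raise ValueError(f"{src_path} is empty")

    # A fixed seed gives the same sample on every run, so uploads stay comparable
    rng = random.Random(seed)
    sample = rng.sample(rows, min(n_rows, len(rows)))

    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(sample)
    return len(sample)
```

For example, write_csv_sample("data/01_raw/iris.csv", "data/01_raw/iris_sample.csv") creates the sample file before the node uploads it.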

You can also track the dataset file as an artifact with the track_files() method. This is particularly useful if the file is large or if you're mainly interested in tracking its version:

neptune_run["dataset/iris_data_version"].track_files("data/01_raw/iris.csv")


Disabling Neptune#

To disable Neptune in your Kedro project, use the DISABLE_HOOKS_FOR_PLUGINS setting in the project's settings.py file:

settings.py
DISABLE_HOOKS_FOR_PLUGINS = ("kedro-neptune",)