
Kedro integration guide#

Kedro is a popular open-source project that helps standardize ML workflows. The Kedro-Neptune plugin adds a powerful and flexible UI on top of your Kedro pipelines:

  • Browse, filter, and sort your model training runs.
  • Compare nodes and pipelines on metrics and visual node outputs.
  • Display pipeline metadata, including:
    • Learning curves for metrics, plots, and images.
    • Rich media, such as video and audio.
    • Interactive visualizations from Plotly, Altair, or Bokeh.

The Kedro-Neptune plugin supports distributed pipeline execution. It works in Kedro setups that use orchestrators, such as Airflow or Kubeflow.

Kedro pipeline metadata visualized in Neptune


Before you start#

Installing the plugin#

To use your preinstalled version of Neptune together with the integration:

pip
pip install -U kedro-neptune
conda
conda install -c conda-forge kedro-neptune

To install both Neptune and the integration:

pip
pip install -U "neptune[kedro]"
conda
conda install -c conda-forge neptune kedro-neptune

Passing your Neptune credentials

Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...6Lc"

To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.

export NEPTUNE_PROJECT="ml-team/classification"

Your full project name has the form workspace-name/project-name. You can copy it from the project settings: Click the menu in the top-right → Edit project details.

On Windows, navigate to Settings → Edit the system environment variables, or enter the following in Command Prompt: setx SOME_NEPTUNE_VARIABLE "some-value"


Although it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune.

run = neptune.init_run(
    project="ml-team/classification",  # your full project name here
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
)

For more help, see Set Neptune credentials.
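
Before running a pipeline, it can help to confirm that both variables are visible to Python. The following is a minimal sketch that uses only the standard library. The helper name `missing_neptune_settings` is hypothetical and not part of Neptune or the plugin, and the format check only covers the workspace-name/project-name form described above.

```python
import os

REQUIRED_VARIABLES = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_settings(env=None):
    """Return a list of problems found in the Neptune environment variables."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARIABLES if not env.get(name)]

    project = env.get("NEPTUNE_PROJECT", "")
    if project:
        # The full project name has the form workspace-name/project-name.
        parts = project.split("/")
        if len(parts) != 2 or not all(parts):
            problems.append("NEPTUNE_PROJECT should look like workspace-name/project-name")
    return problems


if __name__ == "__main__":
    for problem in missing_neptune_settings():
        print(problem)
```

An empty result means both variables are set and the project name has the expected shape. It does not verify that the token itself is valid.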

Setup and logging example#

  1. Create a Kedro project from the spaceflights starter.

    1. To create a Kedro starter project, enter the following on the command line or in a terminal app:

      kedro new --starter=spaceflights-pandas
      

      For detailed instructions, see the Kedro docs.

    2. Follow the instructions and choose a name for your Kedro project.

    3. Navigate to your new Kedro project directory.
    Example structure
    spaceflights-pandas
    ├── conf
        ├── base
            ├── catalog.yml
            ├── neptune.yml
            ...
        ├── local
            ├── credentials_neptune.yml
        ... 
    ├── data
    ├── docs
    ├── notebooks
    ├── src
        └── spaceflights-pandas
            ├── pipelines
                ├── data_processing
                ├── data_science
                    ├── nodes.py
                    ├── pipeline.py
                    ...
                ...
            ├── settings.py
            ...        
    ...
    

    In this example, we'll use catalog.yml, neptune.yml, credentials_neptune.yml, nodes.py, pipeline.py, and settings.py.

  2. Initialize the Kedro-Neptune plugin.

    1. In your Kedro project directory, enter the kedro neptune init command:

      kedro neptune init
      

      Tip

      You can log dependencies with the dependencies parameter. Either pass the path to your requirements file, or pass infer to have Neptune log the currently installed dependencies in your environment.

      kedro neptune init --dependencies infer
      
    2. You are prompted for your API token.

      • If you've saved it to the NEPTUNE_API_TOKEN environment variable (recommended), just press Enter without typing anything.
      • If you've set your Neptune API token to a different environment variable, enter that instead. For example, MY_SPECIAL_NEPTUNE_TOKEN_VARIABLE.
      • You can also copy and enter your Neptune API token directly (not recommended).
    3. You are prompted for a Neptune project name.

      • If you've saved it to the NEPTUNE_PROJECT environment variable (recommended), just press Enter without typing anything.
      • If you've set your Neptune project name to a different environment variable, enter that instead. For example, MY_SPECIAL_NEPTUNE_PROJECT_VARIABLE.
      • You can also enter your Neptune project name in the form workspace-name/project-name. If you're not sure about the full name, go to the project settings in the Neptune app.

    If everything was set up correctly...

    You should see the following message:

    Created credentials_neptune.yml configuration file: YOUR_KEDRO_PROJECT\conf\local\credentials_neptune.yml
    Creating neptune.yml configuration file in: YOUR_KEDRO_PROJECT\conf\base\neptune.yml
    Creating catalog_neptune.yml configuration file: YOUR_KEDRO_PROJECT\conf\base\catalog_neptune.yml
    
  3. Add the config patterns needed to load the Neptune config to your project's CONFIG_LOADER_ARGS in spaceflights-pandas\src\spaceflights\settings.py:

    settings.py
    CONFIG_LOADER_ARGS = {
        ...,
        "config_patterns": {
            ...,
            "credentials_neptune" : ["credentials_neptune*"],
            "neptune": ["neptune*"],
        }
    }
    
  4. Add Neptune logging to a Kedro node.

    1. Open a pipeline node:

      spaceflights-pandas\src\spaceflights\pipelines\data_science\nodes.py

    2. In the nodes.py file, import Neptune:

      nodes.py
      import neptune
      
    3. Add the neptune_run argument of type neptune.handler.Handler to the evaluate_model() function:

      nodes.py
      def evaluate_model(
          regressor: LinearRegression,
          X_test: pd.DataFrame,
          y_test: pd.Series,
          neptune_run: neptune.handler.Handler,
      ):
      ...
      

      Tip

      You can treat neptune_run like a normal Neptune run and log metadata to it as you normally would.

      You must use the special string neptune_run as the run handler in Kedro pipelines.

    4. Log metrics like score to neptune_run:

      nodes.py
      def evaluate_model(
          regressor: LinearRegression,
          X_test: pd.DataFrame,
          y_test: pd.Series,
          neptune_run: neptune.handler.Handler,
      ):
          y_pred = regressor.predict(X_test)
          score = r2_score(y_test, y_pred)
          logger = logging.getLogger(__name__)
          logger.info("Model has a coefficient R^2 of %.3f on test data.", score)
          neptune_run["nodes/evaluate_model_node/score"] = score
      

      The nodes/evaluate_model_node/score structure is an example. You can define your own.

    5. Log images:

      nodes.py
      import matplotlib.pyplot as plt
      ...
      
      def evaluate_model(
          regressor: LinearRegression,
          X_test: pd.DataFrame,
          y_test: pd.Series,
          neptune_run: neptune.handler.Handler,
      ):
          y_pred = regressor.predict(X_test)
          score = r2_score(y_test, y_pred)
          logger = logging.getLogger(__name__)
          logger.info("Model has a coefficient R^2 of %.3f on test data.", score)
      
          fig = plt.figure()
          plt.scatter(y_test.values, y_pred, alpha=0.2)
          plt.xlabel("Actuals")
          plt.ylabel("Predictions")
      
          neptune_run["nodes/evaluate_model_node/score"] = score
          neptune_run["nodes/evaluate_model_node/actual_vs_prediction"].upload(fig)
      

      You can also log other types of metadata in a structure of your own choosing. For details, see What you can log and display.

  5. Add the Neptune run handler to the Kedro pipeline.

    1. Go to a pipeline definition:

      spaceflights-pandas\src\spaceflights\pipelines\data_science\pipeline.py

    2. Add the neptune_run handler as an input to the evaluate_model node:

      pipeline.py
      def create_pipeline(**kwargs) -> Pipeline:
          return pipeline(
              [
                  ...
                  ,
                  node(
                      func=evaluate_model,
                      inputs=["regressor", "X_test", "y_test", "neptune_run"],
                      outputs=None,
                      name="evaluate_model_node",
                  ),
              ]
          )
      
  6. On the command line, execute the Kedro project:

    kedro run
    

    To open the Neptune run in the app, click the Neptune link that appears in the console.

    Example link: https://app.neptune.ai/o/showcase/org/kedro/e/KED-1
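
If no Neptune link appears, one thing worth checking is whether the initialization in step 2 created its configuration files. The sketch below looks for the three files named in the messages printed by kedro neptune init. The helper name `missing_config_files` is hypothetical, and the check only confirms that the files exist, not that their contents are valid.

```python
from pathlib import Path

# Relative paths taken from the messages printed by `kedro neptune init`.
EXPECTED_FILES = (
    "conf/local/credentials_neptune.yml",
    "conf/base/neptune.yml",
    "conf/base/catalog_neptune.yml",
)


def missing_config_files(project_dir):
    """Return the expected Neptune config files that are absent from project_dir."""
    root = Path(project_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the Kedro project directory.
    print(missing_config_files("."))
```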

Now you can explore the results in the Neptune app.

  • In the All metadata section of the run view, click the kedro namespace to view the metadata from the Kedro pipelines:
    • kedro/catalog/datasets: dataset metadata
    • kedro/catalog/parameters: pipeline and node parameters
    • kedro/nodes: details of all the nodes in the project
    • kedro/nodes/evaluate_model_node/score: the \(R^2\) value we logged
    • kedro/run_params: execution parameters
    • kedro/structure: JSON structure of the Kedro pipeline
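
Judging by the example above, paths logged through neptune_run are relative to the base namespace: the node wrote to nodes/evaluate_model_node/score, and the value appears under kedro/nodes/evaluate_model_node/score. The sketch below only illustrates that path arithmetic. The function name `full_field_path` is hypothetical, and kedro is assumed as the base namespace because it is the default.

```python
import posixpath


def full_field_path(relative_path, base_namespace="kedro"):
    """Join the base namespace and a path logged through neptune_run."""
    return posixpath.join(base_namespace, relative_path.lstrip("/"))


print(full_field_path("nodes/evaluate_model_node/score"))
# prints: kedro/nodes/evaluate_model_node/score
```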


More options#

Basic logging configuration#

You can configure where and how Kedro pipeline metadata is logged by editing the conf/base/neptune.yml file.

conf/base/neptune.yml
neptune:
#GLOBAL CONFIG
  project: $NEPTUNE_PROJECT
  base_namespace: kedro
  dependencies:
  enabled: true

#LOGGING
  upload_source_files:
  - '**/*.py'
  - conf/base/*.yml

where

  • project is the name of the Neptune project where metadata is stored. If you haven't set the $NEPTUNE_PROJECT environment variable, you can replace it with the full name of your Neptune project (workspace-name/project-name).
  • base_namespace is the base namespace (folder) where your metadata will be logged. The default is kedro.
  • dependencies determines whether your Python environment requirements are logged. You can pass the path to your requirements file, or pass infer to have Neptune log the dependencies currently installed in your environment. Dependencies aren't logged by default.
  • enabled determines if Neptune logging is enabled. The default is true.

    Disabling Neptune

    To turn off Neptune logging in a Kedro project, see Disabling Neptune.

  • upload_source_files is used to list files you want to automatically upload to Neptune when a run is created.
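
To get a rough idea of which files the upload_source_files patterns would pick up, you can expand them locally. The sketch below uses Python's pathlib glob rules, which is an assumption: Neptune's own pattern matching may differ in edge cases. The function name `preview_source_files` is hypothetical.

```python
from pathlib import Path

# Patterns copied from the upload_source_files example above.
PATTERNS = ("**/*.py", "conf/base/*.yml")


def preview_source_files(project_dir, patterns=PATTERNS):
    """Return the sorted relative paths under project_dir matched by the patterns."""
    root = Path(project_dir)
    matches = set()
    for pattern in patterns:
        for path in root.glob(pattern):
            if path.is_file():
                matches.add(path.relative_to(root).as_posix())
    return sorted(matches)


if __name__ == "__main__":
    # Run from the Kedro project directory.
    for name in preview_source_files("."):
        print(name)
```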

Configuring Neptune API token#

You can configure how the Kedro-Neptune plugin will look for your Neptune API token in the conf/local/credentials_neptune.yml file.

conf/local/credentials_neptune.yml
neptune:
  api_token: $NEPTUNE_API_TOKEN

You can:

  • leave it empty, in which case Neptune will look for your token in the $NEPTUNE_API_TOKEN environment variable.

    Important

    There must be a dollar sign ($) before the variable name.

  • pass a different environment variable, such as $MY_SPECIAL_NEPTUNE_API_TOKEN_VARIABLE.

  • pass your token as a string, such as eyJhcGlfYWRk123cmVqgpije5cyI6Imh0dHBzOi8v (not recommended).
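
The three options can be summarized as a small lookup rule. The following sketch illustrates the documented behavior only. It is not the plugin's actual implementation, and the function name `resolve_api_token` is hypothetical.

```python
import os

DEFAULT_VARIABLE = "NEPTUNE_API_TOKEN"


def resolve_api_token(configured, env=None):
    """Resolve an api_token config value following the three documented cases."""
    env = os.environ if env is None else env
    value = (configured or "").strip()
    if not value:
        # Empty value: fall back to the default environment variable.
        return env.get(DEFAULT_VARIABLE)
    if value.startswith("$"):
        # "$NAME": read the token from the named environment variable.
        return env.get(value[1:])
    # Anything else is treated as the token itself (not recommended).
    return value
```

In this sketch, a value such as $MY_SPECIAL_NEPTUNE_API_TOKEN_VARIABLE returns None when that variable is not set, which makes a missing variable easy to spot.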

How do I find my API token?

In the bottom-left corner of the Neptune app, open the user menu and select Get your API token.

You can copy your token from the dialog that opens. It's very long – make sure to copy and paste it in full!

Logging files and datasets#

You can log files to Neptune with a special Kedro Dataset called kedro_neptune.NeptuneFileDataset.

To log files, add the dataset to your conf/base/catalog.yml file:

conf/base/catalog.yml
example_csv_file:
  type: kedro_neptune.NeptuneFileDataset
  filepath: data/01_raw/companies.csv

where filepath is the path to the file you'd like to log. (Do not change the type value.)

You can find and preview all the logged NeptuneFileDatasets in the kedro/catalog/files namespace of the run.

Logging an existing Kedro Dataset#

If you already have a Kedro Dataset that you would like to log to Neptune under the same name, add @neptune to the Dataset name:

conf/base/catalog.yml
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv

companies@neptune:
  type: kedro_neptune.NeptuneFileDataset
  filepath: data/01_raw/companies.csv

Note

Rather than uploading the whole training dataset to Neptune, upload a small, informative part that you would like to display later.
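
One way to produce such a small part is to copy the header and the first rows of a CSV file into a separate sample file. The sketch below uses only the standard library. The function name `write_csv_sample` and the default of 100 rows are assumptions made for illustration.

```python
import csv


def write_csv_sample(source_path, sample_path, max_rows=100):
    """Copy the header and the first max_rows data rows to sample_path.

    Returns the number of data rows written.
    """
    written = 0
    with open(source_path, newline="") as src, open(sample_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty source file, nothing to copy
        writer.writerow(header)
        for row in reader:
            if written >= max_rows:
                break
            writer.writerow(row)
            written += 1
    return written
```

You could then point the filepath of a kedro_neptune.NeptuneFileDataset entry at the sample file instead of the full dataset. Taking the first rows is the simplest choice. A random sample may be more informative if the file is sorted.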

You can also upload files to Neptune directly through the Neptune API, with the upload() and upload_files() methods:

neptune_run["kedro/catalog/datasets/manual/reviews"].upload("data/01_raw/reviews.csv")

You can also track the dataset file as an artifact with the track_files() method. This can be useful particularly if the file is large, or you are mainly interested in tracking its version:

neptune_run["kedro/catalog/datasets/reviews_version"].track_files("data/01_raw/reviews.csv")


Disabling Neptune#

To disable Neptune in your Kedro project:

  1. Set the enabled flag to false in the neptune.yml file:

    conf/base/neptune.yml
    neptune:
    #GLOBAL CONFIG
      project: $NEPTUNE_PROJECT
      base_namespace: kedro
      dependencies: 
      enabled: false
    
  2. Wrap manual logging code within Kedro nodes inside a condition to log only when neptune_run exists:

    nodes.py
    def evaluate_model(
        regressor: LinearRegression,
        X_test: pd.DataFrame,
        y_test: pd.Series,
        neptune_run: neptune.handler.Handler,
    ):
        y_pred = regressor.predict(X_test)
        score = r2_score(y_test, y_pred)
        logger = logging.getLogger(__name__)
        logger.info("Model has a coefficient R^2 of %.3f on test data.", score)
    
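        if neptune_run:
            # Build the figure only when there is a run to upload it to.
            # plt is matplotlib.pyplot, imported as in the "Log images" step.
            fig = plt.figure()
            plt.scatter(y_test.values, y_pred, alpha=0.2)
            plt.xlabel("Actuals")
            plt.ylabel("Predictions")

            neptune_run["nodes/evaluate_model_node/score"] = score
            neptune_run["nodes/evaluate_model_node/actual_vs_prediction"].upload(fig)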
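
If several nodes log values, repeating the condition in each one gets tedious. A small helper can centralize it, as in the sketch below. It assumes that neptune_run is None (or otherwise falsy) when Neptune is disabled, as the condition above implies. The names `log_value` and `RecordingRun` are hypothetical, and RecordingRun is only a stand-in so that the example runs without a Neptune connection.

```python
def log_value(neptune_run, path, value):
    """Assign value to path on neptune_run, or do nothing when logging is off.

    Returns True if the value was logged.
    """
    if not neptune_run:
        return False
    neptune_run[path] = value
    return True


class RecordingRun:
    """Hypothetical stand-in for the Neptune handler, used only in this demo."""

    def __init__(self):
        self.fields = {}

    def __setitem__(self, path, value):
        self.fields[path] = value


if __name__ == "__main__":
    run = RecordingRun()
    log_value(run, "nodes/evaluate_model_node/score", 0.46)
    log_value(None, "nodes/evaluate_model_node/score", 0.46)  # silently skipped
    print(run.fields)
```

The helper covers simple assignments only. Calls such as upload() still need their own guard.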
    
