Kedro

Learn how to log Kedro pipeline metadata to Neptune.

What will you get with this integration?

Kedro is a popular open-source project that helps standardize ML workflows. It gives you a clean and powerful pipeline abstraction where you put all your ML code logic.

The Kedro-Neptune plugin lets you keep all the benefits of a nicely organized Kedro pipeline and adds a powerful user interface built for ML metadata management that lets you:

  • browse, filter, and sort your model training runs

  • compare nodes and pipelines on metrics, visual node outputs, and more

  • display all pipeline metadata, including learning curves for metrics, plots, and images; rich media like video and audio; and interactive visualizations from Plotly, Altair, or Bokeh

  • and do whatever else you would expect from a modern ML metadata store

The Kedro-Neptune plugin supports distributed pipeline execution and works in Kedro setups that use orchestrators such as Airflow or Kubeflow.

Installation


Install neptune-client, kedro, and kedro-neptune

Depending on your operating system, open a terminal or CMD and run the command for your package manager. All required libraries are available via pip and conda:

pip:

pip install neptune-client kedro kedro-neptune

conda:

conda install -c conda-forge neptune-client kedro kedro-neptune

For more, see installing neptune-client.

This integration was tested with kedro==0.17.4, kedro-neptune==0.0.5, and neptune-client==0.10.10.

Quickstart

This quickstart will show you how to:

  • Connect Neptune to your Kedro project

  • Log pipeline and dataset metadata to Neptune

  • Add explicit metadata logging to a node in your pipeline

  • Explore logged metadata in the Neptune UI

Kedro pipeline metadata in custom dashboard in the Neptune UI


Step 1: Create a Kedro project from "pandas-iris" starter

kedro new --starter=pandas-iris
  • Follow the instructions and choose a name for your Kedro project, for example "Great-Kedro-Project"

  • Go to your new Kedro project directory

If everything was set up correctly, you should see the following directory structure:

Great-Kedro-Project   # Parent directory of the template
├── conf              # Project configuration files
├── data              # Local project data (not committed to version control)
├── docs              # Project documentation
├── logs              # Project output logs (not committed to version control)
├── notebooks         # Project-related Jupyter notebooks (can be used for experimental code before moving it to src)
├── README.md         # Project README
├── setup.cfg         # Configuration options for `pytest` when doing `kedro test` and for the `isort` utility when doing `kedro lint`
└── src               # Project source code
    └── pipelines
        └── data_science
            ├── nodes.py
            ├── pipelines.py
            └── ...

You will use nodes.py and pipelines.py files in this quickstart.

Step 2: Initialize kedro-neptune plugin

  • Go to your Kedro project directory and run

kedro neptune init

The command line will ask for your Neptune API token

  • Input your Neptune API token:

    • Press Enter if your token is set in the NEPTUNE_API_TOKEN environment variable

    • Pass the name of a different environment variable that holds your Neptune API token, for example MY_SPECIAL_NEPTUNE_TOKEN_VARIABLE

    • Pass your Neptune API token as a string

The command line will ask for your Neptune project name

  • Input your Neptune project name:

    • Press Enter if your project name is set in the NEPTUNE_PROJECT environment variable

    • Pass the name of a different environment variable that holds your Neptune project name, for example MY_SPECIAL_NEPTUNE_PROJECT_VARIABLE

    • Pass your project name as a string in the format WORKSPACE/PROJECT
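Alternatively, you can satisfy both prompts ahead of time by exporting the default environment variables before running kedro neptune init (the values below are placeholders, not real credentials):

```shell
# Placeholder values -- substitute your own API token and workspace/project name
export NEPTUNE_API_TOKEN="your-neptune-api-token"
export NEPTUNE_PROJECT="workspace/project"
```

With these set, you can press Enter at both prompts.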

If everything was set up correctly, you should:

  • see the message: "kedro-neptune plugin successfully configured"

  • see three new files in your kedro project:

    • Credentials file: YOUR_KEDRO_PROJECT/conf/local/credentials_neptune.yml

    • Config file: YOUR_KEDRO_PROJECT/conf/base/neptune.yml

    • Catalog file: YOUR_KEDRO_PROJECT/conf/base/neptune_catalog.yml

You can always go to those files and change the initial configuration.

Step 3: Add Neptune logging to a Kedro node

  • Go to the pipeline node definitions in src/KEDRO_PROJECT/pipelines/data_science/nodes.py

  • Import the Neptune client toward the top of nodes.py

snippet:

import neptune.new as neptune

full nodes.py:

pipelines/data_science/nodes.py
# Copyright 2021 QuantumBlack Visual Analytics Limited
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND
# NONINFRINGEMENT. IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS
# BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN
# ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#
# The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo
# (either separately or in combination, "QuantumBlack Trademarks") are
# trademarks of QuantumBlack. The License does not grant you any right or
# license to the QuantumBlack Trademarks. You may not use the QuantumBlack
# Trademarks or any confusingly similar mark as a trademark for your product,
# or use the QuantumBlack Trademarks in any other manner that might cause
# confusion in the marketplace, including but not limited to in advertising,
# on websites, or on software.
#
# See the License for the specific language governing permissions and
# limitations under the License.
"""Example code for the nodes in the example pipeline. This code is meant
just for illustrating basic Kedro features.
Delete this when you start working on your own Kedro project.
"""
# pylint: disable=invalid-name
import logging
from typing import Any, Dict

import matplotlib.pyplot as plt
import neptune.new as neptune
import numpy as np
import pandas as pd
from scikitplot.metrics import plot_confusion_matrix


def train_model(
    train_x: pd.DataFrame, train_y: pd.DataFrame, parameters: Dict[str, Any]
) -> np.ndarray:
    """Node for training a simple multi-class logistic regression model. The
    number of training iterations as well as the learning rate are taken from
    conf/project/parameters.yml. All of the data as well as the parameters
    will be provided to this function at the time of execution.
    """
    num_iter = parameters["example_num_train_iter"]
    lr = parameters["example_learning_rate"]
    X = train_x.to_numpy()
    Y = train_y.to_numpy()

    # Add bias to the features
    bias = np.ones((X.shape[0], 1))
    X = np.concatenate((bias, X), axis=1)

    weights = []
    # Train one model for each class in Y
    for k in range(Y.shape[1]):
        # Initialise weights
        theta = np.zeros(X.shape[1])
        y = Y[:, k]
        for _ in range(num_iter):
            z = np.dot(X, theta)
            h = _sigmoid(z)
            gradient = np.dot(X.T, (h - y)) / y.size
            theta -= lr * gradient
        # Save the weights for each model
        weights.append(theta)

    # Return a joint multi-class model with weights for all classes
    return np.vstack(weights).transpose()


def predict(model: np.ndarray, test_x: pd.DataFrame) -> np.ndarray:
    """Node for making predictions given a pre-trained model and a test set."""
    X = test_x.to_numpy()

    # Add bias to the features
    bias = np.ones((X.shape[0], 1))
    X = np.concatenate((bias, X), axis=1)

    # Predict "probabilities" for each class
    result = _sigmoid(np.dot(X, model))

    # Return the index of the class with max probability for all samples
    return np.argmax(result, axis=1)


def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame,
                    neptune_run: neptune.run.Handler) -> None:
    """Node for reporting the accuracy of the predictions performed by the
    previous node. Notice that this function has no outputs, except logging.
    """
    # Get true class index
    target = np.argmax(test_y.to_numpy(), axis=1)
    # Calculate accuracy of predictions
    accuracy = np.sum(predictions == target) / target.shape[0]

    # Log the accuracy of the model
    log = logging.getLogger(__name__)
    log.info("Model accuracy on test set: %0.2f%%", accuracy * 100)

    # Log accuracy to Neptune
    neptune_run['nodes/report/accuracy'] = accuracy * 100

    # Log confusion matrix to Neptune
    fig, ax = plt.subplots()
    plot_confusion_matrix(target, predictions, ax=ax)
    neptune_run['nodes/report/confusion_matrix'].upload(fig)


def _sigmoid(z):
    """A helper sigmoid function used by the training and the scoring nodes."""
    return 1 / (1 + np.exp(-z))
  • Add a neptune_run argument of type neptune.run.Handler to the report_accuracy function

def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame,
                    neptune_run: neptune.run.Handler) -> None:
    ...

You can treat neptune_run like a normal Neptune Run and log any ML metadata to it.

Important: You have to use the special input name "neptune_run" to get the Neptune run handler in Kedro pipelines.
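To get a feel for how the slash-separated paths you log to behave, here is a minimal stand-in sketch using a plain Python dict instead of the real Neptune client (illustration only; the actual handler stores metadata in Neptune):

```python
# Plain-dict stand-in, for illustration only -- not the real neptune.new API.
# A path like 'nodes/report/accuracy' works like nested folders: every segment
# opens a namespace, and the final segment holds the value.
def assign(run, path, value):
    *namespaces, field = path.split("/")
    node = run
    for name in namespaces:
        node = node.setdefault(name, {})  # create nested namespaces on demand
    node[field] = value

run = {}
assign(run, "nodes/report/accuracy", 93.33)
assign(run, "nodes/report/notes", "logistic regression baseline")
# run is now {'nodes': {'report': {'accuracy': 93.33, 'notes': 'logistic regression baseline'}}}
```

This is why metadata logged under a common prefix, such as nodes/report, shows up grouped together in the Neptune UI.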

  • Log metrics like accuracy to neptune_run

def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame,
                    neptune_run: neptune.run.Handler) -> None:
    target = np.argmax(test_y.to_numpy(), axis=1)
    accuracy = np.sum(predictions == target) / target.shape[0]
    neptune_run['nodes/report/accuracy'] = accuracy * 100

You can log metadata from any node to any Neptune namespace you want.
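The logged metric itself is easy to sanity-check outside the pipeline; below is the same accuracy computation with plain Python lists standing in for the NumPy arrays used in the node:

```python
# Same arithmetic as report_accuracy, without NumPy: the fraction of predicted
# class indices that match the targets, scaled to a percentage.
def accuracy_percent(predictions, targets):
    correct = sum(int(p == t) for p, t in zip(predictions, targets))
    return 100.0 * correct / len(targets)

acc = accuracy_percent([0, 1, 2, 1], [0, 1, 2, 2])  # 3 of 4 correct -> 75.0
```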

  • Log images like a confusion matrix to neptune_run

def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame,
                    neptune_run: neptune.run.Handler) -> None:
    target = np.argmax(test_y.to_numpy(), axis=1)
    accuracy = np.sum(predictions == target) / target.shape[0]

    fig, ax = plt.subplots()
    plot_confusion_matrix(target, predictions, ax=ax)
    neptune_run['nodes/report/confusion_matrix'].upload(fig)

Note: You can log metrics, text, images, video, interactive visualizations, and more. See the full list of What you can log and display in Neptune.

Step 4: Add Neptune Run handler to the Kedro pipeline

  • Go to a pipeline definition, src/KEDRO_PROJECT/pipelines/data_science/pipelines.py

  • Add the neptune_run handler as an input to the report node

snippet:

node(
    report_accuracy,
    ["example_predictions", "example_test_y", "neptune_run"],
    None,
    name="report",
),

full pipelines.py:

pipelines/data_science/pipelines.py
"""Example code for the nodes in the example pipeline. This code is meant
just for illustrating basic Kedro features.
Delete this when you start working on your own Kedro project.
"""
from kedro.pipeline import Pipeline, node

from .nodes import predict, report_accuracy, train_model


def create_pipeline(**kwargs):
    return Pipeline(
        [
            node(
                train_model,
                ["example_train_x", "example_train_y", "parameters"],
                "example_model",
                name="train",
            ),
            node(
                predict,
                dict(model="example_model", test_x="example_test_x"),
                "example_predictions",
                name="predict",
            ),
            node(
                report_accuracy,
                ["example_predictions", "example_test_y", "neptune_run"],
                None,
                name="report",
            ),
        ]
    )
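The wiring above works because Kedro resolves each string in a node's inputs list against the data catalog before calling the node function, and the plugin registers the run handler under the reserved "neptune_run" name. Here is a rough, hypothetical sketch of that resolution step (not Kedro's actual internals), with plain lists and a dict standing in for datasets and the handler:

```python
# Hypothetical sketch of input resolution -- not Kedro source code.
def run_node(func, inputs, catalog):
    """Look up each declared input by name, then call the node function."""
    return func(*(catalog[name] for name in inputs))

# Stand-ins: plain lists for datasets, a dict for the Neptune run handler.
catalog = {
    "example_predictions": [0, 1, 2],
    "example_test_y": [0, 1, 1],
    "neptune_run": {},
}

def report_accuracy(predictions, test_y, neptune_run):
    correct = sum(int(p == t) for p, t in zip(predictions, test_y))
    neptune_run["nodes/report/accuracy"] = 100.0 * correct / len(test_y)

run_node(report_accuracy,
         ["example_predictions", "example_test_y", "neptune_run"], catalog)
```

Because the lookup is by name, adding "neptune_run" to the inputs list is all it takes to receive the handler inside the node.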

Step 5: Run Kedro pipeline

Go to your console and execute your Kedro pipeline:

kedro run

A link to the Neptune Run associated with the Kedro pipeline execution will be printed to the console.

Step 6: Explore results in the Neptune UI

  • Click on the Neptune Run link in your console or use an example link

https://app.neptune.ai/common/kedro-integration/e/KED-632

Default Kedro namespace in Neptune UI
  • See pipeline and node parameters in kedro/catalog/parameters

Pipeline parameters logged from Kedro to Neptune UI
  • See execution parameters in kedro/run_params

Execution parameters logged from Kedro to Neptune UI
  • See metadata about the datasets in kedro/catalog/datasets/example_iris_data

Dataset metadata logged from Kedro to Neptune UI
  • See the metrics (accuracy) you logged explicitly in kedro/nodes/report/accuracy

Metrics logged from Kedro to Neptune UI
  • See the charts (confusion matrix) you logged explicitly in kedro/nodes/report/confusion_matrix

Confusion matrix logged from Kedro to Neptune UI

More options

Creating a dashboard with Kedro pipeline metadata

You can combine all the metadata in a single dashboard like this.

See example dashboard in Neptune

Kedro pipeline metadata in custom dashboard in the Neptune UI

Comparing Kedro pipeline runs

You can compare run metadata from your Kedro pipelines in the Neptune UI.

See example dashboard in Neptune

Comparing run metadata for Kedro pipelines in the Neptune UI

Filtering and organizing Kedro pipeline runs

You can filter and organize run metadata from your Kedro pipelines in the Neptune UI.

See example dashboard in Neptune

Filtering and organizing run metadata for Kedro pipelines in the Neptune UI

Basic logging configuration

You can configure where and how Kedro pipeline metadata is logged in the conf/base/neptune.yml file.

conf/base/neptune.yml
neptune:
  # GLOBAL CONFIG
  project: common/kedro-integration
  base_namespace: kedro

  # LOGGING
  upload_source_files:
    - '**/*.py'
    - 'conf/base/*.yml'

Specifically:

  • project: the Neptune project you want to log to. You can use project: $NEPTUNE_PROJECT to read it from an environment variable.

  • base_namespace: the base namespace (folder) where your metadata will be logged

  • upload_source_files: which files you want to upload to Neptune
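For example, a variant of the config above that reads the project name from an environment variable instead of hard-coding it (a sketch; adjust the globs to your project):

```yaml
neptune:
  # Resolved from the NEPTUNE_PROJECT environment variable at run time
  project: $NEPTUNE_PROJECT
  base_namespace: kedro
  upload_source_files:
    - '**/*.py'
```

This keeps the project name out of version control, which is handy when the same config is shared across workspaces.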

Neptune API token configuration

You can configure how kedro-neptune looks for your Neptune API token in the conf/local/credentials_neptune.yml file:

conf/local/credentials_neptune.yml
neptune:
  api_token: eyJhcGlfYWRk123cmVqgpije5cyI6Imh0dHBzOi8v

You can:

  • pass it as a string, for example eyJhcGlfYWRk123cmVqgpije5cyI6Imh0dHBzOi8v

  • pass an environment variable name, for example $MY_NEPTUNE_API_TOKEN_VARIABLE

  • leave it empty, in which case Neptune will look for your API token in the $NEPTUNE_API_TOKEN environment variable

Important: If you are setting api_token to an environment variable, add $ before the variable name, for example $NEPTUNE_API_TOKEN.

Logging files and datasets to Neptune

You can log files to Neptune with a special Kedro DataSet called kedro_neptune.NeptuneFileDataSet.

conf/base/catalog.yml
example_csv_file:
  type: kedro_neptune.NeptuneFileDataSet
  filepath: data/01_raw/iris.csv

To do that, add a Kedro DataSet of this type to your catalog, where:

  • type: is always kedro_neptune.NeptuneFileDataSet

  • filepath: is the path to the file you would like to log

You can find all the logged NeptuneFileDataSets in the kedro/catalog/files namespace in the Neptune UI. Many file types, such as PNG, JSON, YML, CSV, and HTML, will be rendered in Neptune.

Logged Kedro NeptuneFileDataSet displayed in the Neptune UI.

If you already have a Kedro DataSet that you would like to log to Neptune under the same name, add @neptune to the DataSet name:

conf/base/catalog.yml
example_iris_data:
  type: pandas.CSVDataSet
  filepath: data/01_raw/iris.csv

example_iris_data@neptune:
  type: kedro_neptune.NeptuneFileDataSet
  filepath: data/01_raw/iris.csv

Important: Don't log your whole training dataset to Neptune; log a small, informative part of it that you would like to display later.
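One way to follow that advice is to materialize a small sample file and register only the sample as a NeptuneFileDataSet. A sketch using only the standard library (the write_csv_sample name and the paths are illustrative, not part of kedro-neptune):

```python
import csv
import itertools

def write_csv_sample(src_path, dst_path, n_rows=10):
    """Copy the header plus the first n_rows data rows into a small sample file."""
    with open(src_path, newline="") as fin, open(dst_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        for row in itertools.islice(csv.reader(fin), n_rows + 1):  # +1 for the header
            writer.writerow(row)

# For example: write_csv_sample("data/01_raw/iris.csv", "data/01_raw/iris_sample.csv")
```

You can then point a NeptuneFileDataSet's filepath at the sample file instead of the full dataset.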

You can also log files to Neptune directly through the Neptune API, using the .upload() and .upload_files() methods:

neptune_run['dataset/example_iris_data'].upload('data/01_raw/iris.csv')

See also