Display Kedro node metadata and outputs

You can log and display learning curves, diagnostic charts, images, video, and other metadata from Kedro pipelines in Neptune.

This guide shows how to:

  • Log diagnostic charts from Kedro pipeline nodes

  • Save node output as a file

  • Display charts, outputs, and other metadata from Kedro pipelines

By the end of this guide, you will log an ROC curve as an image and a node output as a JSON file from a Kedro pipeline to Neptune, and display them in the Neptune UI.

See this example in Neptune

Display images and node outputs in Neptune UI

Keywords: Kedro Neptune, Display Kedro pipeline outputs, Log images from Kedro pipelines

Before you start

Make sure you have the Kedro-Neptune integration set up in your project and a Kedro pipeline to work with before starting. This guide uses Kedro's example iris pipeline.

Step 1: Add training and prediction nodes

  • Define model training parameters in conf/base/parameters.yml.

parameters.yml (snippet)

    # Random forest parameters
    rf_max_depth: 3
    rf_max_features: 3
    rf_n_estimators: 25

Full parameters.yml:

    # Parameters for the example pipeline. Feel free to delete these once you
    # remove the example pipeline from hooks.py and the example nodes in
    # `src/pipelines/`

    # Data split parameters
    example_test_data_ratio: 0.2

    # Random forest parameters
    rf_max_depth: 3
    rf_max_features: 3
    rf_n_estimators: 25

    # MLP parameters
    mlp_alpha: 0.02
    mlp_max_iter: 50
  • Create a model training node in src/KEDRO_PROJECT/pipelines/data_science/nodes.py.

    Use the parameters you defined in conf/base/parameters.yml.

    This node should output a trained model.

nodes.py (snippet)

    def train_rf_model(train_x: pd.DataFrame,
                       train_y: pd.DataFrame,
                       parameters: Dict[str, Any]):
        max_depth = parameters["rf_max_depth"]
        n_estimators = parameters["rf_n_estimators"]
        max_features = parameters["rf_max_features"]
        clf = RandomForestClassifier(max_depth=max_depth,
                                     n_estimators=n_estimators,
                                     max_features=max_features)
        clf.fit(train_x, train_y.idxmax(axis=1))
        return clf
Full nodes.py:
"""Example code for the nodes in the example pipeline. This code is meant
just for illustrating basic Kedro features.
Delete this when you start working on your own Kedro project.
"""
# pylint: disable=invalid-name
import logging
import matplotlib.pyplot as plt
import neptune.new as neptune
import numpy as np
import pandas as pd
from scikitplot.metrics import plot_roc_curve, plot_precision_recall_curve
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier
from typing import Any, Dict
def train_rf_model(
train_x: pd.DataFrame, train_y: pd.DataFrame, parameters: Dict[str, Any]
):
"""Node for training Random Forest model"""
max_depth = parameters["rf_max_depth"]
n_estimators = parameters["rf_n_estimators"]
max_features = parameters["rf_max_features"]
clf = RandomForestClassifier(max_depth=max_depth,
n_estimators=n_estimators,
max_features=max_features)
clf.fit(train_x, train_y.idxmax(axis=1))
return clf
def train_mlp_model(
train_x: pd.DataFrame, train_y: pd.DataFrame, parameters: Dict[str, Any]
):
"""Node for training MLP model"""
alpha = parameters["mlp_alpha"]
max_iter = parameters["mlp_max_iter"]
clf = MLPClassifier(alpha=alpha,
max_iter=max_iter)
clf.fit(train_x, train_y)
return clf
def get_predictions(rf_model: RandomForestClassifier, mlp_model: MLPClassifier,
test_x: pd.DataFrame) -> Dict[str, Any]:
"""Node for making predictions given a pre-trained model and a test set."""
predictions = {}
for name, model in zip(['rf', 'mlp'], [rf_model, mlp_model]):
y_pred = model.predict_proba(test_x).tolist()
predictions[name] = y_pred
return predictions
def evaluate_models(predictions: dict, test_y: pd.DataFrame,
neptune_run: neptune.run.Handler):
"""Node for evaluating Random Forest and MLP models and creating ROC and Precision-Recall Curves"""
for name, y_pred in predictions.items():
y_true = test_y.to_numpy().argmax(axis=1)
y_pred = np.array(y_pred)
accuracy = accuracy_score(y_true, y_pred.argmax(axis=1).ravel())
neptune_run[f'nodes/evaluate_models/metrics/accuracy_{name}'] = accuracy
fig, ax = plt.subplots()
plot_roc_curve(test_y.idxmax(axis=1), y_pred, ax=ax, title=f'ROC curve {name}')
neptune_run['nodes/evaluate_models/plots/plot_roc_curve'].log(fig)
fig, ax = plt.subplots()
plot_precision_recall_curve(test_y.idxmax(axis=1), y_pred, ax=ax, title=f'PR curve {name}')
neptune_run['nodes/evaluate_models/plots/plot_precision_recall_curve'].log(fig)
def ensemble_models(predictions: dict, test_y: pd.DataFrame,
neptune_run: neptune.run.Handler) -> np.ndarray:
"""Node for averaging predictions of Random Forest and MLP models"""
y_true = test_y.to_numpy().argmax(axis=1)
y_pred_averaged = np.stack(predictions.values()).mean(axis=0)
accuracy = accuracy_score(y_true, y_pred_averaged.argmax(axis=1).ravel())
neptune_run[f'nodes/ensemble_models/metrics/accuracy_ensemble'] = accuracy

In this example, you create a Kedro pipeline that trains two models, a Random Forest and an MLPClassifier, and ensembles their predictions.

For simplicity, only the Random Forest code snippets are shown. See the full nodes.py listing above for the MLPClassifier code.

  • Create a model prediction node in src/KEDRO_PROJECT/pipelines/data_science/nodes.py. This node should output a dictionary with predictions from the two models, Random Forest and MLPClassifier.

nodes.py (snippet)

    def get_predictions(rf_model: RandomForestClassifier,
                        mlp_model: MLPClassifier,
                        test_x: pd.DataFrame) -> Dict[str, Any]:
        """Node for making predictions given pre-trained models and a test set."""
        predictions = {}
        for name, model in zip(['rf', 'mlp'], [rf_model, mlp_model]):
            y_pred = model.predict_proba(test_x).tolist()
            predictions[name] = y_pred
        return predictions

Step 2: Add evaluation node and log ROC curve to Neptune

  • Import the Neptune client near the top of nodes.py:

nodes.py (snippet)

    import neptune.new as neptune
  • Add a neptune_run argument of type neptune.run.Handler to the evaluate_models function:

nodes.py (snippet)

    def evaluate_models(predictions: dict, test_y: pd.DataFrame,
                        neptune_run: neptune.run.Handler):
        ...

You can treat neptune_run like a normal Neptune Run and log any ML metadata to it.

You have to use the special input name "neptune_run" for Kedro to pass the Neptune run handler to your node.
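For example, any node that receives the handler can log values under whatever namespace you choose. The snippet below is only a minimal sketch of this idea; the node name and the logged fields are made up for illustration and are not part of the example project:

    import neptune.new as neptune
    import pandas as pd


    def profile_data(data: pd.DataFrame, neptune_run: neptune.run.Handler):
        """Hypothetical node that logs a few dataset statistics to Neptune."""
        # Assign single values directly to any field path you choose
        neptune_run["nodes/profile_data/n_rows"] = len(data)
        neptune_run["nodes/profile_data/n_columns"] = data.shape[1]
        # Use .log() to append to a series (numbers, strings, or matplotlib figures)
        for column in data.columns:
            neptune_run["nodes/profile_data/columns"].log(str(column))
        return data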

  • Create the ROC curve as a matplotlib figure and log it to the 'nodes/evaluate_models/plots/plot_roc_curve' namespace with the .log() method:

nodes.py (snippet)

    def evaluate_models(predictions: dict, test_y: pd.DataFrame,
                        neptune_run: neptune.run.Handler):
        ...
        for name, y_pred in predictions.items():
            y_true = test_y.to_numpy().argmax(axis=1)
            y_pred = np.array(y_pred)

            fig, ax = plt.subplots()
            plot_roc_curve(test_y.idxmax(axis=1), y_pred,
                           ax=ax, title=f'ROC curve {name}')
            neptune_run['nodes/evaluate_models/plots/plot_roc_curve'].log(fig)

Step 3: Save node output as JSON file in Neptune

  • Add the predictions dataset to the Kedro catalog in conf/base/catalog.yml:

catalog.yml (snippet)

    predictions:
      type: kedro.extras.datasets.json.JSONDataSet
      filepath: data/07_model_output/predictions.json
Full catalog.yml:

    example_iris_data:
      type: pandas.CSVDataSet
      filepath: data/01_raw/iris_v2.csv

    rf_model:
      type: kedro.extras.datasets.pickle.PickleDataSet
      filepath: data/06_models/rf_model.pkl

    mlp_model:
      type: kedro.extras.datasets.pickle.PickleDataSet
      filepath: data/06_models/mlp_model.pkl

    predictions:
      type: kedro.extras.datasets.json.JSONDataSet
      filepath: data/07_model_output/predictions.json
  • To also upload the predictions file to the Neptune run, add a catalog entry of type kedro_neptune.NeptuneFileDataSet that points at the same file. The entry name is up to you; predictions_file below is only an example:

catalog.yml (snippet)

    predictions_file:  # example name, use any dataset name you like
      type: kedro_neptune.NeptuneFileDataSet
      filepath: data/07_model_output/predictions.json

You can log any file format to Neptune with kedro_neptune.NeptuneFileDataSet, not just JSON.

Step 4: Add Neptune Run handler to the Kedro pipeline

  • Go to the pipeline definition in src/KEDRO_PROJECT/pipelines/data_science/pipelines.py.

  • Add the neptune_run run handler as an input to the evaluate_models node:

pipelines.py (snippet)

    node(
        evaluate_models,
        dict(predictions="predictions",
             test_y="example_test_y",
             neptune_run="neptune_run"),
        None,
        name="evaluate_models",
    ),
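For reference, once all of the nodes from this guide are wired in, the full create_pipeline can look roughly like the sketch below. This is only one possible layout: it assumes the example project's dataset names (example_train_x, example_train_y, example_test_x, example_test_y) together with the rf_model, mlp_model, and predictions entries from catalog.yml, so adjust the inputs and outputs to match your own catalog:

    # A sketch of the full pipeline definition; dataset names are assumptions
    # based on the example project and may differ in your own catalog.
    from kedro.pipeline import Pipeline, node

    from .nodes import (ensemble_models, evaluate_models, get_predictions,
                        train_mlp_model, train_rf_model)


    def create_pipeline(**kwargs):
        return Pipeline(
            [
                node(
                    train_rf_model,
                    ["example_train_x", "example_train_y", "parameters"],
                    "rf_model",
                    name="train_rf_model",
                ),
                node(
                    train_mlp_model,
                    ["example_train_x", "example_train_y", "parameters"],
                    "mlp_model",
                    name="train_mlp_model",
                ),
                node(
                    get_predictions,
                    dict(rf_model="rf_model", mlp_model="mlp_model",
                         test_x="example_test_x"),
                    "predictions",
                    name="get_predictions",
                ),
                node(
                    evaluate_models,
                    dict(predictions="predictions",
                         test_y="example_test_y",
                         neptune_run="neptune_run"),
                    None,
                    name="evaluate_models",
                ),
                node(
                    ensemble_models,
                    dict(predictions="predictions",
                         test_y="example_test_y",
                         neptune_run="neptune_run"),
                    None,
                    name="ensemble_models",
                ),
            ]
        )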

Step 5: Run Kedro pipeline

Go to your console and execute your Kedro pipeline:

    kedro run

Step 6: Display ROC curve and node output in the Neptune UI

  • Click the Neptune run link in your console, or open the example run:

https://app.neptune.ai/o/common/org/kedro-integration/e/KED-676

  • Go to the 'kedro/nodes/evaluate_models/plots/plot_roc_curve' namespace to see your ROC curves displayed:

ROC curves logged from Kedro node to the Neptune UI
  • Click one of the ROC curves to view it in a larger display and scroll through the images

Scroll through ROC curve images in the Neptune UI.
  • Go to the 'kedro/catalog/files' namespace to see the predictions.json file saved from the node output:

  • You can create a dashboard that combines the ROC curves and the JSON node output in a single view:

See this example in Neptune

Display images and node outputs in Neptune UI

Summary

In this guide you learned:

  • How to log charts as images from Kedro pipelines

  • How to save node outputs as files

  • How to display charts and files in the Neptune UI

See also