Compare results between Kedro nodes
You can log, monitor, and compare metrics, parameters, plots, and other metadata from Kedro pipeline nodes in Neptune.
This guide shows how to:
    Log metadata from model evaluations that happen in multiple nodes in a Kedro pipeline
    Compare metrics in the Runs table
    Compare ROC curves and Precision-Recall curves in the Neptune UI
By the end of this guide, you will log metadata from a few Kedro pipeline executions and compare models trained in different nodes on accuracy and diagnostic charts in the Neptune UI.
Kedro nodes compared on model accuracy and ROC curves in the Neptune UI.
Customized Runs table for comparing Kedro node results between pipeline executions.

Before you start

Make sure you have the Kedro-Neptune integration installed and configured in your Kedro project before you start.

Step 1: Add model training and prediction nodes

    Define model training parameters in conf/base/parameters.yml. Once defined in parameters.yml, the parameters are logged to Neptune automatically (see the sketch after the snippet below).
parameters.yml

# Parameters for the example pipeline. Feel free to delete these once you
# remove the example pipeline from hooks.py and the example nodes in
# `src/pipelines/`

# Data split parameters
example_test_data_ratio: 0.2

# Random forest parameters
rf_max_depth: 3
rf_max_features: 3
rf_n_estimators: 25

# MLP parameters
mlp_alpha: 0.02
mlp_max_iter: 50
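Once the pipeline has run, each of these parameters appears as a field under the kedro/catalog/parameters namespace of the corresponding Neptune run. As a minimal sketch, you can read one of them back with the Neptune client (the project name and run ID below are placeholders):

import neptune.new as neptune

# Reopen a finished run in read-only mode.
# Replace the placeholders with your own project name and run ID.
run = neptune.init(project="workspace/project", run="KED-1", mode="read-only")

# Parameters defined in parameters.yml are logged by the plugin automatically.
rf_max_depth = run["kedro/catalog/parameters/rf_max_depth"].fetch()
print(rf_max_depth)  # e.g. 3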
    Create a model training node in src/KEDRO_PROJECT/pipelines/data_science/nodes.py.
    Use the parameters you defined in conf/base/parameters.yml.
    This node should output a trained model.
nodes.py

def train_rf_model(train_x: pd.DataFrame,
                   train_y: pd.DataFrame,
                   parameters: Dict[str, Any]):

    max_depth = parameters["rf_max_depth"]
    n_estimators = parameters["rf_n_estimators"]
    max_features = parameters["rf_max_features"]

    clf = RandomForestClassifier(max_depth=max_depth,
                                 n_estimators=n_estimators,
                                 max_features=max_features)
    clf.fit(train_x, train_y.idxmax(axis=1))

    return clf
The complete nodes.py for this example, including the MLPClassifier and the evaluation and ensembling nodes used in the next steps:

# Copyright 2021 QuantumBlack Visual Analytics Limited
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND
# NONINFRINGEMENT. IN NO EVENT WILL THE LICENSOR OR OTHER CONTRIBUTORS
# BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN
# ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#
# The QuantumBlack Visual Analytics Limited ("QuantumBlack") name and logo
# (either separately or in combination, "QuantumBlack Trademarks") are
# trademarks of QuantumBlack. The License does not grant you any right or
# license to the QuantumBlack Trademarks. You may not use the QuantumBlack
# Trademarks or any confusingly similar mark as a trademark for your product,
# or use the QuantumBlack Trademarks in any other manner that might cause
# confusion in the marketplace, including but not limited to in advertising,
# on websites, or on software.
#
# See the License for the specific language governing permissions and
# limitations under the License.

"""Example code for the nodes in the example pipeline. This code is meant
just for illustrating basic Kedro features.

Delete this when you start working on your own Kedro project.
"""
# pylint: disable=invalid-name

import logging
import matplotlib.pyplot as plt
import neptune.new as neptune
import numpy as np
import pandas as pd
from scikitplot.metrics import plot_roc_curve, plot_precision_recall_curve
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier
from typing import Any, Dict


def train_rf_model(
    train_x: pd.DataFrame, train_y: pd.DataFrame, parameters: Dict[str, Any]
):
    """Node for training Random Forest model"""
    max_depth = parameters["rf_max_depth"]
    n_estimators = parameters["rf_n_estimators"]
    max_features = parameters["rf_max_features"]

    clf = RandomForestClassifier(max_depth=max_depth,
                                 n_estimators=n_estimators,
                                 max_features=max_features)
    clf.fit(train_x, train_y.idxmax(axis=1))

    return clf


def train_mlp_model(
    train_x: pd.DataFrame, train_y: pd.DataFrame, parameters: Dict[str, Any]
):
    """Node for training MLP model"""
    alpha = parameters["mlp_alpha"]
    max_iter = parameters["mlp_max_iter"]

    clf = MLPClassifier(alpha=alpha,
                        max_iter=max_iter)
    clf.fit(train_x, train_y)

    return clf


def get_predictions(rf_model: RandomForestClassifier, mlp_model: MLPClassifier,
                    test_x: pd.DataFrame) -> Dict[str, Any]:
    """Node for making predictions given a pre-trained model and a test set."""
    predictions = {}
    for name, model in zip(['rf', 'mlp'], [rf_model, mlp_model]):
        y_pred = model.predict_proba(test_x).tolist()
        predictions[name] = y_pred

    return predictions


def evaluate_models(predictions: dict, test_y: pd.DataFrame,
                    neptune_run: neptune.run.Handler):
    """Node for evaluating Random Forest and MLP models and creating ROC and Precision-Recall curves"""

    for name, y_pred in predictions.items():
        y_true = test_y.to_numpy().argmax(axis=1)
        y_pred = np.array(y_pred)

        accuracy = accuracy_score(y_true, y_pred.argmax(axis=1).ravel())
        neptune_run[f'nodes/evaluate_models/metrics/accuracy_{name}'] = accuracy

        fig, ax = plt.subplots()
        plot_roc_curve(test_y.idxmax(axis=1), y_pred, ax=ax, title=f'ROC curve {name}')
        neptune_run['nodes/evaluate_models/plots/plot_roc_curve'].log(fig)

        fig, ax = plt.subplots()
        plot_precision_recall_curve(test_y.idxmax(axis=1), y_pred, ax=ax, title=f'PR curve {name}')
        neptune_run['nodes/evaluate_models/plots/plot_precision_recall_curve'].log(fig)


def ensemble_models(predictions: dict, test_y: pd.DataFrame,
                    neptune_run: neptune.run.Handler) -> np.ndarray:
    """Node for averaging predictions of Random Forest and MLP models"""
    y_true = test_y.to_numpy().argmax(axis=1)
    y_pred_averaged = np.stack(predictions.values()).mean(axis=0)

    accuracy = accuracy_score(y_true, y_pred_averaged.argmax(axis=1).ravel())
    neptune_run['nodes/ensemble_models/metrics/accuracy_ensemble'] = accuracy
In this example, you create a Kedro pipeline that trains two models, a Random Forest and an MLPClassifier, and ensembles their predictions.
For simplicity, only the Random Forest code is shown in the step-by-step snippets; see the full nodes.py listing above for the MLPClassifier.
    Create a model prediction node in src/KEDRO_PROJECT/pipelines/data_science/nodes.py. This node should output a dictionary with predictions from the two models: Random Forest and MLPClassifier.
nodes.py

def get_predictions(rf_model: RandomForestClassifier,
                    mlp_model: MLPClassifier,
                    test_x: pd.DataFrame):
    """Node for making predictions given a pre-trained model and a test set."""
    predictions = {}
    for name, model in zip(['rf', 'mlp'], [rf_model, mlp_model]):
        y_pred = model.predict_proba(test_x).tolist()
        predictions[name] = y_pred

    return predictions

Step 2: Add evaluation node and log accuracy score to Neptune

    Import the Neptune client toward the top of nodes.py:

nodes.py

import neptune.new as neptune
    Add a neptune_run argument of type neptune.run.Handler to the evaluate_models function:
nodes.py

def evaluate_models(predictions: dict, test_y: pd.DataFrame,
                    neptune_run: neptune.run.Handler):
    ...
You can treat neptune_run like a regular Neptune run and log any ML metadata to it.
Note that the input must be named exactly "neptune_run" for Kedro to pass the Neptune run handler to the node.
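For example, inside any node that receives the handler, you can log single values, series of values, and files under any namespace you choose. Here is a minimal sketch (the node, field, and file names are illustrative):

import neptune.new as neptune
import pandas as pd


def my_node(test_y: pd.DataFrame, neptune_run: neptune.run.Handler):
    # Log a single value
    neptune_run["nodes/my_node/metrics/f1"] = 0.92

    # Log a series of values, for example per-fold scores
    for fold_score in [0.81, 0.84, 0.86]:
        neptune_run["nodes/my_node/metrics/cv_accuracy"].log(fold_score)

    # Upload a file from disk
    neptune_run["nodes/my_node/data/sample"].upload("data/08_reporting/sample.csv")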
    Calculate the accuracy and log it to the 'nodes/evaluate_models/metrics/accuracy_{model_name}' namespace:
nodes.py

def evaluate_models(predictions: dict, test_y: pd.DataFrame,
                    neptune_run: neptune.run.Handler):
    ...

    for name, y_pred in predictions.items():
        y_true = test_y.to_numpy().argmax(axis=1)
        y_pred = np.array(y_pred)

        accuracy = accuracy_score(y_true, y_pred.argmax(axis=1).ravel())
        neptune_run[f'nodes/evaluate_models/metrics/accuracy_{name}'] = accuracy

Step 3: Add Neptune Run handler to the Kedro pipeline

    Go to the pipeline definition in src/KEDRO_PROJECT/pipelines/data_science/pipelines.py.
    Add the neptune_run run handler as an input to the evaluate_models node:
pipelines.py

node(
    evaluate_models,
    dict(predictions="predictions",
         test_y="example_test_y",
         neptune_run="neptune_run"),
    None,
    name="evaluate_models",
),
For reference, the complete pipelines.py for this example could look like the following. The evaluate_models wiring matches the snippet above; the intermediate dataset names "rf_model" and "mlp_model" are illustrative, and the example_* datasets come from the Kedro starter used in this guide, so adjust them to your own data catalog.

pipelines/data_science/pipelines.py

from kedro.pipeline import Pipeline, node

from .nodes import (
    ensemble_models,
    evaluate_models,
    get_predictions,
    train_mlp_model,
    train_rf_model,
)


def create_pipeline(**kwargs):
    return Pipeline(
        [
            node(
                train_rf_model,
                ["example_train_x", "example_train_y", "parameters"],
                "rf_model",
                name="train_rf_model",
            ),
            node(
                train_mlp_model,
                ["example_train_x", "example_train_y", "parameters"],
                "mlp_model",
                name="train_mlp_model",
            ),
            node(
                get_predictions,
                dict(rf_model="rf_model",
                     mlp_model="mlp_model",
                     test_x="example_test_x"),
                "predictions",
                name="get_predictions",
            ),
            node(
                evaluate_models,
                dict(predictions="predictions",
                     test_y="example_test_y",
                     neptune_run="neptune_run"),
                None,
                name="evaluate_models",
            ),
            node(
                ensemble_models,
                dict(predictions="predictions",
                     test_y="example_test_y",
                     neptune_run="neptune_run"),
                None,
                name="ensemble_models",
            ),
        ]
    )

Step 4: Run training with different parameters

    Go to 'conf/base/parameters.yml' and change model training hyperparameters
parameters.yml

# Random forest parameters
rf_max_depth: 3
rf_max_features: 3
rf_n_estimators: 25

# MLP parameters
mlp_alpha: 0.02
mlp_max_iter: 50
    Go to your console and execute your Kedro pipeline:

kedro run
Run it a few times with different parameters to have a few Runs to compare in the Neptune UI.

Step 5: Compare nodes in a single Kedro pipeline execution

    Click on the Neptune Run link in your console or use an example link
    Go to the 'kedro/nodes/evaluate_models/metrics' namespace to compare your models on accuracy:
Kedro nodes compared on model accuracy in the Neptune UI.
    Go to the 'kedro/nodes/evaluate_models/plots' namespace to compare the ROC and Precision-Recall curves of your models:
Kedro nodes compared on ROC curves in the Neptune UI.
Kedro nodes compared on model accuracy and ROC curves in the Neptune UI.

Step 6: Compare nodes between many Kedro pipeline executions

    Go to the Runs table
    Click the + Add column button to add the parameters from 'kedro/catalog/parameters/*' and the metrics from 'kedro/nodes/evaluate_models/metrics/*' to the Runs table.
Adding columns with parameters and metrics from model training in Kedro nodes.
Customized Runs table for comparing Kedro node results between pipeline executions.
You can save the Runs table view for later.
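You can also fetch the same metadata programmatically and build a custom comparison, for example in a notebook. Here is a minimal sketch with the Neptune client (the project name is a placeholder, and the column paths assume the kedro/ namespace prefix shown above and exist only for runs that logged them):

import neptune.new as neptune

# Replace with your own workspace/project name (the API token can be set
# via the NEPTUNE_API_TOKEN environment variable).
project = neptune.get_project(name="workspace/project")

# Every logged field becomes a column in the resulting DataFrame.
runs_df = project.fetch_runs_table().to_pandas()

columns = [
    "sys/id",
    "kedro/catalog/parameters/rf_max_depth",
    "kedro/nodes/evaluate_models/metrics/accuracy_rf",
    "kedro/nodes/evaluate_models/metrics/accuracy_mlp",
    "kedro/nodes/ensemble_models/metrics/accuracy_ensemble",
]
print(runs_df[columns].sort_values(
    "kedro/nodes/ensemble_models/metrics/accuracy_ensemble", ascending=False
))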

Summary

In this guide you learned how to:
    Log metadata from model evaluations that happen in multiple nodes of a Kedro pipeline
    Compare metrics from different nodes and pipeline executions in the Runs table
    Compare ROC curves and Precision-Recall curves in the Neptune UI