How to use Neptune with DALEX#
DALEX is an open source tool for exploring and explaining model behavior, helping you understand how complex models work. With Neptune, you can upload the following DALEX metadata:
- The pickled dalex explainer object
- Interactive reports
This guide is adapted from the DALEX documentation.
Legacy integration
You can also use the integration for the Neptune legacy API.
Legacy docs ≫ Neptune-DALEX integration
Before you start#
- Sign up at neptune.ai/register.
- Create a project for storing your metadata.
- Have dalex and Neptune installed. To follow the example, also install pandas and scikit-learn.
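For example, you can install everything with pip (package names as published on PyPI):

pip install -U dalex neptune pandas scikit-learn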
Passing your Neptune credentials
Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.
To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.
Your full project name has the form workspace-name/project-name. You can copy it from the project settings: click the menu in the top-right → Details & privacy.
On Windows, navigate to Settings → Edit the system environment variables, or enter the following in Command Prompt: setx SOME_NEPTUNE_VARIABLE "some-value"
While it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune:
run = neptune.init_run(
    project="ml-team/classification",  # your full project name here
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
)
For more help, see Set Neptune credentials.
Creating and logging an explainer object#
- Prepare the data.
import dalex as dx
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
import warnings

warnings.filterwarnings("ignore")

data = dx.datasets.load_titanic()
X = data.drop(columns="survived")
y = data.survived

data.head(10)
- Create and fit a pipeline model.
numerical_features = ["age", "fare", "sibsp", "parch"]
numerical_transformer = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ]
)
categorical_features = ["gender", "class", "embarked"]
categorical_transformer = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numerical_transformer, numerical_features),
        ("cat", categorical_transformer, categorical_features),
    ]
)
classifier = MLPClassifier(
    hidden_layer_sizes=(150, 100, 50), max_iter=500, random_state=0
)
clf = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", classifier)]
)
clf.fit(X, y)
- Import Neptune and start a run:
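import neptune

run = neptune.init_run()  # credentials are picked up from the environment variables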
If Neptune can't find your project name or API token
As a best practice, you should save your Neptune API token and project name as environment variables:
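For example, on Linux or macOS, using the placeholder values from this guide:

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"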
Alternatively, you can pass the information when using a function that takes api_token and project as arguments:

run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8",  # your API token here
    project="ml-team/classification",  # your full project name here
)

- To find your API token: in the bottom-left corner, expand the user menu and select Get my API token.
- To copy the project path: open the project details (menu in the top-right → Details & privacy).
If you haven't registered, you can log anonymously to a public project:
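For example, with Neptune's anonymous token constant (the project name below is a placeholder; use the public project you're logging to):

run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/dalex-support",  # placeholder public project name
)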
Make sure not to publish sensitive data through your code!
- Create an explainer for the model:
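exp = dx.Explainer(clf, X, y, label="Titanic MLP Pipeline")  # the label is arbitrary; it identifies the model in plots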
- Upload the explainer to Neptune.
You can use DALEX's dumps() method to get a pickled representation of the explainer, then upload it to Neptune using Neptune's from_content() method.
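For example (the model/explainer namespace is just one sensible choice):

from neptune.types import File

run["model/explainer"].upload(File.from_content(exp.dumps(), extension="pkl"))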
- To stop the connection to Neptune and sync all data, call the stop() method:
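run.stop()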
Logging model-level explanations#
model_performance()#
This function calculates various Model Performance measures:
- Classification: F1, accuracy, recall, precision, and AUC.
- Regression: mean squared error, R squared, median absolute deviation.
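For example, to calculate the measures and draw the plots (mp is an arbitrary name; the geom argument selects the plot type):

mp = exp.model_performance()
mp.plot(geom="roc")  # ROC curve
mp.plot(geom="ecdf")  # ECDF plot of residuals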
You can upload these plots to Neptune by setting the show argument to False.
To distinguish between the plot types, you can use different namespaces, such as "model/performance/roc" and "model/performance/ecdf":
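run["model/performance/roc"].upload(mp.plot(geom="roc", show=False))
run["model/performance/ecdf"].upload(mp.plot(geom="ecdf", show=False))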
Related
Learn more about Neptune Namespaces and fields.
model_parts()#
This function calculates Variable Importance.
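Calculate and plot the importance of individual variables:

vi = exp.model_parts()
vi.plot()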
You can also calculate the variable importance of a group of variables:
vi_grouped = exp.model_parts(
    variable_groups={
        "personal": ["gender", "age", "sibsp", "parch"],
        "wealth": ["class", "fare"],
    }
)
vi_grouped.plot()
Upload variable importance plots to Neptune:
run["model/variable_importance/single"].upload(vi.plot(show=False))
run["model/variable_importance/grouped"].upload(vi_grouped.plot(show=False))
model_profile()#
This function calculates explanations that explore model response as a function of selected variables.
The explanations can be calculated as Partial Dependence Profile or Accumulated Local Dependence Profile.
pdp_num = exp.model_profile(type="partial", label="pdp")
ale_num = exp.model_profile(type="accumulated", label="ale")
pdp_num.plot(ale_num)
pdp_cat = exp.model_profile(
    type="partial",
    variable_type="categorical",
    variables=["gender", "class"],
    label="pdp",
)
ale_cat = exp.model_profile(
    type="accumulated",
    variable_type="categorical",
    variables=["gender", "class"],
    label="ale",
)
ale_cat.plot(pdp_cat)
Upload model profile plots to Neptune:
run["model/profile/num"].upload(pdp_num.plot(ale_num, show=False))
run["model/profile/cat"].upload(ale_cat.plot(pdp_cat, show=False))
Logging prediction-level explanations#
Let's create two example persons for this tutorial.
john = pd.DataFrame(
    {
        "gender": ["male"],
        "age": [25],
        "class": ["1st"],
        "embarked": ["Southampton"],
        "fare": [72],
        "sibsp": [0],
        "parch": [0],
    },
    index=["John"],
)
mary = pd.DataFrame(
    {
        "gender": ["female"],
        "age": [35],
        "class": ["3rd"],
        "embarked": ["Cherbourg"],
        "fare": [25],
        "sibsp": [0],
        "parch": [0],
    },
    index=["Mary"],
)
predict_parts()#
This function calculates Variable Attributions as Break Down, iBreakDown, or Shapley Values explanations.
The model prediction is decomposed into parts, each attributed to a particular variable.
Breakdown values for John's predictions:
bd_john = exp.predict_parts(john, type="break_down", label=john.index[0])
bd_interactions_john = exp.predict_parts(
    john, type="break_down_interactions", label="John+"
)
bd_john.plot(bd_interactions_john)
Shapley values for Mary's predictions:
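sh_mary = exp.predict_parts(mary, type="shap", label=mary.index[0])
sh_mary.plot()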
Upload the plots to Neptune:
run["prediction/breakdown/john"].upload(
bd_john.plot(bd_interactions_john, show=False)
)
run["prediction/shapely/mary"].upload(sh_mary.plot(show=False))
predict_profile()#
This function computes individual profiles; that is, Ceteris Paribus Profiles.
cp_mary = exp.predict_profile(mary, label=mary.index[0])
cp_john = exp.predict_profile(john, label=john.index[0])
cp_mary.plot(cp_john)
cp_john.plot(cp_mary, variable_type="categorical")
Upload the CP plots to Neptune:
run["prediction/profile/numerical"].upload(cp_mary.plot(cp_john, show=False))
run["prediction/profile/categorical"].upload(
cp_mary.plot(cp_john, variable_type="categorical", show=False)
)
Analyzing the results#
Once you're done logging, stop the Neptune run to close the connection and sync the data:
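run.stop()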
To open the run in Neptune, click the link that appears in the console output.
Sample output
[neptune] [info ] Neptune initialized. Open in the app:
https://app.neptune.ai/workspace/project/e/RUN-1
In All metadata, navigate through the model and prediction namespaces to view the logged charts.