How to use Neptune with DALEX#

Open in Colab

DALEX reports visualized in Neptune

DALEX is an open-source tool for exploring and explaining model behavior, helping you understand how complex models work. With Neptune, you can upload the following DALEX metadata:

  • The pickled dalex explainer object
  • Interactive reports

This guide is adapted from the DALEX documentation.

See in Neptune | Example scripts

Legacy integration

You can also use the integration for the Neptune legacy API.

Legacy docs: Neptune–DALEX integration

Before you start#

  • Sign up at neptune.ai/register.
  • Create a project for storing your metadata.
  • Have dalex and Neptune installed.

    To follow the example, also install pandas and scikit-learn.

    pip install -U dalex neptune pandas scikit-learn
    
    conda install -c conda-forge dalex neptune pandas scikit-learn
    
Upgrading with neptune-client already installed

Important: To smoothly upgrade to the 1.0 version of the Neptune client library, first uninstall the neptune-client library and then install neptune.

pip uninstall neptune-client
pip install neptune
Passing your Neptune credentials

Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...6Lc"

To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.

export NEPTUNE_PROJECT="ml-team/classification"

To find your project: Your full project name has the form workspace-name/project-name. To copy the name, click the menu in the top-right corner and select Edit project details.


Although it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune.

run = neptune.init_run(
    project="ml-team/classification",  # your full project name here
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
)

For more help, see Set Neptune credentials.
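If you prefer setting the variables from Python rather than the shell (for example, in a notebook), you can populate `os.environ` before initializing the run. This is a minimal sketch; the token and project values below are placeholders you must replace with your own:

```python
import os

# Placeholder values; replace with your own API token and full project name.
# setdefault() won't overwrite variables already exported in the shell.
os.environ.setdefault("NEPTUNE_API_TOKEN", "h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...6Lc")
os.environ.setdefault("NEPTUNE_PROJECT", "ml-team/classification")
```

With both variables set, `neptune.init_run()` can be called without arguments.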

Creating and logging an explainer object#

  1. Prepare the data.

    import dalex as dx
    
    import pandas as pd
    
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import StandardScaler, OneHotEncoder
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.compose import ColumnTransformer
    
    import warnings
    
    warnings.filterwarnings("ignore")
    
    data = dx.datasets.load_titanic()
    
    X = data.drop(columns="survived")
    y = data.survived
    
    data.head(10)
    
  2. Create and fit a pipeline model.

    numerical_features = ["age", "fare", "sibsp", "parch"]
    numerical_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler()),
        ]
    )
    
    categorical_features = ["gender", "class", "embarked"]
    categorical_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]
    )
    
    preprocessor = ColumnTransformer(
        transformers=[
            ("num", numerical_transformer, numerical_features),
            ("cat", categorical_transformer, categorical_features),
        ]
    )
    
    classifier = MLPClassifier(
        hidden_layer_sizes=(150, 100, 50), max_iter=500, random_state=0
    )
    
    clf = Pipeline(
        steps=[("preprocessor", preprocessor), ("classifier", classifier)]
    )
    
    clf.fit(X, y)
    
  3. Import Neptune and start a run:

    import neptune
    
    run = neptune.init_run()  # (1)!
    
    1. If you haven't set up your credentials, you can log anonymously:

      neptune.init_run(
          api_token=neptune.ANONYMOUS_API_TOKEN,
          project="common/dalex-support",
      )
      
    If Neptune can't find your project name or API token

    As a best practice, you should save your Neptune API token and project name as environment variables:

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8"
    export NEPTUNE_PROJECT="ml-team/classification"
    

    You can, however, also pass them as arguments when initializing Neptune:

    run = neptune.init_run(
        api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8",  # your token here
        project="ml-team/classification",  # your full project name here
    )
    
    • API token: In the bottom-left corner, expand the user menu and select Get my API token.
    • Project name: in the top-right menu: Edit project details.

    If you haven't registered, you can also log anonymously to a public project (make sure not to publish sensitive data through your code!):

    run = neptune.init_run(
        api_token=neptune.ANONYMOUS_API_TOKEN,
        project="common/quickstarts",
    )
    
  4. Create an explainer for the model:

    exp = dx.Explainer(clf, X, y)
    
  5. Upload the explainer to Neptune.

    You can use DALEX's dumps() method to get a pickled representation of the explainer, then upload it to Neptune using Neptune's from_content() method.

    from neptune.types import File
    
    run["pickled_explainer"].upload(File.from_content(exp.dumps()))
    
  6. To stop the connection to Neptune and sync all data, call the stop() method:

    run.stop()
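The payload uploaded in step 5 is an ordinary pickle byte string, which is why `File.from_content()` can store it directly. A minimal sketch of the round trip, using the standard library's `pickle` with an invented dictionary standing in for the explainer object:

```python
import pickle

# A stand-in for the explainer object; invented for illustration.
state = {"model": "MLPClassifier", "n_features": 7}

# exp.dumps() likewise returns a pickle byte string,
# suitable for File.from_content().
payload = pickle.dumps(state)

# After downloading the file from Neptune, the object is
# restored the same way.
restored = pickle.loads(payload)
```

On the dalex side, the `Explainer.loads()` method restores an explainer from such bytes.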
    

Logging model-level explanations#

model_performance()#

This function calculates various Model Performance measures:

  • Classification: F1, accuracy, recall, precision, and AUC.
  • Regression: mean squared error, R squared, and median absolute deviation.

mp = exp.model_performance()
mp.plot(geom="roc")

You can upload these ROC plots to Neptune by setting the show argument to False.

To distinguish between the plot types, you can use different namespaces. For example, "model/performance/roc", "model/performance/ecdf", etc.

run["model/performance/roc"].upload(mp.plot(geom="roc", show=False))

Related

Learn more about Neptune Namespaces and fields.
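For reference, the classification measures listed above reduce to simple ratios of confusion-matrix counts. A quick sketch with invented counts:

```python
# Hypothetical confusion-matrix counts, invented for illustration
tp, fp, fn, tn = 80, 10, 20, 90

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```
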

model_parts()#

This function calculates Variable Importance.

vi = exp.model_parts()
vi.plot()

You can also calculate the variable importance of groups of variables:

vi_grouped = exp.model_parts(
    variable_groups={
        "personal": ["gender", "age", "sibsp", "parch"],
        "wealth": ["class", "fare"],
    }
)
vi_grouped.plot()

Upload variable importance plots to Neptune:

run["model/variable_importance/single"].upload(vi.plot(show=False))
run["model/variable_importance/grouped"].upload(vi_grouped.plot(show=False))
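By default, model_parts() computes permutation-based importance: permute one variable, re-score the model, and measure how much the loss increases. The idea can be sketched with a toy model and invented data (the "model" below simply returns the first feature):

```python
import random

random.seed(0)

# Toy data: the target depends only on x1, not on x2
rows = [(random.random(), random.random()) for _ in range(200)]
y = [x1 for x1, _ in rows]

def predict(row):
    return row[0]  # toy model uses only x1

def mse(data):
    return sum((predict(r) - t) ** 2 for r, t in zip(data, y)) / len(y)

def permuted(col):
    """Return a copy of the data with one column shuffled."""
    shuffled = [r[col] for r in rows]
    random.shuffle(shuffled)
    out = []
    for r, v in zip(rows, shuffled):
        r = list(r)
        r[col] = v
        out.append(tuple(r))
    return out

baseline = mse(rows)
imp_x1 = mse(permuted(0)) - baseline  # large: x1 matters
imp_x2 = mse(permuted(1)) - baseline  # zero: the model ignores x2
```
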

model_profile()#

This function calculates explanations that explore model response as a function of selected variables.

The explanations can be calculated as Partial Dependence Profile or Accumulated Local Dependence Profile.

pdp_num = exp.model_profile(type="partial", label="pdp")
ale_num = exp.model_profile(type="accumulated", label="ale")
pdp_num.plot(ale_num)

pdp_cat = exp.model_profile(
    type="partial",
    variable_type="categorical",
    variables=["gender", "class"],
    label="pdp",
)
ale_cat = exp.model_profile(
    type="accumulated",
    variable_type="categorical",
    variables=["gender", "class"],
    label="ale",
)
ale_cat.plot(pdp_cat)

Upload model profile plots to Neptune:

run["model/profile/num"].upload(pdp_num.plot(ale_num, show=False))
run["model/profile/cat"].upload(ale_cat.plot(pdp_cat, show=False))
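Conceptually, a partial dependence profile fixes the variable of interest at a grid value, substitutes that value into every observation, and averages the predictions. A toy sketch of the computation (the model and dataset are invented for illustration):

```python
# Invented toy model over (age, fare)
def predict(age, fare):
    return 0.01 * age + 0.002 * fare

# Invented toy dataset of (age, fare) observations
observations = [(25, 70), (35, 25), (50, 10)]

def pdp(grid_value):
    """Average prediction with 'age' fixed at grid_value in every row."""
    return sum(predict(grid_value, fare) for _, fare in observations) / len(observations)

profile = {age: pdp(age) for age in (20, 40, 60)}
```

Accumulated profiles (type="accumulated") instead aggregate local changes in the prediction within small windows of the variable, which avoids extrapolating into unrealistic feature combinations.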

Logging prediction-level explanations#

Let's create two example persons for this tutorial.

john = pd.DataFrame(
    {
        "gender": ["male"],
        "age": [25],
        "class": ["1st"],
        "embarked": ["Southampton"],
        "fare": [72],
        "sibsp": [0],
        "parch": [0],
    },
    index=["John"],
)

mary = pd.DataFrame(
    {
        "gender": ["female"],
        "age": [35],
        "class": ["3rd"],
        "embarked": ["Cherbourg"],
        "fare": [25],
        "sibsp": [0],
        "parch": [0],
    },
    index=["Mary"],
)

predict_parts()#

This function calculates Variable Attributions as Break Down, iBreakDown, or Shapley Values explanations.

The model prediction is decomposed into parts attributed to particular variables.

Breakdown values for John's predictions:

bd_john = exp.predict_parts(john, type="break_down", label=john.index[0])
bd_interactions_john = exp.predict_parts(
    john, type="break_down_interactions", label="John+"
)
bd_john.plot(bd_interactions_john)

Shapley values for Mary's predictions:

sh_mary = exp.predict_parts(mary, type="shap", B=10, label=mary.index[0])
sh_mary.plot()

Upload the plots to Neptune:

run["prediction/breakdown/john"].upload(
    bd_john.plot(bd_interactions_john, show=False)
)
run["prediction/shapley/mary"].upload(sh_mary.plot(show=False))
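A Shapley value averages a variable's marginal contribution over the orderings in which it can join the others. A toy sketch for two features, with an invented coalition payoff function (dalex estimates this by sampling orderings, controlled by the B argument):

```python
from itertools import permutations

features = ("gender", "class")

# Invented 'payoff' of each feature coalition
# (prediction minus baseline); for illustration only.
value = {
    (): 0.0,
    ("gender",): 0.3,
    ("class",): 0.1,
    ("gender", "class"): 0.5,
}

def coalition(fs):
    """Canonical tuple key for a set of features."""
    return tuple(sorted(fs, key=features.index))

def shapley(feature):
    orders = list(permutations(features))
    total = 0.0
    for order in orders:
        before = order[: order.index(feature)]
        # Marginal contribution of `feature` when it joins `before`
        total += value[coalition(list(before) + [feature])] - value[coalition(before)]
    return total / len(orders)

phi = {f: shapley(f) for f in features}
```

Note that the values sum to the payoff of the full coalition, a defining property of Shapley attributions.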

predict_profile()#

This function computes individual profiles; that is, Ceteris Paribus Profiles.

cp_mary = exp.predict_profile(mary, label=mary.index[0])
cp_john = exp.predict_profile(john, label=john.index[0])
cp_mary.plot(cp_john)
cp_john.plot(cp_mary, variable_type="categorical")

Upload the CP plots to Neptune:

run["prediction/profile/numerical"].upload(cp_mary.plot(cp_john, show=False))
run["prediction/profile/categorical"].upload(
    cp_mary.plot(cp_john, variable_type="categorical", show=False)
)
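Unlike model_profile(), which averages the response over the whole dataset, a ceteris paribus profile takes a single observation, varies one variable over a grid, and holds everything else fixed. A toy sketch (the model and the fixed fare value are invented):

```python
# Invented toy model over (age, fare)
def predict(age, fare):
    return 0.01 * age + 0.002 * fare

# One fixed observation: fare stays constant while age varies
fixed_fare = 25

cp_profile = {age: predict(age, fixed_fare) for age in (20, 30, 40)}
```
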

Analyzing the results#

Once you're done logging, stop the Neptune run to close the connection and sync the data:

run.stop()

To open the run in Neptune, click the link that appears in the console output.

Sample output

https://app.neptune.ai/workspace-name/project-name/e/RUN-100/metadata

The general format is https://app.neptune.ai/<workspace>/<project> followed by the Neptune ID of the initialized object.

In All metadata, navigate through the model and prediction namespaces to view the logged charts.

See example run in Neptune