How to use Neptune with DALEX#

DALEX reports visualized in Neptune

DALEX is an open source tool for exploring and explaining model behavior, which helps you understand how complex models work. With Neptune, you can upload the following DALEX metadata:

  • The pickled dalex explainer object
  • Interactive reports

This guide is adapted from the DALEX documentation.

Legacy integration

You can also use the integration for the Neptune legacy API.

Legacy docs: Neptune-DALEX integration

Before you start#

  • Sign up at neptune.ai/register.
  • Create a project for storing your metadata.
  • Have dalex and Neptune installed.

    To follow the example, also install pandas and scikit-learn. Use either pip or conda:

    pip install -U dalex neptune pandas scikit-learn
    
    conda install -c conda-forge dalex neptune pandas scikit-learn
    
Passing your Neptune credentials

Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...6Lc"

To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.

export NEPTUNE_PROJECT="ml-team/classification"

Your full project name has the form workspace-name/project-name. You can copy it from the project settings: Click the menu in the top-right → Edit project details.

On Windows, navigate to Settings → Edit the system environment variables, or enter the following in Command Prompt: setx SOME_NEPTUNE_VARIABLE "some-value"


Although it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune.

run = neptune.init_run(
    project="ml-team/classification",  # your full project name here
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
)

For more help, see Set Neptune credentials.
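
Before starting a run, it can help to confirm that both environment variables are actually visible to Python. The following is a minimal sketch using only the standard library; the helper name missing_neptune_vars is hypothetical and not part of the Neptune API.

```python
import os

# The two variables this guide asks you to set.
REQUIRED_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_neptune_vars()
    if missing:
        print("Set these environment variables first:", ", ".join(missing))
    else:
        print("Neptune credentials found in the environment.")
```

If the list is non-empty, set the missing variables as described above and restart your shell or notebook kernel so the new values are picked up.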

Creating and logging an explainer object#

  1. Prepare the data.

    import dalex as dx
    
    import pandas as pd
    
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import StandardScaler, OneHotEncoder
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.compose import ColumnTransformer
    
    import warnings
    
    warnings.filterwarnings("ignore")
    
    data = dx.datasets.load_titanic()
    
    X = data.drop(columns="survived")
    y = data.survived
    
    data.head(10)
    
  2. Create and fit a pipeline model.

    numerical_features = ["age", "fare", "sibsp", "parch"]
    numerical_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler()),
        ]
    )
    
    categorical_features = ["gender", "class", "embarked"]
    categorical_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]
    )
    
    preprocessor = ColumnTransformer(
        transformers=[
            ("num", numerical_transformer, numerical_features),
            ("cat", categorical_transformer, categorical_features),
        ]
    )
    
    classifier = MLPClassifier(
        hidden_layer_sizes=(150, 100, 50), max_iter=500, random_state=0
    )
    
    clf = Pipeline(
        steps=[("preprocessor", preprocessor), ("classifier", classifier)]
    )
    
    clf.fit(X, y)
    
  3. Import Neptune and start a run:

    import neptune
    
    run = neptune.init_run()
    
    If you haven't set up your credentials, you can log anonymously:

      run = neptune.init_run(
          api_token=neptune.ANONYMOUS_API_TOKEN,
          project="common/dalex-support",
      )
      
    If Neptune can't find your project name or API token

    As a best practice, you should save your Neptune API token and project name as environment variables:

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8"
    
    export NEPTUNE_PROJECT="ml-team/classification"
    

    Alternatively, you can pass the information when using a function that takes api_token and project as arguments:

    run = neptune.init_run(
        api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8",  # your token here
        project="ml-team/classification",  # your full project name here
    )
    
    This also works for init_model(), init_model_version(), init_project(), and integrations that create Neptune runs under the hood, such as NeptuneLogger or NeptuneCallback.

    To find your API token, expand the user menu in the bottom-left corner of the Neptune app and select Get my API token. You can copy the project name from the project details (Edit project details).

    If you haven't registered, you can log anonymously to a public project:

    api_token=neptune.ANONYMOUS_API_TOKEN
    project="common/quickstarts"
    

    Make sure not to publish sensitive data through your code!

  4. Create an explainer for the model:

    exp = dx.Explainer(clf, X, y)
    
  5. Upload the explainer to Neptune.

    You can use DALEX's dumps() method to get a pickled representation of the explainer, then wrap it with Neptune's File.from_content() method and upload it to the run.

    from neptune.types import File
    
    run["pickled_explainer"].upload(File.from_content(exp.dumps()))
    
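  6. To stop the connection to Neptune and sync all data, call the stop() method. A stopped run no longer accepts new metadata, so if you continue with the sections below, call stop() only after logging those explanations:

    run.stop()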
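
The pickled representation uploaded in step 5 is a bytes object, so the stored file can later be turned back into a Python object. The following is a minimal sketch of that round trip using only the standard pickle module. The dict is a stand-in for the explainer and the file name explainer.pkl is a placeholder; both are assumptions made so the example runs without DALEX or Neptune installed.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a DALEX explainer (assumption: any picklable object
# goes through the same bytes round trip).
stand_in = {"label": "MLPClassifier", "features": ["age", "fare"]}

# Analogous to exp.dumps(): serialize the object to bytes.
payload = pickle.dumps(stand_in)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "explainer.pkl"  # placeholder file name
    path.write_bytes(payload)  # what a downloaded file would contain
    restored = pickle.loads(path.read_bytes())

print(restored == stand_in)
```

For a real explainer, DALEX's own loading functions are the likely counterpart to dumps(); check the DALEX documentation for your version. Only unpickle files that come from a source you trust, because unpickling can execute arbitrary code.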
    

Logging model level explanations#

model_performance()#

This function calculates various Model Performance measures:

  • Classification: F1, accuracy, recall, precision, and AUC.
  • Regression: mean squared error, R squared, median absolute deviation.

mp = exp.model_performance()
mp.plot(geom="roc")

To upload the ROC plot to Neptune, set the show argument to False, so that plot() returns the figure object instead of displaying it.

To distinguish between the plot types, you can use different namespaces. For example, "model/performance/roc", "model/performance/ecdf", etc.

run["model/performance/roc"].upload(mp.plot(geom="roc", show=False))
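
If you log several plot types, a small loop keeps the namespaces consistent. The sketch below assumes that run is an active Neptune run and mp is the object returned by exp.model_performance(); the helper names performance_field and log_performance_plots are hypothetical, and the default geoms are the two mentioned above.

```python
def performance_field(geom):
    """Build the field path for one performance plot type."""
    return f"model/performance/{geom}"


def log_performance_plots(run, mp, geoms=("roc", "ecdf")):
    """Upload one plot per geom and return the field paths that were written."""
    paths = []
    for geom in geoms:
        path = performance_field(geom)
        # show=False returns the figure instead of displaying it
        run[path].upload(mp.plot(geom=geom, show=False))
        paths.append(path)
    return paths
```

Calling log_performance_plots(run, mp) would then write to model/performance/roc and model/performance/ecdf.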

Related

Learn more about Neptune Namespaces and fields.

model_parts()#

This function calculates Variable Importance.

vi = exp.model_parts()
vi.plot()

You can also calculate variable importance of a group of variables:

vi_grouped = exp.model_parts(
    variable_groups={
        "personal": ["gender", "age", "sibsp", "parch"],
        "wealth": ["class", "fare"],
    }
)
vi_grouped.plot()
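
A misspelled column name in variable_groups is easy to miss. The following sketch checks the group definitions against the feature columns before you pass them to model_parts(). The helper name unknown_group_variables is hypothetical, and the column list is written out by hand here; in the tutorial you could pass list(X.columns) instead.

```python
def unknown_group_variables(variable_groups, columns):
    """Map each group name to the variables that are not in `columns`."""
    known = set(columns)
    problems = {}
    for group, variables in variable_groups.items():
        missing = [name for name in variables if name not in known]
        if missing:
            problems[group] = missing
    return problems


# Feature columns of the Titanic data used in this guide.
columns = ["gender", "age", "class", "embarked", "fare", "sibsp", "parch"]
groups = {
    "personal": ["gender", "age", "sibsp", "parch"],
    "wealth": ["class", "fare"],
}

print(unknown_group_variables(groups, columns))  # {}
```

An empty dict means every grouped variable exists in the data.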

Upload variable importance plots to Neptune:

run["model/variable_importance/single"].upload(vi.plot(show=False))
run["model/variable_importance/grouped"].upload(vi_grouped.plot(show=False))

model_profile()#

This function calculates explanations that explore model response as a function of selected variables.

The explanations can be calculated as Partial Dependence Profile or Accumulated Local Dependence Profile.

pdp_num = exp.model_profile(type="partial", label="pdp")
ale_num = exp.model_profile(type="accumulated", label="ale")
pdp_num.plot(ale_num)

pdp_cat = exp.model_profile(
    type="partial",
    variable_type="categorical",
    variables=["gender", "class"],
    label="pdp",
)
ale_cat = exp.model_profile(
    type="accumulated",
    variable_type="categorical",
    variables=["gender", "class"],
    label="ale",
)
ale_cat.plot(pdp_cat)

Upload model profile plots to Neptune:

run["model/profile/num"].upload(pdp_num.plot(ale_num, show=False))
run["model/profile/cat"].upload(ale_cat.plot(pdp_cat, show=False))

Logging prediction level explanations#

Let's create two example passengers for this tutorial.

john = pd.DataFrame(
    {
        "gender": ["male"],
        "age": [25],
        "class": ["1st"],
        "embarked": ["Southampton"],
        "fare": [72],
        "sibsp": [0],
        "parch": [0],
    },
    index=["John"],
)

mary = pd.DataFrame(
    {
        "gender": ["female"],
        "age": [35],
        "class": ["3rd"],
        "embarked": ["Cherbourg"],
        "fare": [25],
        "sibsp": [0],
        "parch": [0],
    },
    index=["Mary"],
)

predict_parts()#

This function calculates Variable Attributions as Break Down, iBreakDown or Shapley Values explanations.

The model prediction is decomposed into parts that are attributed to particular variables.

Breakdown values for John's predictions:

bd_john = exp.predict_parts(john, type="break_down", label=john.index[0])
bd_interactions_john = exp.predict_parts(
    john, type="break_down_interactions", label="John+"
)
bd_john.plot(bd_interactions_john)

Shapley values for Mary's predictions:

sh_mary = exp.predict_parts(mary, type="shap", B=10, label=mary.index[0])
sh_mary.plot()

Upload the plots to Neptune:

run["prediction/breakdown/john"].upload(
    bd_john.plot(bd_interactions_john, show=False)
)
run["prediction/shapley/mary"].upload(sh_mary.plot(show=False))
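
When you explain more than a couple of observations, a loop saves repetition. The sketch below assumes run is an active Neptune run, explainer is the dx.Explainer created earlier, and each value in people is a single-row DataFrame such as john or mary. The helper name log_break_down_plots and the prefix parameter are hypothetical.

```python
def log_break_down_plots(run, explainer, people, prefix="prediction/breakdown"):
    """Upload a Break Down plot for each named observation.

    `people` maps a name to a single-row observation.
    Returns the field paths that were written.
    """
    paths = []
    for name, observation in people.items():
        parts = explainer.predict_parts(
            observation, type="break_down", label=name
        )
        path = f"{prefix}/{name.lower()}"
        # show=False returns the figure instead of displaying it
        run[path].upload(parts.plot(show=False))
        paths.append(path)
    return paths
```

For example, log_break_down_plots(run, exp, {"John": john, "Mary": mary}) would write one plot per person. Pass a different prefix if you want to keep these separate from plots uploaded earlier in the same run.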

predict_profile()#

This function computes individual profiles; that is, Ceteris Paribus Profiles.

cp_mary = exp.predict_profile(mary, label=mary.index[0])
cp_john = exp.predict_profile(john, label=john.index[0])
cp_mary.plot(cp_john)
cp_john.plot(cp_mary, variable_type="categorical")

Upload the CP plots to Neptune:

run["prediction/profile/numerical"].upload(cp_mary.plot(cp_john, show=False))
run["prediction/profile/categorical"].upload(
    cp_mary.plot(cp_john, variable_type="categorical", show=False)
)

Analyzing the results#

Once you're done logging, stop the Neptune run to close the connection and sync the data:

run.stop()

To open the run in Neptune, click the link that appears in the console output.

Sample output

[neptune] [info ] Neptune initialized. Open in the app: https://app.neptune.ai/workspace/project/e/RUN-1

In the above example, the run ID is RUN-1.
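
If you need the run ID in a script, for example to reopen the run later, you can take it from the last segment of the URL. The following is a minimal sketch using only the standard library; the helper name run_id_from_url is hypothetical.

```python
from urllib.parse import urlparse


def run_id_from_url(url):
    """Return the last path segment of a Neptune run URL."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


print(run_id_from_url("https://app.neptune.ai/workspace/project/e/RUN-1"))  # RUN-1
```

If you still have the run object, reading its sys/id field is likely the more direct route than parsing the URL.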

In All metadata, navigate through the model and prediction namespaces to view the logged charts.
