Evidently integration guide#

Open in Colab

Evidently drift visualized in Neptune

Evidently is an open source tool to evaluate, test, and monitor machine learning models. With Neptune, you can:

  • Upload Evidently's interactive reports.
  • Log report values as key-value pairs.
  • Log and visualize production data drift.

See in Neptune  Example scripts 

Before you start#

  • Sign up at neptune.ai/register.
  • Create a project for storing your metadata.
  • Have Evidently and Neptune installed.

    To follow the example, also install pandas and scikit-learn.

    pip install -U evidently neptune pandas scikit-learn
    
Passing your Neptune credentials

Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...6Lc"

To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.

export NEPTUNE_PROJECT="ml-team/classification"

Your full project name has the form workspace-name/project-name. You can copy it from the project settings: Click the menu in the top-right → Details & privacy.

On Windows, navigate to Settings → Edit the system environment variables, or enter the following in Command Prompt: setx SOME_NEPTUNE_VARIABLE "some-value"


While it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune.

run = neptune.init_run(
    project="ml-team/classification",  # your full project name here
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
)

For more help, see Set Neptune credentials.

Logging Evidently reports#

You can upload reports to Neptune either as HTML or as a dictionary, depending on how you want to view and access them.

You can find the full list of test and metric presets in the Evidently documentation.
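
For example, you can build a test suite from a different preset and log it to Neptune in exactly the same way. A minimal sketch using Evidently's DataQualityTestPreset:

from evidently.test_suite import TestSuite
from evidently.test_preset import DataQualityTestPreset

# Swap in another preset; uploading the suite to Neptune works the same as shown below
data_quality = TestSuite(tests=[DataQualityTestPreset()])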

The example uses the following libraries:

from sklearn import datasets

from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
  1. Run Evidently test suites and reports:

    # "reference" and "current" stand in for your reference and current pandas DataFrames
    data_stability = TestSuite(tests=[DataStabilityTestPreset()])
    data_stability.run(reference_data=reference, current_data=current)

    data_drift_report = Report(metrics=[DataDriftPreset()])
    data_drift_report.run(reference_data=reference, current_data=current)
    
  2. Import Neptune and start a run:

    import neptune
    
    run = neptune.init_run() # (1)!
    
    1. If you haven't set up your credentials, you can log anonymously:

      neptune.init_run(
          api_token=neptune.ANONYMOUS_API_TOKEN,
          project="common/evidently-support",
      )
      
  3. Save the reports.

    Using Neptune's HTML previewer, you can view and interact with Evidently's rich HTML reports on Neptune.

    As HTML
    data_stability.save_html("data_stability.html")
    data_drift_report.save_html("data_drift_report.html")
    
    run["data_stability/report"].upload("data_stability.html")
    run["data_drift/report"].upload("data_drift_report.html")
    

    By saving Evidently's results to Neptune as a dictionary, you get programmatic access to them, which you can use in your CI/CD pipelines (see the sketch after these steps).

    As dictionary
    from neptune.utils import stringify_unsupported
    
    run["data_stability"] = stringify_unsupported(data_stability.as_dict())
    run["data_drift"] = stringify_unsupported(data_drift_report.as_dict())
    
  4. To stop the connection to Neptune and sync all data, call the stop() method:

    run.stop()
    
  5. Run your script as you normally would.

    To open the run, click the Neptune link that appears in the console output.

    Example link: https://app.neptune.ai/common/evidently-support/e/EV-7
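
If you logged the results as a dictionary, you can also act on them programmatically, for example to gate a CI/CD job on the test outcome. A minimal sketch, assuming the layout returned by TestSuite.as_dict() at the time of writing (a top-level "tests" list whose entries carry "name" and "status" fields):

# Hypothetical CI gate: fail the job if any data stability test did not pass
results = data_stability.as_dict()
failed = [t["name"] for t in results["tests"] if t["status"] == "FAIL"]

if failed:
    raise SystemExit(f"Data stability tests failed: {', '.join(failed)}")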

Result

You can view the reports in the All metadata section.

See example in Neptune  View full code example 

Logging production data drift#

You can also use Neptune to log the results when using Evidently to evaluate production data drift.

Load a dataset:

curl https://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip --create-dirs -o data/Bike-Sharing-Dataset.zip
unzip -o data/Bike-Sharing-Dataset.zip -d data
import pandas as pd

bike_df = pd.read_csv("data/hour.csv")
bike_df["datetime"] = pd.to_datetime(bike_df["dteday"])
bike_df["datetime"] += pd.to_timedelta(bike_df.hr, unit="h")
bike_df.set_index("datetime", inplace=True)
bike_df = bike_df[
    [
        "season",
        "holiday",
        "workingday",
        "weathersit",
        "temp",
        "atemp",
        "hum",
        "windspeed",
        "casual",
        "registered",
        "cnt",
    ]
]
bike_df

Note

For demonstration purposes, we treat this data as the input data for a live model.

To use this approach with production models, the prediction logs must be available.

Define column mapping for Evidently:

from evidently import ColumnMapping

data_columns = ColumnMapping()
data_columns.numerical_features = ["weathersit", "temp", "atemp", "hum", "windspeed"]
data_columns.categorical_features = ["holiday", "workingday"]

Specify which metrics you want to calculate.

In this case, you can generate the Data Drift report and log the drift score for each feature.

def eval_drift(reference, production, column_mapping):
    data_drift_report = Report(metrics=[DataDriftPreset()])
    data_drift_report.run(
        reference_data=reference,
        current_data=production,
        column_mapping=column_mapping,
    )
    report = data_drift_report.as_dict()

    drifts = []

    for feature in column_mapping.numerical_features + column_mapping.categorical_features:
        drifts.append(
            (feature, report["metrics"][1]["result"]["drift_by_columns"][feature]["drift_score"])
        )

    return drifts

Specify the period to treat as the reference – Evidently will use it as the baseline for the comparison. Then, choose the periods to treat as experiment batches. This emulates production model runs.

# Set reference dates
reference_dates = ("2011-01-01 00:00:00", "2011-06-30 23:00:00")

# Set experiment batches dates
experiment_batches = [
    ("2011-07-01 00:00:00", "2011-07-31 00:00:00"),
    ("2011-08-01 00:00:00", "2011-08-31 00:00:00"),
    ("2011-09-01 00:00:00", "2011-09-30 00:00:00"),
    ("2011-10-01 00:00:00", "2011-10-31 00:00:00"),
    ("2011-11-01 00:00:00", "2011-11-30 00:00:00"),
    ("2011-12-01 00:00:00", "2011-12-31 00:00:00"),
    ("2012-01-01 00:00:00", "2012-01-31 00:00:00"),
    ("2012-02-01 00:00:00", "2012-02-29 00:00:00"),
    ("2012-03-01 00:00:00", "2012-03-31 00:00:00"),
    ("2012-04-01 00:00:00", "2012-04-30 00:00:00"),
    ("2012-05-01 00:00:00", "2012-05-31 00:00:00"),
    ("2012-06-01 00:00:00", "2012-06-30 00:00:00"),
    ("2012-07-01 00:00:00", "2012-07-31 00:00:00"),
    ("2012-08-01 00:00:00", "2012-08-31 00:00:00"),
    ("2012-09-01 00:00:00", "2012-09-30 00:00:00"),
    ("2012-10-01 00:00:00", "2012-10-31 00:00:00"),
    ("2012-11-01 00:00:00", "2012-11-30 00:00:00"),
    ("2012-12-01 00:00:00", "2012-12-31 00:00:00"),
]

Log the drifts with Neptune:

import uuid
from datetime import datetime

custom_run_id = str(uuid.uuid4())

for date in experiment_batches:
    with neptune.init_run(
        custom_run_id=custom_run_id, # (1)!
        tags=["prod monitoring"],  # (Optional) Replace with your own
    ) as run:
        metrics = eval_drift(
            bike_df.loc[reference_dates[0] : reference_dates[1]],
            bike_df.loc[date[0] : date[1]],
            column_mapping=data_columns,
        )

        for feature in metrics:
            run["drift"][feature[0]].append(
                round(feature[1], 3),
                timestamp=datetime.strptime(
                    date[0], "%Y-%m-%d %H:%M:%S").timestamp() # (2)!
            )
  1. Passing a custom run ID ensures that the metrics are logged to the same run.
  2. Passing a timestamp in the append() method lets you visualize the date on the x-axis of the charts.
If Neptune can't find your project name or API token

As a best practice, you should save your Neptune API token and project name as environment variables:

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"

Alternatively, you can pass the information when using a function that takes api_token and project as arguments:

run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8", # (1)!
    project="ml-team/classification", # (2)!
)
  1. In the bottom-left corner, expand the user menu and select Get my API token.
  2. You can copy the path from the project details (Details & privacy).

If you haven't registered, you can log anonymously to a public project:

api_token=neptune.ANONYMOUS_API_TOKEN
project="common/quickstarts"

Make sure not to publish sensitive data through your code!

Follow the run link and explore the drifts in the Charts dashboard.

You might have to change the x-axis from Step to Time (absolute).

See in Neptune 

Logging LLM evaluation reports#

You can upload LLM evaluation results to Neptune just like other reports.

Inspect the logged results in the All metadata tab, or add them to custom dashboards and reports.
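
The snippets below assume you already have an Evidently report with LLM evaluation results, here called text_evals_report. As a rough sketch of how such a report can be produced (the TextEvals preset and TextLength descriptor ship with Evidently; the eval_df DataFrame and its "response" column are placeholders for your own evaluation data):

import pandas as pd

from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import TextLength

# Placeholder evaluation data with a text column to evaluate
eval_df = pd.DataFrame(
    {"response": ["The claim was approved.", "Please contact support."]}
)

text_evals_report = Report(
    metrics=[TextEvals(column_name="response", descriptors=[TextLength()])]
)
text_evals_report.run(reference_data=None, current_data=eval_df)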

Visual HTML#

Pass the generated HTML file to the upload() method:

text_evals_report.save_html("report.html")
run["text_evals/report"].upload("report.html")

Data frame#

Convert the dataset to HTML with File.as_html() and then upload it:

from neptune.types import File

dataset = File.as_html(text_evals_report.datasets().current)
run["text_evals/dataset"].upload(dataset)

Python dictionary#

Use stringify_unsupported() to ensure that all values are logged correctly:

from neptune.utils import stringify_unsupported

run["text_evals/dataset_dict"] = stringify_unsupported(text_evals_report.as_dict())