Evidently integration guide#
Evidently is an open source tool to evaluate, test, and monitor machine learning models. With Neptune, you can:
- Upload Evidently's interactive reports.
- Log report values as key-value pairs.
- Log and visualize production data drift.
Before you start#
- Sign up at neptune.ai/register.
- Create a project for storing your metadata.
- Have Evidently and Neptune installed. To follow the example, also install pandas and scikit-learn.
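For example, you might install everything from PyPI with pip (a minimal sketch – pin versions as needed so that your Evidently API matches the code in this guide):

pip install -U neptune evidently pandas scikit-learn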
Passing your Neptune credentials
Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.
To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.
Your full project name has the form workspace-name/project-name. You can copy it from the project settings: click the menu in the top-right → Details & privacy.
On Windows, navigate to Settings → Edit the system environment variables, or enter the following in Command Prompt: setx SOME_NEPTUNE_VARIABLE 'some-value'
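On Linux or macOS, you could set both variables in the terminal, for example (the values below are the placeholders used elsewhere in this guide):

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"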
Although it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune:
run = neptune.init_run(
    project="ml-team/classification",  # your full project name here
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
)
For more help, see Set Neptune credentials.
Logging Evidently reports#
You can upload reports to Neptune either as HTML or as a dictionary, depending on how you want to view and access them.
You can find the entire list of presets in the Evidently documentation.
The example uses the following libraries:
from sklearn import datasets
from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
- Run Evidently test suites and reports:
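A minimal sketch of this step, assuming the Iris dataset from scikit-learn split into a reference half and a current half (any pair of comparable data frames works):

data = datasets.load_iris(as_frame=True).frame
reference_data = data.iloc[:75]  # first half serves as the reference
current_data = data.iloc[75:]  # second half emulates the current data

# Test suite with the data stability preset
data_stability = TestSuite(tests=[DataStabilityTestPreset()])
data_stability.run(reference_data=reference_data, current_data=current_data)

# Report with the data drift preset
data_drift_report = Report(metrics=[DataDriftPreset()])
data_drift_report.run(reference_data=reference_data, current_data=current_data)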
- Import Neptune and start a run:
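For example, assuming your credentials are set as environment variables:

import neptune

run = neptune.init_run()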
- If you haven't set up your credentials, you can log anonymously:
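A sketch of anonymous logging, using the public example project linked further below:

run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/evidently-support",
)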
- Save the reports.
Using Neptune's HTML previewer, you can view and interact with Evidently's rich HTML reports on Neptune.
As HTML:

data_stability.save_html("data_stability.html")
data_drift_report.save_html("data_drift_report.html")

run["data_stability/report"].upload("data_stability.html")
run["data_drift/report"].upload("data_drift_report.html")
By saving Evidently's results as a dictionary to Neptune, you get programmatic access to them, which you can use in your CI/CD pipelines.
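As a dictionary – a sketch with illustrative field names, using stringify_unsupported() (shown later in this guide) to keep unsupported value types loggable:

from neptune.utils import stringify_unsupported

run["data_stability/test_results"] = stringify_unsupported(data_stability.as_dict())
run["data_drift/report_dict"] = stringify_unsupported(data_drift_report.as_dict())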
- To stop the connection to Neptune and sync all data, call the stop() method:
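run.stop()  # flushes any queued data and closes the run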
- Run your script as you normally would. To open the run, click the Neptune link that appears in the console output.
Example link: https://app.neptune.ai/common/evidently-support/e/EV-7
Result
You can view the reports in the All metadata section.
Logging production data drift#
You can also use Neptune to log the results when using Evidently to evaluate production data drift.
Load a dataset:
curl https://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip --create-dirs -o data/Bike-Sharing-Dataset.zip
unzip -o data/Bike-Sharing-Dataset.zip -d data
import pandas as pd
bike_df = pd.read_csv("data/hour.csv")
bike_df["datetime"] = pd.to_datetime(bike_df["dteday"])
bike_df["datetime"] += pd.to_timedelta(bike_df.hr, unit="h")
bike_df.set_index("datetime", inplace=True)
bike_df = bike_df[
    [
        "season",
        "holiday",
        "workingday",
        "weathersit",
        "temp",
        "atemp",
        "hum",
        "windspeed",
        "casual",
        "registered",
        "cnt",
    ]
]
bike_df
Note
For demonstration purposes, we treat this data as the input data for a live model.
To use this with production models, the prediction logs should be available.
Define column mapping for Evidently:
from evidently import ColumnMapping
data_columns = ColumnMapping()
data_columns.numerical_features = ["weathersit", "temp", "atemp", "hum", "windspeed"]
data_columns.categorical_features = ["holiday", "workingday"]
Specify which metrics you want to calculate.
In this case, you can generate the Data Drift report and log the drift score for each feature.
def eval_drift(reference, production, column_mapping):
    data_drift_report = Report(metrics=[DataDriftPreset()])
    data_drift_report.run(
        reference_data=reference,
        current_data=production,
        column_mapping=column_mapping,
    )
    report = data_drift_report.as_dict()

    drifts = []
    for feature in column_mapping.numerical_features + column_mapping.categorical_features:
        drifts.append(
            (feature, report["metrics"][1]["result"]["drift_by_columns"][feature]["drift_score"])
        )

    return drifts
Specify the period to treat as the reference – Evidently will use it as the base for the comparison. Then, choose the periods to treat as experiment batches. This emulates production model runs.
# Set reference dates
reference_dates = ("2011-01-01 00:00:00", "2011-06-30 23:00:00")

# Set experiment batches dates
experiment_batches = [
    ("2011-07-01 00:00:00", "2011-07-31 00:00:00"),
    ("2011-08-01 00:00:00", "2011-08-31 00:00:00"),
    ("2011-09-01 00:00:00", "2011-09-30 00:00:00"),
    ("2011-10-01 00:00:00", "2011-10-31 00:00:00"),
    ("2011-11-01 00:00:00", "2011-11-30 00:00:00"),
    ("2011-12-01 00:00:00", "2011-12-31 00:00:00"),
    ("2012-01-01 00:00:00", "2012-01-31 00:00:00"),
    ("2012-02-01 00:00:00", "2012-02-29 00:00:00"),
    ("2012-03-01 00:00:00", "2012-03-31 00:00:00"),
    ("2012-04-01 00:00:00", "2012-04-30 00:00:00"),
    ("2012-05-01 00:00:00", "2012-05-31 00:00:00"),
    ("2012-06-01 00:00:00", "2012-06-30 00:00:00"),
    ("2012-07-01 00:00:00", "2012-07-31 00:00:00"),
    ("2012-08-01 00:00:00", "2012-08-31 00:00:00"),
    ("2012-09-01 00:00:00", "2012-09-30 00:00:00"),
    ("2012-10-01 00:00:00", "2012-10-31 00:00:00"),
    ("2012-11-01 00:00:00", "2012-11-30 00:00:00"),
    ("2012-12-01 00:00:00", "2012-12-31 00:00:00"),
]
Log the drifts with Neptune:
import uuid
from datetime import datetime

custom_run_id = str(uuid.uuid4())

for date in experiment_batches:
    with neptune.init_run(
        custom_run_id=custom_run_id,  # (1)!
        tags=["prod monitoring"],  # (Optional) Replace with your own
    ) as run:
        metrics = eval_drift(
            bike_df.loc[reference_dates[0] : reference_dates[1]],
            bike_df.loc[date[0] : date[1]],
            column_mapping=data_columns,
        )
        for feature in metrics:
            run["drift"][feature[0]].append(
                round(feature[1], 3),
                timestamp=datetime.strptime(date[0], "%Y-%m-%d %H:%M:%S").timestamp(),  # (2)!
            )
1. Passing a custom run ID ensures that the metrics are logged to the same run.
2. Passing a timestamp in the append() method lets you visualize the date on the x-axis of the charts.
Follow the run link and explore the drifts in the Charts dashboard.
You might have to change the x-axis from Step to Time (absolute).
Logging LLM evaluation reports#
You can upload LLM evaluation results to Neptune just like other reports.
Inspect the logged results in the All metadata tab, or add them to custom dashboards and reports.
Visual HTML#
Pass the generated HTML file to the upload() method:
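A sketch, assuming the Evidently report object is named text_evals_report (as in the snippets below) and an illustrative field path:

text_evals_report.save_html("text_evals_report.html")
run["text_evals/report"].upload("text_evals_report.html")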
Data frame#
Convert the dataset to HTML with File.as_html() and then upload it:
from neptune.types import File
dataset = File.as_html(text_evals_report.datasets().current)
run["text_evals/dataset"].upload(dataset)
Python dictionary#
Use stringify_unsupported() to ensure that all values are logged correctly:
from neptune.utils import stringify_unsupported
run["text_evals/dataset_dict"] = stringify_unsupported(text_evals_report.as_dict())
Related
- Add tags
- Set custom run ID
- What you can log and display
- API reference ≫ append()
- Evidently on GitHub
- Evidently documentation
- Arize integration guide