Arize integration guide
Arize and Neptune are MLOps tools that aim to improve connected but different parts of your ML pipeline and workflow.
- Arize helps you:
  - visualize your production model performance
  - understand drift and data quality issues
- Neptune logs, stores, displays, and compares your model-building metadata for better experiment tracking and model registry.
Together, Arize and Neptune help you:
- Train the best model
- Validate your model prelaunch
- Compare the production performance of those models
Before you start
- Sign up at neptune.ai/register.
- Create a project for storing your metadata.
- Have Arize installed.
Installing Neptune
Install the Neptune client library:
pip install -U neptune
Passing your Neptune credentials
Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.
To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.
Your full project name has the form workspace-name/project-name. You can copy it from the project settings: click the menu in the top-right → Details & privacy.
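On Windows, navigate to Settings → Edit the system environment variables, or enter the following in Command Prompt: setx SOME_NEPTUNE_VARIABLE "some-value" (Command Prompt does not treat single quotes as quotation marks, and setx only takes effect in newly opened windows.)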
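Before creating a run, you may want to confirm that both variables are actually visible to Python. The following is a minimal sketch of such a pre-flight check. `check_neptune_env` is a hypothetical helper, not part of the Neptune client, and its project-name pattern only encodes the workspace-name/project-name form described above; nothing is sent to Neptune.

```python
import os
import re

# Assumption: a full project name is "workspace-name/project-name", i.e. two
# non-empty parts without whitespace, separated by a single slash.
PROJECT_NAME_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+$")


def check_neptune_env(environ=None):
    """Hypothetical pre-flight check (not part of the Neptune client).

    Returns a list of problems with the Neptune credential variables; an empty
    list means both are set and the project name has the expected form.
    """
    environ = os.environ if environ is None else environ
    problems = []
    if not environ.get("NEPTUNE_API_TOKEN"):
        problems.append("NEPTUNE_API_TOKEN is not set")
    project = environ.get("NEPTUNE_PROJECT", "")
    if not project:
        problems.append("NEPTUNE_PROJECT is not set")
    elif not PROJECT_NAME_PATTERN.match(project):
        problems.append(
            f"NEPTUNE_PROJECT={project!r} is not of the form workspace-name/project-name"
        )
    return problems


# Example: inspect the environment of the current process
for problem in check_neptune_env():
    print(problem)
```

Passing a plain dict instead of os.environ makes the check easy to try out, for example check_neptune_env({"NEPTUNE_API_TOKEN": "...", "NEPTUNE_PROJECT": "ml-team/classification"}) returns an empty list.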
Although it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune:
import neptune

run = neptune.init_run(
    project="ml-team/classification",  # your full project name here
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
)
For more help, see Set Neptune credentials.
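If you want to replicate the example in this guide, also install the Neptune-Keras integration, TensorFlow, NumPy, pandas, and scikit-learn (the data-loading step uses scikit-learn):
pip install -U neptune-tensorflow-keras tensorflow numpy pandas scikit-learn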
Arize logging example
You can use callbacks to log and visualize loss curves for each training iteration.
In this example, we'll work with Keras to build a classifier model.
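- Create a run:
import neptune

run = neptune.init_run()  # reads NEPTUNE_API_TOKEN and NEPTUNE_PROJECT from the environment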
If Neptune can't find your project name or API token
As a best practice, save your Neptune API token and project name as the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables.
Alternatively, you can pass the information when using a function that takes api_token and project as arguments:
run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8",  # your API token
    project="ml-team/classification",  # your full project name
)
- API token: in the bottom-left corner, expand the user menu and select Get my API token.
- Project name: you can copy the path from the project details (menu → Details & privacy).
If you haven't registered, you can log anonymously to a public project:
Make sure not to publish sensitive data through your code!
- Instantiate an Arize client:
- Define some model metadata:
- Import and load the data:
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the breast cancer dataset (569 samples, 30 numeric features)
data = datasets.load_breast_cancer()
X = pd.DataFrame(data["data"].astype(np.float32), columns=data["feature_names"])
y = pd.Series(data["target"])

# Split into training, validation, and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, random_state=42
)
- Train the model, logging metrics through the Neptune callback:
from neptune.integrations.tensorflow_keras import NeptuneCallback
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential

# Define and compile the model
model = Sequential()
model.add(Dense(10, activation="sigmoid", input_shape=(30,)))
model.add(Dropout(0.25))
model.add(Dense(20, activation="sigmoid"))
model.add(Dropout(0.25))
model.add(Dense(10, activation="sigmoid"))
model.add(Dropout(0.25))
model.add(Dense(1, activation="sigmoid"))
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.mean_squared_logarithmic_error,
)

# Fit the model and log metrics to Neptune with the callback
params = {
    "batch_size": 30,
    "epochs": 50,
    "verbose": 0,
}
history = model.fit(
    X_train,
    y_train,
    batch_size=params["batch_size"],
    epochs=params["epochs"],
    verbose=params["verbose"],
    validation_data=(X_val, y_val),  # keep the test set out of training-time monitoring
    callbacks=[NeptuneCallback(run=run)],
)
- To stop the connection to Neptune and sync all data, call the stop() method:
run.stop()
- Run your script as you normally would.
To open the run, click the Neptune link that appears in the console output.
Sample output
[neptune] [info ] Neptune initialized. Open in the app:
https://app.neptune.ai/workspace/project/e/RUN-1
A live training curve should show up in the Charts section.
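For reference, the numbers behind this example can be worked out by hand. The sketch below assumes the dataset's 569 samples and 30 features and the default test_size=0.25 of train_test_split, and it mirrors scikit-learn's rounding rule (the test part gets ceil(test_size × n) samples). It doesn't call scikit-learn or Keras; it only reproduces the arithmetic for the split sizes, the batches per epoch, and the trainable parameters of the Dense layers.

```python
import math

# Assumptions: 569 samples, 30 features, default test_size=0.25 for both
# splits, and scikit-learn's rule n_test = ceil(test_size * n_samples).


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a single train/test split."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected (Dense) layer."""
    return n_inputs * n_units + n_units


n_rest, n_test = split_sizes(569)     # first split: train+val vs. test
n_train, n_val = split_sizes(n_rest)  # second split: train vs. validation

# Batches per epoch with batch_size=30; the last batch is partial.
steps_per_epoch = math.ceil(n_train / 30)

# Dense layers 30 -> 10 -> 20 -> 10 -> 1 (Dropout layers add no parameters).
layer_sizes = [30, 10, 20, 10, 1]
n_params = sum(
    dense_params(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
)

print(n_train, n_val, n_test)                 # 319 107 143
print(steps_per_epoch, steps_per_epoch * 50)  # 11 550
print(n_params)                               # 751
```

So training runs 11 batches per epoch (550 over 50 epochs), and the model has 751 trainable parameters, which you can cross-check against model.summary().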
More options
Logging training and validation records to Arize
Arize logs training and validation records to an Evaluation Store for model prelaunch validation, such as visualizing performance across different feature slices (for example, model accuracy for lower-income versus higher-income individuals).
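Slice-level evaluation comes down to grouping records by a feature value and computing a metric for each group. Arize does this for you in its UI; the sketch below only illustrates the idea in plain Python. The income-bracket records are hypothetical, and accuracy stands in for whichever metric you track.

```python
from collections import defaultdict


def accuracy_by_slice(records):
    """Group (slice_value, predicted_label, actual_label) tuples by slice
    and return {slice_value: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for slice_value, predicted, actual in records:
        total[slice_value] += 1
        correct[slice_value] += int(predicted == actual)
    return {s: correct[s] / total[s] for s in total}


# Hypothetical records: an income bracket plus predicted and actual labels
records = [
    ("lower-income", 1, 1),
    ("lower-income", 0, 1),
    ("higher-income", 1, 1),
    ("higher-income", 0, 0),
]
print(accuracy_by_slice(records))  # {'lower-income': 0.5, 'higher-income': 1.0}
```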
The records you send can also serve as your model's baseline: their feature distributions can be compared against the features your model receives in production, which tells you when those distributions have shifted.
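Drift between a baseline and production data is usually quantified as a distance between the two feature distributions. One common choice is the population stability index (PSI), sketched below over pre-binned counts. This illustrates the concept only and is not Arize's implementation; the bin counts and the eps smoothing value are assumptions.

```python
import math


def population_stability_index(baseline_counts, production_counts, eps=1e-6):
    """Population stability index between two histograms over the same bins.

    PSI = sum_i (b_i - p_i) * ln(b_i / p_i), where b_i and p_i are the
    fractions of baseline and production records that fall in bin i.
    Every term is non-negative, so PSI is 0 when the fractions match.
    """
    if len(baseline_counts) != len(production_counts):
        raise ValueError("both histograms must use the same bins")
    n_baseline = sum(baseline_counts)
    n_production = sum(production_counts)
    psi = 0.0
    for b_count, p_count in zip(baseline_counts, production_counts):
        # eps keeps empty bins from causing a division by zero or log(0)
        b = max(b_count / n_baseline, eps)
        p = max(p_count / n_production, eps)
        psi += (b - p) * math.log(b / p)
    return psi


# Hypothetical counts of one feature over four bins
baseline = [100, 300, 400, 200]    # for example, from the training records
production = [400, 300, 200, 100]  # for example, from production traffic
print(round(population_stability_index(baseline, baseline), 3))    # 0.0
print(round(population_stability_index(baseline, production), 3))  # 0.624
```

Because PSI compares fractions rather than raw counts, a production sample of a different size than the baseline is handled without extra normalization.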
To learn more about the Arize Python SDK and arize.log(), see the Arize documentation.
Logging training records
Use the model to generate predictions:
y_train_pred = model.predict(X_train).T[0]
y_val_pred = model.predict(X_val).T[0]
y_test_pred = model.predict(X_test).T[0]
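Because the last layer uses a sigmoid activation, model.predict() returns scores in [0, 1] with shape (n, 1), and .T[0] flattens them into n scores. The example passes these scores to Arize as prediction labels. If the Arize model type you chose expects class labels instead (an assumption to check in the Arize documentation), one option is to threshold the scores first. The sketch below uses plain lists in place of NumPy arrays, and both scores_to_labels and its 0.5 threshold are hypothetical choices.

```python
def scores_to_labels(scores, threshold=0.5):
    """Map sigmoid scores in [0, 1] to 0/1 class labels.

    Hypothetical helper; the 0.5 threshold is an assumption, not a
    recommendation for every use case.
    """
    return [int(score >= threshold) for score in scores]


# model.predict() returns one single-element row per sample; a list of rows
# stands in for that array here, and taking row[0] has the same effect as .T[0].
predictions = [[0.91], [0.12], [0.50], [0.49]]
scores = [row[0] for row in predictions]
print(scores_to_labels(scores))  # [1, 0, 1, 0]
```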
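Log the training data:
# The Arize client (arize), Environments, model_id, model_version, and model_type
# are defined or imported in the earlier steps.
# train_test_split shuffles the original index, so reset it to line up
# predictions, actuals, and features row by row.
train_prediction_labels = pd.Series(y_train_pred)
train_actual_labels = y_train.reset_index(drop=True)
train_feature_df = X_train.reset_index(drop=True)
train_responses = arize.log(
    model_id=model_id,
    model_version=model_version,
    model_type=model_type,  # this will change depending on your model type
    prediction_labels=train_prediction_labels,
    actual_labels=train_actual_labels,
    environment=Environments.TRAINING,
    features=train_feature_df,
)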
Logging validation records
# Reset the shuffled index so that predictions, actuals, and features line up row by row
val_prediction_labels = pd.Series(y_val_pred)
val_actual_labels = y_val.reset_index(drop=True)
val_features_df = X_val.reset_index(drop=True)
val_responses = arize.log(
    model_id=model_id,
    model_version=model_version,
    model_type=model_type,
    batch_id="batch0",
    prediction_labels=val_prediction_labels,
    actual_labels=val_actual_labels,
    environment=Environments.VALIDATION,
    features=val_features_df,
)
Storing and versioning model weights with Neptune
Neptune allows you to organize your model metadata in a folder-like structure inside the run. For each run, you can log model weights or checkpoints.
To connect the two tools, you can organize different trained iterations under the same model_version you used when logging training records to Arize.
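Note
The code for storing models differs between frameworks. This example applies only to Keras, and it assumes TensorFlow 2.15 or earlier, where model.save() with a path that has no file extension writes a SavedModel directory. With Keras 3 (the default from TensorFlow 2.16), model.save() expects a .keras or .h5 path; use model.export() to write a SavedModel instead.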
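To have all the metadata in a single place, you can log model metadata to the same run you created earlier. If you already called run.stop(), reconnect to the run first with neptune.init_run(with_id="RUN-1"), replacing RUN-1 with your run's ID.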
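import glob

# Save the current model version in SavedModel format
directory_name = f"keras_model_{model_version}"
model.save(directory_name)

# Upload the graph definition and the variable files,
# using each file's path as its field path in the run
run[f"{directory_name}/saved_model.pb"].upload(f"{directory_name}/saved_model.pb")
for name in glob.glob(f"{directory_name}/variables/*"):
    run[name].upload(name)

# Log "model_id" for easier reference
run["model_id"] = model_id

# Sync all data and close the connection when you're done
run.stop()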
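The upload loop uses each file's relative path as its Neptune field path, so the files land in a keras_model_<model_version> namespace that mirrors the directory on disk. The sketch below reproduces that mapping without TensorFlow or Neptune, using empty stand-in files in a temporary directory. neptune_field_paths is a hypothetical helper and v1 is an example model version. The helper also normalizes separators to "/", because on Windows glob returns paths containing backslashes, which could otherwise end up in the field names.

```python
import glob
import os
import tempfile


def neptune_field_paths(root, directory_name):
    """Hypothetical helper: map Neptune field paths to local files for a
    SavedModel directory (saved_model.pb plus everything in variables/)."""
    model_dir = os.path.join(root, directory_name)
    local_files = [os.path.join(model_dir, "saved_model.pb")]
    local_files += sorted(glob.glob(os.path.join(model_dir, "variables", "*")))
    # Field paths are relative to root and always use "/" as the separator
    return {
        os.path.relpath(path, root).replace(os.sep, "/"): path
        for path in local_files
    }


# Demo on a throwaway directory with empty stand-in files
with tempfile.TemporaryDirectory() as root:
    model_dir = os.path.join(root, "keras_model_v1")  # example model_version "v1"
    os.makedirs(os.path.join(model_dir, "variables"))
    for name in ["saved_model.pb", "variables/variables.index",
                 "variables/variables.data-00000-of-00001"]:
        open(os.path.join(model_dir, *name.split("/")), "w").close()
    fields = sorted(neptune_field_paths(root, "keras_model_v1"))

print(fields)

# In the real script, the mapping could drive the uploads, for example:
# for field, local_path in neptune_field_paths(".", directory_name).items():
#     run[field].upload(local_path)
```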