Arize integration guide#
Arize and Neptune are MLOps tools that aim to improve connected but different parts of your ML pipeline and workflow.
- Arize helps you:
    - visualize your production model performance
    - understand drift and data quality issues
- Neptune logs, stores, displays, and compares your model-building metadata for better experiment tracking and model registry.
Together, Arize and Neptune help you:
- Train the best model
- Validate your model pre-launch
- Compare the production performance of those models
Before you start#
- Sign up at neptune.ai/register.
- Create a project for storing your metadata.
- Have Arize installed.
Installing Neptune#
Install the Neptune client library:
Installing through Anaconda Navigator
To find neptune, you may need to update your channels and index.
- In the Navigator, select Environments.
- In the package view, click Channels.
- Click Add..., enter conda-forge, and click Update channels.
- In the package view, click Update index... and wait until the update is complete. This can take several minutes.
- You should now be able to search for neptune.
Note: The displayed version may be outdated. The latest version of the package will be installed.
Note: On Bioconda, there is a "neptune" package available which is not the neptune.ai client library. Make sure to specify the "conda-forge" channel when installing neptune.ai.
Passing your Neptune credentials
Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.
To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.
To find your project: Your full project name has the form workspace-name/project-name. To copy the name, click the menu in the top-right corner and select Edit project details.
Although it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune:
import neptune

run = neptune.init_run(
    project="ml-team/classification",  # your full project name here
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
)
For more help, see Set Neptune credentials.
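Before initializing Neptune, it can help to confirm that both environment variables are actually set. The following is a minimal sketch using only the Python standard library; the helper names `missing_neptune_vars` and `looks_like_full_project_name` are hypothetical and not part of the Neptune client.

```python
import os

# Names of the environment variables that this guide asks you to set.
REQUIRED_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_vars(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def looks_like_full_project_name(name):
    """Check for the workspace-name/project-name form described above."""
    workspace, sep, project = name.partition("/")
    return bool(workspace) and bool(sep) and bool(project) and "/" not in project


if __name__ == "__main__":
    missing = missing_neptune_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Neptune credentials found in the environment.")
```

Running this at the top of a training script gives a clear message early, instead of an error when the run is created.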
If you want to replicate the example in this guide, also install the Neptune–Keras integration, tensorflow, numpy, and pandas:
Arize logging example#
You can use callbacks to log and visualize loss curves for each training iteration.
In this example, we'll work with Keras to build a classifier model.
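To illustrate what a logging callback does conceptually, here is a framework-free sketch. It is not the implementation of NeptuneCallback: `HistoryCallback` and `fake_fit` are hypothetical names, and the loss values are made up. It only mimics the pattern in which a training loop calls `on_epoch_end(epoch, logs)` and the callback appends each metric to a series.

```python
from collections import defaultdict


class HistoryCallback:
    """Hypothetical stand-in for a logging callback; not NeptuneCallback."""

    def __init__(self):
        # One list of values per metric name, appended to once per epoch.
        self.series = defaultdict(list)

    def on_epoch_end(self, epoch, logs=None):
        for name, value in (logs or {}).items():
            self.series[name].append(value)


def fake_fit(epochs, callbacks):
    """Toy loop that only mimics how a framework invokes its callbacks."""
    for epoch in range(epochs):
        # Made-up numbers that shrink each epoch, purely for illustration.
        logs = {"loss": 1.0 / (epoch + 1), "val_loss": 1.5 / (epoch + 1)}
        for callback in callbacks:
            callback.on_epoch_end(epoch, logs)


history = HistoryCallback()
fake_fit(epochs=3, callbacks=[history])
print(dict(history.series))
```

Each list in `history.series` corresponds to one curve that a tracking tool could plot per epoch.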
- Create a run:
If Neptune can't find your project name or API token
As a best practice, you should save your Neptune API token and project name as environment variables:
export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8"
export NEPTUNE_PROJECT="ml-team/classification"
You can, however, also pass them as arguments when initializing Neptune:
run = neptune.init_run(
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh3Kb8",  # your token here
    project="ml-team/classification",  # your full project name here
)
- API token: In the bottom-left corner, expand the user menu and select Get my API token.
- Project name: In the top-right corner, open the menu and select Edit project details.
If you haven't registered, you can also log anonymously to a public project (make sure not to publish sensitive data through your code!):
- Instantiate an Arize client:
- Define some model metadata:
- Import and load the data:
import concurrent.futures as cf
import datetime
import os
import uuid

import numpy as np
import pandas as pd
from sklearn import datasets, preprocessing
from sklearn.model_selection import train_test_split


def process_data(X, y):
    scaler = preprocessing.MinMaxScaler()
    X = np.array(X).reshape((len(X), 30))
    y = np.array(y)
    return X, y


# Load data and split data
data = datasets.load_breast_cancer()
X, y = datasets.load_breast_cancer(return_X_y=True)
X, y = X.astype(np.float32), y
X, y = pd.DataFrame(X, columns=data["feature_names"]), pd.Series(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, random_state=42
)
- Log training callbacks:
from neptune.integrations.tensorflow_keras import NeptuneCallback
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential

# Define and compile model
model = Sequential()
model.add(Dense(10, activation="sigmoid", input_shape=(30,)))
model.add(Dropout(0.25))
model.add(Dense(20, activation="sigmoid"))
model.add(Dropout(0.25))
model.add(Dense(10, activation="sigmoid"))
model.add(Dropout(0.25))
model.add(Dense(1, activation="sigmoid"))
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.mean_squared_logarithmic_error,
)

# Fit model and log callbacks
params = {
    "batch_size": 30,
    "epochs": 50,
    "verbose": 0,
}
callbacked = model.fit(
    X_train,
    y_train,
    batch_size=params["batch_size"],
    epochs=params["epochs"],
    verbose=params["verbose"],
    validation_data=(X_test, y_test),
    # log to Neptune using a Neptune callback
    callbacks=[NeptuneCallback(run=run)],
)
- To stop the connection to Neptune and sync all data, call the stop() method:
run.stop()
- Run your script as you normally would.
To open the run, click the Neptune link that appears in the console output.
A live training curve should show up in the Charts section.
More options#
Logging training and validation records to Arize#
Arize logs training and validation records to an Evaluation Store for model pre-launch validation, such as visualizing performance across different feature slices (for example, model accuracy for lower-income versus higher-income individuals).
The records you send can also serve as your model baseline, which can be compared against the features that your models use for prediction in production. This helps you detect when feature distributions have shifted.
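One way to make a shift in a feature distribution concrete is the population stability index (PSI), a common drift score that compares binned frequencies of a baseline sample with those of a production sample. The sketch below is a plain-Python illustration under that assumption. This guide does not say which drift metric Arize computes, and the function name, bin count, and sample data are made up for the example.

```python
import math


def population_stability_index(baseline, production, n_bins=10, eps=1e-6):
    """PSI between two samples of one numeric feature (higher = more shift)."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp so that production values outside the baseline range
            # fall into the first or last bin.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    base = bin_fractions(baseline)
    prod = bin_fractions(production)
    return sum((p - b) * math.log(p / b) for b, p in zip(base, prod))


# Illustrative data: a spread-out baseline and a production sample
# concentrated in the upper half of the baseline range.
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

print(population_stability_index(baseline, baseline))
print(population_stability_index(baseline, shifted))
```

Identical samples score zero, and the score grows as the production sample moves away from the baseline bins.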
To learn more about the Arize Python SDK and arize.log(), see the Arize documentation.
Logging training records#
Use the model to generate predictions:
y_train_pred = model.predict(X_train).T[0]
y_val_pred = model.predict(X_val).T[0]
y_test_pred = model.predict(X_test).T[0]
Log the training data:
train_prediction_labels = pd.Series(y_train_pred)
train_actual_labels = pd.Series(y_train)
train_feature_df = pd.DataFrame(X_train, columns=data["feature_names"])
train_responses = arize.log(
    model_id=model_id,
    model_version=model_version,
    model_type=model_type,  # this will change depending on your model type
    prediction_labels=train_prediction_labels,
    actual_labels=train_actual_labels,
    environment=Environments.TRAINING,
    features=train_feature_df,
)
Logging validation records#
val_prediction_labels = pd.Series(y_val_pred)
val_actual_labels = pd.Series(y_val)
val_features_df = pd.DataFrame(X_val, columns=data["feature_names"])
val_responses = arize.log(
    model_id=model_id,
    model_version=model_version,
    model_type=model_type,
    batch_id="batch0",
    prediction_labels=val_prediction_labels,
    actual_labels=val_actual_labels,
    environment=Environments.VALIDATION,
    features=val_features_df,
)
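The calls above store their return values in train_responses and val_responses but never inspect them. The sketch below shows one way to check them, under a loud assumption: that `arize.log()` returns `concurrent.futures` futures whose results expose a `status_code` attribute. The `concurrent.futures` import in the data-loading step hints at this, but this guide does not confirm it, so verify against the Arize SDK version you use. `check_responses` is a hypothetical helper, and the futures here are stand-ins so that the example runs without an Arize account.

```python
import concurrent.futures as cf
from types import SimpleNamespace


def check_responses(responses):
    """Wait for every future and raise if any response is not HTTP 200.

    ASSUMPTION: each item is a concurrent.futures.Future whose result has
    a `status_code` attribute. This is not confirmed by the guide.
    """
    done = 0
    for future in cf.as_completed(responses):
        result = future.result()
        if result.status_code != 200:
            raise ValueError(f"Logging failed with status {result.status_code}")
        done += 1
    return done


# Stand-in futures that simulate three successful responses.
with cf.ThreadPoolExecutor(max_workers=2) as pool:
    fake_responses = [
        pool.submit(lambda: SimpleNamespace(status_code=200)) for _ in range(3)
    ]
    print(check_responses(fake_responses))
```

If the assumption holds for your SDK version, you could pass train_responses or val_responses to the helper in place of the stand-in futures.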
Storing and versioning model weights with Neptune#
Neptune allows you to organize your model metadata in a folder-like structure inside the run. For each run, you can log model weights or checkpoints.
To keep the two tools aligned, you can organize different trained iterations using the same model_version value that you used to log training records to Arize.
Note
The code for model storing is different for different frameworks. This example is only applicable to Keras.
- To have all the metadata in a single place, you can log model metadata to the same run you created earlier.
- To manage your model metadata separately, you can use the Neptune model registry.
Logging to the run#
import glob

# Storing model version 1
directory_name = f"keras_model_{model_version}"
model.save(directory_name)
run[f"{directory_name}/saved_model.pb"].upload(f"{directory_name}/saved_model.pb")
for name in glob.glob(f"{directory_name}/variables/*"):
    run[name].upload(name)

# Log "model_id", for better reference
run["model_id"] = model_id
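The upload loop above uses the file path returned by glob as the field name. On systems where paths contain backslashes, that name may not match the forward-slash, folder-like structure used elsewhere in this guide. The following sketch builds a mapping from forward-slash field names to local file paths using only the standard library. `collect_uploads` is a hypothetical helper, and unlike the snippet above it includes every file under the directory, not only saved_model.pb and the variables folder.

```python
from pathlib import Path


def collect_uploads(directory):
    """Map forward-slash field paths to local file paths under `directory`."""
    root = Path(directory)
    uploads = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # as_posix() yields "/" separators on every operating system,
            # matching the folder-like field names used in this guide.
            field = (Path(root.name) / path.relative_to(root)).as_posix()
            uploads[field] = str(path)
    return uploads


# Usage with the run object from this guide (commented out so that the
# sketch runs without a Neptune connection):
# for field, local_path in collect_uploads(directory_name).items():
#     run[field].upload(local_path)
```

Building the mapping first also makes it easy to print and review what will be uploaded before any data is sent.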
Using the model registry#
You first need to create a Model object. It can optionally contain metadata that is common to all versions.
# A separate name avoids overwriting the Keras model defined earlier
neptune_model = neptune.init_model(key="PRETRAINED")
neptune_model["some_namespace"] = your_metadata
Then initialize a ModelVersion object and log the metadata there, just like you would with a run. You can create and manage each model version separately.
# The full model ID includes the project key ("CLS" in this example).
# A separate name avoids overwriting the model_version string used for Arize.
neptune_model_version = neptune.init_model_version(model="CLS-PRETRAINED")
neptune_model_version[f"{directory_name}/saved_model.pb"].upload(
    f"{directory_name}/saved_model.pb"
)
for name in glob.glob(f"{directory_name}/variables/*"):
    neptune_model_version[name].upload(name)
The model metadata will now be displayed in the Models section of your Neptune project.
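As noted above, the full model ID combines the project key with the model key. A small sketch of that composition follows, where `full_model_id` is a hypothetical helper and "CLS" is the project key implied by the example ID in this guide.

```python
def full_model_id(project_key, model_key):
    """Join a project key and a model key into a full model ID."""
    return f"{project_key}-{model_key}"


print(full_model_id("CLS", "PRETRAINED"))  # CLS-PRETRAINED
```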