Arize integration guide#

Arize and Neptune are MLOps tools that address different but connected parts of your ML pipeline and workflow.

  • Arize helps you:
    • visualize your production model performance
    • understand drift and data quality issues
  • Neptune logs, stores, displays, and compares your model-building metadata to support experiment tracking and a model registry.

Together, Arize and Neptune help you:

  • Train the best model
  • Validate your model prelaunch
  • Compare the production performance of those models

Before you start#

Installing Neptune#

Install the Neptune client library:

pip install neptune

Passing your Neptune credentials

Once you've registered and created a project, set your Neptune API token and full project name to the NEPTUNE_API_TOKEN and NEPTUNE_PROJECT environment variables, respectively.

export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM.4kl0jvYh3Kb8...6Lc"

To find your API token: In the bottom-left corner of the Neptune app, expand the user menu and select Get my API token.

export NEPTUNE_PROJECT="ml-team/classification"

Your full project name has the form workspace-name/project-name. You can copy it from the project settings: Click the menu in the top-right → Details & privacy.

On Windows, navigate to Settings → Edit the system environment variables, or enter the following in Command Prompt: setx SOME_NEPTUNE_VARIABLE "some-value"


Although it's not recommended, especially for the API token, you can also pass your credentials in the code when initializing Neptune.

run = neptune.init_run(
    project="ml-team/classification",  # your full project name here
    api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jvYh...3Kb8",  # your API token here
)

For more help, see Set Neptune credentials.
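
Before initializing a run, it can help to confirm that both variables are actually visible to your Python process. The following is a minimal sketch that uses only the standard library. The helper name check_neptune_env is hypothetical and not part of the Neptune client.

```python
import os

REQUIRED_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def check_neptune_env(env=None):
    """Return a list of problems found in the Neptune environment variables.

    `env` defaults to os.environ; pass a dict to check other values.
    This is an illustrative helper, not a Neptune API.
    """
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    project = env.get("NEPTUNE_PROJECT", "")
    # The full project name has the form workspace-name/project-name.
    if project and (project.count("/") != 1 or not all(project.split("/"))):
        problems.append("NEPTUNE_PROJECT should look like workspace-name/project-name")
    return problems


if __name__ == "__main__":
    for problem in check_neptune_env():
        print(f"Warning: {problem}")
```

An empty list means both variables are set and the project name has the expected shape. The check does not verify that the token or project is valid on the Neptune side.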

If you want to replicate the example in this guide, also install the Neptune-Keras integration, the Arize SDK, TensorFlow, scikit-learn, NumPy, and pandas:

pip install -U neptune-tensorflow-keras arize numpy pandas scikit-learn tensorflow

Arize logging example#

You can use callbacks to log and visualize loss curves for each training iteration.

In this example, we'll work with Keras to build a classifier model.

  1. Create a run:

    import neptune
    
    run = neptune.init_run()
    
    If Neptune can't find your project name or API token

    As a best practice, you should save your Neptune API token and project name as environment variables:

    export NEPTUNE_API_TOKEN="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8"
    
    export NEPTUNE_PROJECT="ml-team/classification"
    

    Alternatively, you can pass the information when using a function that takes api_token and project as arguments:

    run = neptune.init_run(
        api_token="h0dHBzOi8aHR0cHM6Lkc78ghs74kl0jv...Yh3Kb8", # (1)!
        project="ml-team/classification", # (2)!
    )
    
    1. In the bottom-left corner, expand the user menu and select Get my API token.
    2. You can copy the path from the project details (Details & privacy).

    If you haven't registered, you can log anonymously to a public project:

    api_token=neptune.ANONYMOUS_API_TOKEN
    project="common/quickstarts"
    

    Make sure not to publish sensitive data through your code!

  2. Instantiate an Arize client:

    import os
    
    from arize.api import Client
    
    arize = Client(
        space_key=os.environ["ARIZE_SPACE_KEY"],
        api_key=os.environ["ARIZE_API_KEY"],
    )
    
  3. Define some model metadata:

    from arize.utils.types import ModelTypes, Environments
    
    model_id = "neptune_cancer_prediction_model"
    model_version = "v1"
    model_type = ModelTypes.BINARY_CLASSIFICATION
    
  4. Import and load the data:

    import os
    
    import numpy as np
    import pandas as pd
    from sklearn import datasets, preprocessing
    from sklearn.model_selection import train_test_split
    
    
    def process_data(X, y):
        # Scale each of the 30 features to the [0, 1] range
        scaler = preprocessing.MinMaxScaler()
        X = scaler.fit_transform(np.array(X).reshape((len(X), 30)))
        y = np.array(y)
        return X, y
    
    
    # Load data and split data
    data = datasets.load_breast_cancer()
    
    X, y = datasets.load_breast_cancer(return_X_y=True)
    X, y = X.astype(np.float32), y
    
    X, y = pd.DataFrame(X, columns=data["feature_names"]), pd.Series(y)
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, random_state=42
    )
    
  5. Log training callbacks:

    from neptune.integrations.tensorflow_keras import NeptuneCallback
    from tensorflow import keras
    from tensorflow.keras.layers import Dense, Dropout
    from tensorflow.keras.models import Sequential
    
    # Define and compile model
    model = Sequential()
    model.add(Dense(10, activation="sigmoid", input_shape=((30,))))
    model.add(Dropout(0.25))
    model.add(Dense(20, activation="sigmoid"))
    model.add(Dropout(0.25))
    model.add(Dense(10, activation="sigmoid"))
    model.add(Dropout(0.25))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(
        optimizer=keras.optimizers.Adam(),
        loss=keras.losses.mean_squared_logarithmic_error,
    )
    
    # Fit model and log callbacks
    params = {
        "batch_size": 30,
        "epochs": 50,
        "verbose": 0,
    }
    
    callbacked = model.fit(
        X_train,
        y_train,
        batch_size=params["batch_size"],
        epochs=params["epochs"],
        verbose=params["verbose"],
        validation_data=(X_val, y_val),
        # log to Neptune using a Neptune callback
        callbacks=[NeptuneCallback(run=run)],
    )
    
  6. To stop the connection to Neptune and sync all data, call the stop() method:

    run.stop()
    
  7. Run your script as you normally would.

To open the run, click the Neptune link that appears in the console output.

Sample output

[neptune] [info ] Neptune initialized. Open in the app: https://app.neptune.ai/workspace/project/e/RUN-1

A live training curve should show up in the Charts section.
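
The curve tracks the loss passed to model.compile(), which in this example is the mean squared logarithmic error. As a rough illustration of what each point on the curve measures, the sketch below computes that quantity by hand for a few made-up labels and predictions. The function name msle and the sample values are illustrative assumptions, and the Keras implementation may differ in numerical details such as clipping.

```python
import math


def msle(y_true, y_pred):
    """Mean of (log(1 + y_true) - log(1 + y_pred))^2 over all samples."""
    squared_errors = [
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return sum(squared_errors) / len(squared_errors)


labels = [1, 0, 1, 1]               # made-up ground truth
good_scores = [0.9, 0.1, 0.8, 0.7]  # made-up predictions close to the labels
poor_scores = [0.4, 0.6, 0.5, 0.3]  # made-up predictions far from the labels

print(f"good: {msle(labels, good_scores):.4f}")
print(f"poor: {msle(labels, poor_scores):.4f}")
```

Predictions closer to the labels give a smaller value, so a curve that falls over the epochs suggests the model is fitting the training data better.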

More options#

Logging training and validation records to Arize#

Arize logs training and validation records to an Evaluation Store for model prelaunch validation, such as visualizing performance across different feature slices (for example, model accuracy for lower-income versus higher-income individuals).
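
To make the idea of feature slices concrete, the following sketch computes accuracy separately for each value of a slicing feature. It uses only the standard library. The records, the income_band feature, and the accuracy_by_slice helper are made up for illustration and don't reflect how Arize computes its metrics internally.

```python
from collections import defaultdict


def accuracy_by_slice(records, slice_key):
    """Group records by their slice_key value and return accuracy per group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[slice_key]
        total[group] += 1
        correct[group] += int(record["prediction"] == record["actual"])
    return {group: correct[group] / total[group] for group in total}


# Hypothetical validation records
records = [
    {"income_band": "lower", "prediction": 1, "actual": 1},
    {"income_band": "lower", "prediction": 0, "actual": 1},
    {"income_band": "higher", "prediction": 1, "actual": 1},
    {"income_band": "higher", "prediction": 0, "actual": 0},
]

slice_accuracy = accuracy_by_slice(records, "income_band")
print(slice_accuracy)
```

A gap between slices, like the one in this toy data, is the kind of signal that prelaunch validation is meant to surface.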

The records you send can also serve as your model baseline, which can be compared against the features that your models use for prediction in production. This helps inform you when the distributions of the features have shifted.

To learn more about the Arize Python SDK and arize.log(), see the Arize documentation.
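
One common way to quantify the distribution shift mentioned above is the population stability index (PSI), which compares the binned distribution of a feature in the baseline against production. The sketch below is a plain-Python illustration with made-up values. The function name, bin count, and smoothing constant are assumptions, and it isn't meant to reproduce the drift metrics that Arize calculates.

```python
import math


def population_stability_index(baseline, production, n_bins=4, eps=1e-6):
    """PSI = sum((p - b) * ln(p / b)) over equal-width bins of the baseline range."""
    low, high = min(baseline), max(baseline)
    width = (high - low) / n_bins or 1.0  # fall back to 1.0 for a constant baseline

    def bin_fractions(values):
        counts = [0] * n_bins
        for value in values:
            # Clamp values outside the baseline range into the edge bins.
            index = min(max(int((value - low) / width), 0), n_bins - 1)
            counts[index] += 1
        # eps avoids division by zero and log(0) for empty bins.
        return [max(count / len(values), eps) for count in counts]

    base = bin_fractions(baseline)
    prod = bin_fractions(production)
    return sum((p - b) * math.log(p / b) for b, p in zip(base, prod))


baseline_values = [10, 12, 14, 16, 18, 20, 22, 24]   # made-up training feature
shifted_values = [v + 8 for v in baseline_values]    # made-up production feature

psi_same = population_stability_index(baseline_values, baseline_values)
psi_shifted = population_stability_index(baseline_values, shifted_values)
print(psi_same, psi_shifted)
```

Identical distributions give a PSI of zero, and the value grows as production data moves away from the baseline.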

Logging training records#

Use the model to generate predictions:

y_train_pred = model.predict(X_train).T[0]
y_val_pred = model.predict(X_val).T[0]
y_test_pred = model.predict(X_test).T[0]
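
Each prediction array has one entry per row of the corresponding split. The sizes follow from the two train_test_split calls made earlier, which rely on scikit-learn's default test_size of 0.25. The sketch below works out the expected lengths for the 569-sample breast cancer dataset, assuming the held-out portion is rounded up and the remainder is used for training. The helper split_sizes is hypothetical.

```python
import math


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test_size, rounding the test part up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_total = 569  # samples in the breast cancer dataset

# First split: training + validation pool versus test set
n_train_pool, n_test = split_sizes(n_total)

# Second split: final training set versus validation set
n_train, n_val = split_sizes(n_train_pool)

print(n_train, n_val, n_test)
```

If the lengths of your prediction arrays differ from these numbers, check that the splits and the predict() calls use the same variables.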

Log the training data:

train_prediction_labels = pd.Series(y_train_pred)
train_actual_labels = pd.Series(y_train)
train_feature_df = pd.DataFrame(X_train, columns=data["feature_names"])

train_responses = arize.log(
    model_id=model_id,
    model_version=model_version,
    model_type=model_type,  # this will change depending on your model type
    prediction_labels=train_prediction_labels,
    actual_labels=train_actual_labels,
    environment=Environments.TRAINING,
    features=train_feature_df,
)

Logging validation records#

val_prediction_labels = pd.Series(y_val_pred)
val_actual_labels = pd.Series(y_val)
val_features_df = pd.DataFrame(X_val, columns=data["feature_names"])

val_responses = arize.log(
    model_id=model_id,
    model_version=model_version,
    model_type=model_type,
    batch_id="batch0",
    prediction_labels=val_prediction_labels,
    actual_labels=val_actual_labels,
    environment=Environments.VALIDATION,
    features=val_features_df,
)

Storing and versioning model weights with Neptune#

Neptune lets you organize your model metadata in a folder-like structure inside the run. For each run, you can log model weights or checkpoints.

To keep the two tools in sync, you can organize the trained iterations in Neptune by the same model_version value that you used when logging training records to Arize.

Note

The code for storing a model differs between frameworks. This example applies only to Keras.

To have all the metadata in a single place, you can log model metadata to the same run you created earlier. Do this before calling run.stop(), because a stopped run no longer accepts new metadata.

import glob

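# Storing model version 1
# Assumes model.save() writes the TensorFlow SavedModel format
# (a directory containing saved_model.pb and a variables/ folder)
directory_name = f"keras_model_{model_version}"
model.save(directory_name)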

run[f"{directory_name}/saved_model.pb"].upload(f"{directory_name}/saved_model.pb")
for name in glob.glob(f"{directory_name}/variables/*"):
    run[name].upload(name)

# Log "model_id", for better reference
run["model_id"] = model_id
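
The upload loop above reuses each file's relative path as the field path in the run, so the folder layout on disk is mirrored in Neptune. As a sketch of which paths that produces, the following builds a throwaway directory with placeholder files and lists the names that glob would return. No Neptune calls are made, and the file names are assumptions about what a SavedModel directory typically contains.

```python
import glob
import os
import tempfile

model_version = "v1"
directory_name = f"keras_model_{model_version}"

with tempfile.TemporaryDirectory() as workdir:
    # Create empty placeholder files standing in for a saved model.
    variables_dir = os.path.join(workdir, directory_name, "variables")
    os.makedirs(variables_dir)
    for file_name in ("variables.index", "variables.data-00000-of-00001"):
        open(os.path.join(variables_dir, file_name), "w").close()

    # Express each match relative to workdir, which is how the upload loop
    # would see the files when the script runs from that directory.
    pattern = os.path.join(workdir, directory_name, "variables", "*")
    field_paths = sorted(
        os.path.relpath(path, workdir).replace(os.sep, "/")
        for path in glob.glob(pattern)
    )

print(field_paths)
```

The sketch normalizes path separators to forward slashes, because Neptune field paths use "/" to separate namespaces. On Windows, glob returns backslashes, so a similar normalization may be needed before using the result as a field path.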