neptune.ai introduction#

What is Neptune?#

A metadata store for MLOps, built for teams that run a lot of experiments.

Neptune consists of:

  • neptune-client – Python client library (API) that you use to log and query model-building metadata.
  • app.neptune.ai – web app for visualization, comparison, monitoring, and collaboration.

Neptune core concepts

You can have a workspace for each team or organization that you're working with. Within a workspace, you can create a project for each ML task you're solving.

Your project can contain metadata organized per run, model, or task.

Examples of ML metadata Neptune can track

Experiment and model-training metadata:

  • Metrics, hyperparameters, learning curves
  • Training code and configuration files
  • Predictions (images, tables)
  • Diagnostic charts (confusion matrices, ROC curves)
  • Console and hardware logs
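The experiment metadata above maps directly onto Neptune's field API. A minimal sketch, assuming an already-initialized run; the field names ("parameters", "train/loss", and so on) are illustrative, not required:

```python
def log_experiment_metadata(run):
    """Sketch: log the experiment metadata types listed above.

    `run` is an initialized Neptune run. Field names are illustrative;
    Neptune organizes fields into nested namespaces split by "/".
    """
    # Hyperparameters: assign a dict to a namespace
    run["parameters"] = {"lr": 0.01, "batch_size": 32}

    # Metrics / learning curves: log a series of values
    for epoch_loss in (0.9, 0.6, 0.4):
        run["train/loss"].log(epoch_loss)

    # Files: upload configuration, diagnostic charts, or prediction previews
    run["config"].upload("config.yaml")
    run["diagnostics/roc_curve"].upload("roc_curve.png")
```

You would create the run with `neptune.init_run()` and pass it in, as in the example under "How does it work?" below.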

Artifact metadata:

  • Paths to the dataset or model (Amazon S3 bucket, filesystem)
  • Dataset hash
  • Dataset or prediction preview (head of the table, snapshot of the image folder)
  • Feature column names (for tabular data)
  • When and by whom an artifact was created or modified
  • Size and description
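Much of this artifact metadata (location, hash, size, modification info) can be captured in one call. A hedged sketch, assuming an initialized run; the S3 path and field name are made up for illustration:

```python
def track_dataset_artifact(run):
    """Sketch: record a dataset's location and metadata as a Neptune artifact.

    `run` is an initialized Neptune run; the bucket path is illustrative.
    track_files() records the artifact's location and hash rather than
    uploading the data itself.
    """
    run["datasets/train"].track_files("s3://my-bucket/datasets/train/")
```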

Trained model metadata:

  • Model binaries or location of your model assets
  • Dataset versions
  • Links to recorded model training runs and experiments
  • Who trained the model
  • Model descriptions and notes
  • Links to observability dashboards (like Grafana)

For a complete reference of what you can track, see What you can log and display.

How does it work?#

import neptune.new as neptune
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
...

run = neptune.init_run()
data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(...)

PARAMS = {"n_estimators": 10, "max_depth": 3, ...}
run["parameters"] = PARAMS

clf = RandomForestClassifier(**PARAMS)
...

test_f1_score = f1_score(y_test, y_test_pred.argmax(axis=1), average="macro")
run["test_f1"] = test_f1_score
run["model"].upload("model.pkl")

run.stop()  # close the connection once logging is done

[Screenshots: preview of all logged metadata; metrics of several runs visualized as charts in Neptune; custom dashboard combining multiple metadata types]

With Neptune, you can log and organize ML metadata in the following ways:

Runs

You'll typically create a run every time you execute a script that does model training, re-training, or inference.

Each tracked run appears in the runs table of your project, where you can display and arrange the metadata to your liking and save views for later.

You can query and download logged metadata either through the app or API.
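For the API route, a minimal sketch of querying the runs table, assuming you have access to a project; the `"workspace/project"` string is a placeholder:

```python
def fetch_runs(project_name):
    """Sketch: query a project's runs table through the Neptune API.

    `project_name` is a "workspace/project" string. get_project() opens
    the project read-only, and fetch_runs_table() returns the same table
    you see in the app, convertible to a pandas DataFrame.
    """
    # Imported here so the sketch only needs neptune-client at call time
    import neptune.new as neptune

    project = neptune.get_project(name=project_name)
    return project.fetch_runs_table().to_pandas()
```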

Model registry

The model registry lets you manage the metadata and lifecycle of your models separately from your experimentation runs.

For each model, you can create and track model versions. To manage your model lifecycle, you can control the stage of each model version separately.

To learn more, see Model registry overview.
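The model-version workflow described above can be sketched as follows. This is a hedged example, not the only way to do it: the model key, file names, and stage value are illustrative.

```python
def register_and_promote_model():
    """Sketch of the model registry workflow.

    The key "FOREST" and the file names are illustrative. A model entry
    is created once; each trained candidate becomes a model version
    whose lifecycle stage you control separately.
    """
    # Imported here so the sketch only needs neptune-client at call time
    import neptune.new as neptune

    # Create the model entry (once per model)
    model = neptune.init_model(key="FOREST")

    # Create a version of that model and attach its binary
    model_version = neptune.init_model_version(model=model["sys/id"].fetch())
    model_version["model/binary"].upload("model.pkl")

    # Manage the lifecycle stage of this version
    model_version.change_stage("staging")
```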

Project metadata

To facilitate collaboration, you can store metadata that applies to the whole project.

This way, you can store, for example, the latest validation dataset for your ML task in a dedicated place. For more, see Project metadata.
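As a sketch of the validation-dataset example, assuming a project you can write to; the project name, field path, and bucket path are placeholders:

```python
def store_latest_validation_dataset(project_name):
    """Sketch: store project-level metadata shared by all runs.

    `project_name` is a "workspace/project" string; the field path and
    S3 location are illustrative. track_files() records the dataset's
    location and hash at the project level.
    """
    # Imported here so the sketch only needs neptune-client at call time
    import neptune.new as neptune

    project = neptune.init_project(name=project_name)
    project["validation/dataset"].track_files("s3://my-bucket/val/")
    project.stop()
```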

Notebooks

With the neptune-notebook extension, you can snapshot and compare Jupyter Notebook checkpoints in a dedicated section of the app.

For instructions, see Working with Jupyter.

Integrations

Skip the manual logging by using our integrations.

You can usually create a Neptune logger or callback that you pass along in your code:

Example: Keras integration
import neptune.new as neptune
from neptune.new.integrations.tensorflow_keras import NeptuneCallback

neptune_run = neptune.init_run()

# Create a Neptune callback and pass it to model.fit()
model.fit(
    ...
    callbacks=[NeptuneCallback(run=neptune_run)],
)

This callback automatically logs the metadata typically generated during Keras training runs.

For more, see Integrations.

What do I need to work with Neptune?#

Neptune on-premises

You can also install Neptune on your own infrastructure. For details, see Deploying Neptune on your server.

To use the online version of Neptune, you need an internet connection from your system.

To set up logging and perform queries through the client library (API):

  • You or your team should have a working knowledge of Python.
  • You do not need extensive command-line experience, but you should know the basics, such as how to install a Python package and run a script.
You can do the following without coding or technical expertise: