Neptune-Sklearn Integration

What will you get with this integration?

scikit-learn is an open-source machine learning framework commonly used for building predictive models. Neptune helps you keep track of your model training metadata.

With the Neptune + scikit-learn integration you can track your classifiers, regressors, and k-means clustering results. Specifically, you can:

  • log classifier and regressor parameters,

  • log pickled model,

  • log test predictions,

  • log test predictions probabilities,

  • log test scores,

  • log classifier and regressor visualizations, like confusion matrix, precision-recall chart and feature importance chart,

  • log KMeans cluster labels and clustering visualizations,

  • log metadata including git summary info.

Tip

You can log many other types of experiment metadata, like interactive charts, video, audio, and more. See the full list.

Note

This integration is tested with scikit-learn==0.23.2, neptune-client==0.4.132.

Where to start?

To get started with this integration, follow the quickstart below (recommended as a first step).

You can also go to the demonstration of the functions that log regressor, classifier, or K-Means summary information to Neptune. Such a summary includes the parameters, pickled model, visualizations, and much more.

Finally, if you want to log only specific information to Neptune, you can use the convenience functions listed in the reference documentation. A few examples are shown in the More Options section below.

If you want to try things out and focus only on the code, you can run the examples with zero setup.

Before you start

Make sure you have Python 3.x and the following libraries installed:

pip install scikit-learn neptune-client neptune-contrib

You also need minimal familiarity with scikit-learn. Have a look at this scikit-learn guide to get started.

Quickstart

This quickstart will show you how to use Neptune with sklearn:

  • Create your first experiment in a project,

  • Log estimator parameters and scores,

  • Explore results in the Neptune UI.

Step 0: Create and fit example estimator

Prepare a fitted estimator that will later be used to log its summary. The snippet below shows the idea:

from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

parameters = {'n_estimators': 120,
              'learning_rate': 0.12,
              'min_samples_split': 3,
              'min_samples_leaf': 2}

gbc = GradientBoostingClassifier(**parameters)

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

gbc.fit(X_train, y_train)

Step 1: Initialize Neptune

Add the following snippet at the top of your script.

import neptune

neptune.init(api_token='ANONYMOUS', project_qualified_name='shared/sklearn-integration')

Tip

You can also use your personal API token. Read more about how to securely set the Neptune API token.
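One common pattern, if you prefer not to hard-code the token, is to read it from an environment variable. A minimal sketch (assuming you export NEPTUNE_API_TOKEN in your shell; the fallback to 'ANONYMOUS' is only for trying things out):

```python
import os

# Export the token once in your shell, e.g.:
#   export NEPTUNE_API_TOKEN='<your token>'
# then read it here instead of hard-coding it in the script:
api_token = os.getenv('NEPTUNE_API_TOKEN', 'ANONYMOUS')  # falls back to the public anonymous token

# Pass it to Neptune as before:
# neptune.init(api_token=api_token,
#              project_qualified_name='shared/sklearn-integration')
```

This keeps the token out of version control while leaving the rest of the script unchanged.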

Step 2: Create an experiment and log parameters

Run the code below to create a Neptune experiment:

neptune.create_experiment(params=parameters,
                          name='sklearn-quickstart')
  • This creates a link to the experiment. Open the link in a new tab.

  • The experiment will currently be empty, but keep the window open. You will be able to see estimator summary there.

  • This is how experiment’s parameters are logged. You pass them to the create_experiment() method. You can later use them to filter and compare experiments.

When you create an experiment, Neptune looks for the .git directory in your project and saves the last commit information.

Note

If you are using .py scripts for training, Neptune will also log your training script automatically.

Step 3: Log estimator scores

Log scores on the test data.

from sklearn.metrics import max_error, mean_absolute_error, r2_score

y_pred = gbc.predict(X_test)

neptune.log_metric('max_error', max_error(y_test, y_pred))
neptune.log_metric('mean_absolute_error', mean_absolute_error(y_test, y_pred))
neptune.log_metric('r2_score', r2_score(y_test, y_pred))

Here we use the log_metric() method to log scores to the experiment.

Step 4: See results in Neptune

Switch to the Neptune tab you opened previously to explore the results.

Sklearn integration - quickstart

You can go to the reference documentation to learn more. Remember that you can try it out with zero setup.

More Options

The Neptune-Scikit-learn integration also lets you log regressor, classifier, or K-Means summary information to Neptune. Such a summary includes the parameters, pickled model, visualizations, and much more.

You can also choose to log only specific information to Neptune. In that case, use the convenience functions listed in the reference documentation. Below are a few examples:

Log classification summary

You can log a classification summary that includes the classifier's parameters, a pickled model, test predictions, test prediction probabilities, test scores, and classifier visualizations.

Step 0: Create and fit example classifier

Prepare a fitted classifier that will later be used to log its summary. The snippet below shows the idea:

from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

parameters = {'n_estimators': 120,
              'learning_rate': 0.12,
              'min_samples_split': 3,
              'min_samples_leaf': 2}

gbc = GradientBoostingClassifier(**parameters)

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

gbc.fit(X_train, y_train)

The gbc object will later be used to log various metadata to the experiment.

Step 1: Initialize Neptune

Add the following snippet at the top of your script.

import neptune

neptune.init(api_token='ANONYMOUS', project_qualified_name='shared/sklearn-integration')

Tip

You can also use your personal API token. Read more about how to securely set the Neptune API token.

Step 2: Create an experiment

Run the code below to create a Neptune experiment:

neptune.create_experiment(params=parameters,
                          name='sklearn-quickstart')
  • This creates a link to the experiment. Open the link in a new tab.

  • The experiment will currently be empty, but keep the window open. You will be able to see estimator summary there.

  • This is how experiment’s parameters are logged. You pass them to the create_experiment() method. You can later use them to filter and compare experiments.

When you create an experiment, Neptune looks for the .git directory in your project and saves the last commit information.

Note

If you are using .py scripts for training, Neptune will also log your training script automatically.

Step 3: Log classifier summary

Log the classifier summary to Neptune by using log_classifier_summary().

from neptunecontrib.monitoring.sklearn import log_classifier_summary

log_classifier_summary(gbc, X_train, X_test, y_train, y_test)

Step 4: See results in Neptune

Once the data is logged, you can switch to the Neptune tab you opened previously to explore the results.

Sklearn integration, classification example

You can go to the reference documentation to learn more. Remember that you can try it out with zero setup.

Log regression summary

You can log a regression summary that includes the regressor's parameters, a pickled model, test predictions, test scores, and regressor visualizations.

Step 0: Create and fit example regressor

Prepare a fitted regressor that will later be used to log its summary. The snippet below shows the idea:

from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

parameters = {'n_estimators': 70,
              'max_depth': 7,
              'min_samples_split': 3}

rfr = RandomForestRegressor(**parameters)

X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

rfr.fit(X_train, y_train)

The rfr object will later be used to log various metadata to the experiment.

Step 1: Initialize Neptune

Add the following snippet at the top of your script.

import neptune

neptune.init(api_token='ANONYMOUS', project_qualified_name='shared/sklearn-integration')

Tip

You can also use your personal API token. Read more about how to securely set the Neptune API token.

Step 2: Create an experiment

Run the code below to create a Neptune experiment:

neptune.create_experiment(params=parameters,
                          name='sklearn-quickstart')
  • This creates a link to the experiment. Open the link in a new tab.

  • The experiment will currently be empty, but keep the window open. You will be able to see estimator summary there.

  • This is how experiment’s parameters are logged. You pass them to the create_experiment() method. You can later use them to filter and compare experiments.

When you create an experiment, Neptune looks for the .git directory in your project and saves the last commit information.

Note

If you are using .py scripts for training, Neptune will also log your training script automatically.

Step 3: Log regressor summary

Log the regressor summary to Neptune by using log_regressor_summary().

from neptunecontrib.monitoring.sklearn import log_regressor_summary

log_regressor_summary(rfr, X_train, X_test, y_train, y_test)

Step 4: See results in Neptune

Once the data is logged, you can switch to the Neptune tab you opened previously to explore the results.

Sklearn integration, regression example

You can go to the reference documentation to learn more. Remember that you can try it out with zero setup.

Log K-Means clustering summary

You can log a K-Means clustering summary that includes the parameters, cluster labels, and clustering visualizations.

Step 0: Create K-Means clustering object and example data

Prepare the K-Means object and example data; these will be used later in this quickstart. The snippet below shows the idea:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

parameters = {'n_init': 11,
              'max_iter': 270}

km = KMeans(**parameters)

X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)
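If you want to sanity-check the example data before logging anything, a minimal sketch using plain scikit-learn (the n_clusters=7 here simply mirrors the centers=7 used above and is not part of the integration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)

# Fit a K-Means model with one cluster per generated blob center.
km = KMeans(n_clusters=7, n_init=11, max_iter=270, random_state=0).fit(X)

labels = km.labels_  # one cluster id per sample; this is the kind of data the summary logs
assert len(labels) == 579
```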

Step 1: Initialize Neptune

Add the following snippet at the top of your script.

import neptune

neptune.init(api_token='ANONYMOUS', project_qualified_name='shared/sklearn-integration')

Tip

You can also use your personal API token. Read more about how to securely set the Neptune API token.

Step 2: Create an experiment

Run the code below to create a Neptune experiment:

neptune.create_experiment(params=parameters,
                          name='clustering-example')
  • This also creates a link to the experiment. Open the link in a new tab.

  • The experiment will currently be empty, but keep the window open. You will be able to see estimator summary there.

  • This is how experiment’s parameters are logged. You pass them to the create_experiment method. You can later use them to filter and compare experiments.

When you create an experiment, Neptune looks for the .git directory in your project and saves the last commit information.

Note

If you are using .py scripts for training, Neptune will also log your training script automatically.

Step 3: Log KMeans clustering summary

Log the K-Means clustering summary to Neptune by using log_kmeans_clustering_summary().

from neptunecontrib.monitoring.sklearn import log_kmeans_clustering_summary

log_kmeans_clustering_summary(km, X, n_clusters=17)

Step 4: See results in Neptune

Once the data is logged, you can switch to the Neptune tab you opened previously to explore the results.

Sklearn integration, kmeans example

You can go to the reference documentation to learn more. Remember that you can try it out with zero setup.

Log estimator parameters

You can choose to log only the estimator parameters.

from neptunecontrib.monitoring.sklearn import log_estimator_params

neptune.create_experiment(name='estimator-params')

log_estimator_params(my_estimator) # log estimator parameters here

This method logs all parameters of ‘my_estimator’ as Neptune properties. For an example, see the logged classifier parameters.
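Under the hood these are presumably the values returned by scikit-learn's standard get_params() method; a minimal sketch of the kind of data that ends up as properties:

```python
from sklearn.ensemble import GradientBoostingClassifier

gbc = GradientBoostingClassifier(n_estimators=120, learning_rate=0.12)

# get_params() returns a dict of parameter name -> value for any scikit-learn estimator.
params = gbc.get_params()
assert params['n_estimators'] == 120
assert params['learning_rate'] == 0.12
```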

Sklearn integration, estimator params

Log model

You can choose to log the fitted model as a pickle file.

from neptunecontrib.monitoring.sklearn import log_pickled_model

neptune.create_experiment(name='pickled-model')

log_pickled_model(my_estimator, 'my_model') # log pickled model here
  • This method logs ‘my_estimator’ to Neptune’s artifacts.

  • The path to the file in the Neptune artifacts is model/<my_model>. For an example, check this logged pickled model.
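For illustration only, this is roughly what pickling a fitted estimator looks like with the standard library; the exact serialization Neptune performs may differ:

```python
import pickle

from sklearn.tree import DecisionTreeClassifier

# Fit a tiny estimator on toy data.
clf = DecisionTreeClassifier().fit([[0], [1]], [0, 1])

blob = pickle.dumps(clf)       # bytes that could be written to a .pkl artifact
restored = pickle.loads(blob)  # round-trips back to a working estimator
assert restored.predict([[1]])[0] == 1
```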

Sklearn integration, model

Log confusion matrix

You can choose to log the confusion matrix chart.

from neptunecontrib.monitoring.sklearn import log_confusion_matrix_chart

neptune.create_experiment(name='confusion-matrix-chart')

log_confusion_matrix_chart(my_estimator, X_train, X_test, y_train, y_test) # log confusion matrix chart
  • This method logs the confusion matrix chart as an image.
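The chart visualizes the same numbers scikit-learn's confusion_matrix produces; a minimal sketch of that underlying data:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
assert cm.tolist() == [[2, 0], [1, 2]]
```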

Tip

Check the reference documentation for the full list of available charts, including: learning curve, feature importance, ROC-AUC, precision-recall, silhouette chart, and more.
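As an example of the data behind one of these, the feature importance chart is drawn from the fitted estimator's feature_importances_ attribute; a minimal sketch:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

X, y = load_digits(return_X_y=True)
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# One weight per feature; scikit-learn normalizes them to sum to 1.
importances = clf.feature_importances_
assert len(importances) == X.shape[1]
```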

Sklearn integration, confusion matrix chart

You can go to the reference documentation to learn more. Remember that you can try it out with zero setup.

How to ask for help?

Please visit the Getting help page. Everything regarding support is there.

Other integrations you may like

You may also like these two integrations.