Neptune-Sklearn Integration¶
What will you get with this integration?¶
scikit-learn is an open source machine learning framework commonly used for building predictive models. Neptune helps with keeping track of model training metadata.
With the Neptune + scikit-learn integration you can track your classifiers, regressors, and k-means clustering results. Specifically, you can:
log classifier and regressor parameters,
log pickled model,
log test predictions,
log test prediction probabilities,
log test scores,
log classifier and regressor visualizations, like confusion matrix, precision-recall chart and feature importance chart,
log KMeans cluster labels and clustering visualizations,
log metadata including git summary info.
Tip
You can log many other types of experiment metadata, such as interactive charts, video, audio, and more. See the full list.
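For instance, besides metrics the legacy Neptune client exposes log_image() and log_artifact(); a minimal sketch (the file names here are hypothetical placeholders):
import neptune

# Assumes neptune.init() and neptune.create_experiment() have been called
# (see the Quickstart below). File names are placeholders.
neptune.log_image('roc_curve', 'my_chart.png')   # log an image under the 'roc_curve' log
neptune.log_artifact('model_weights.pkl')        # upload an arbitrary file as an artifact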
Note
This integration is tested with scikit-learn==0.23.2 and neptune-client==0.4.132.
Where to start?¶
To get started with this integration, follow the quickstart below (recommended as a first step).
You can also go to the demonstration of the functions that log regressor, classifier, or K-Means summary information to Neptune. Such a summary includes parameters, the pickled model, visualizations, and much more.
Finally, if you want to log only specific information to Neptune, you can use the convenience functions listed in the reference documentation. Below are a few examples.
If you want to try things out and focus only on the code, you can run the examples with zero setup.
Before you start¶
You have Python 3.x and the following libraries installed:
neptune-client. See the neptune-client installation guide.
scikit-learn. See the scikit-learn installation guide.
pip install scikit-learn
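neptune-client is also a standard PyPI package, so it can be installed the same way:
pip install neptune-client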
You also need minimal familiarity with scikit-learn. Have a look at this scikit-learn guide to get started.
Quickstart¶
This quickstart will show you how to use Neptune with sklearn:
Create your first experiment in a project,
Log estimator parameters and scores,
Explore results in the Neptune UI.
Step 0: Create and fit example estimator¶
Prepare a fitted estimator that will later be used to log its summary. The snippet below shows the idea:
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

parameters = {'n_estimators': 120,
              'learning_rate': 0.12,
              'min_samples_split': 3,
              'min_samples_leaf': 2}

gbc = GradientBoostingClassifier(**parameters)

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

gbc.fit(X_train, y_train)
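Before logging anything you can sanity-check the fit locally; this uses only the standard scikit-learn API:
# Mean accuracy on the held-out test set
print('Test accuracy: {:.3f}'.format(gbc.score(X_test, y_test)))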
Step 1: Initialize Neptune¶
Add the following snippet at the top of your script.
import neptune
neptune.init(api_token='ANONYMOUS', project_qualified_name='shared/sklearn-integration')
Tip
You can also use your personal API token. Read more about how to securely set the Neptune API token.
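For example, assuming you keep the token in the NEPTUNE_API_TOKEN environment variable (the variable name is just a convention here):
import os
import neptune

# Read the API token from the environment instead of hard-coding it
neptune.init(api_token=os.getenv('NEPTUNE_API_TOKEN'),
             project_qualified_name='shared/sklearn-integration')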
Step 2: Create an experiment and log parameters¶
Run the code below to create a Neptune experiment:
neptune.create_experiment(params=parameters,
                          name='sklearn-quickstart')
This creates a link to the experiment. Open the link in a new tab.
The experiment will currently be empty, but keep the window open. You will be able to see the estimator summary there.
This is how the experiment's parameters are logged: you pass them to the create_experiment() method. You can later use them to filter and compare experiments.
When you create an experiment, Neptune looks for the .git directory in your project and saves the last commit information.
Note
If you are using .py scripts for training, Neptune will also log your training script automatically.
Step 3: Log estimator scores¶
Log scores on the test data.
from sklearn.metrics import max_error, mean_absolute_error, r2_score

y_pred = gbc.predict(X_test)

neptune.log_metric('max_error', max_error(y_test, y_pred))
neptune.log_metric('mean_absolute_error', mean_absolute_error(y_test, y_pred))
neptune.log_metric('r2_score', r2_score(y_test, y_pred))
Here we use the log_metric() method to log scores to the experiment.
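log_metric() can also be called repeatedly under the same name to build a metric series. A minimal sketch using the classifier's per-iteration training score (train_score_ is a standard GradientBoostingClassifier attribute):
# Log the training deviance at each boosting iteration as one series
for i, score in enumerate(gbc.train_score_):
    neptune.log_metric('train_deviance', i, y=score)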
Step 4: See results in Neptune¶
Switch to the Neptune tab that you opened previously to explore the results.

You can go to the reference documentation to learn more. Remember that you can try it out with zero setup.
More Options¶
The Neptune-scikit-learn integration also lets you log regressor, classifier, or K-Means summary information to Neptune. Such a summary includes parameters, the pickled model, visualizations, and much more.
You can also choose to log only specific information to Neptune. In that case, use the convenience functions listed in the reference documentation. Below are a few examples.
Log classification summary¶
You can log a classification summary that includes:
classifier parameters logged at the experiment creation,
logged classifier visualizations - look for “charts_sklearn”,
logged metadata including git summary info.
Step 0: Create and fit example classifier¶
Prepare a fitted classifier that will later be used to log its summary. The snippet below shows the idea:
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

parameters = {'n_estimators': 120,
              'learning_rate': 0.12,
              'min_samples_split': 3,
              'min_samples_leaf': 2}

gbc = GradientBoostingClassifier(**parameters)

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

gbc.fit(X_train, y_train)
The gbc object will later be used to log various metadata to the experiment.
Step 1: Initialize Neptune¶
Add the following snippet at the top of your script.
import neptune
neptune.init(api_token='ANONYMOUS', project_qualified_name='shared/sklearn-integration')
Tip
You can also use your personal API token. Read more about how to securely set the Neptune API token.
Step 2: Create an experiment¶
Run the code below to create a Neptune experiment:
neptune.create_experiment(params=parameters,
                          name='sklearn-quickstart')
This creates a link to the experiment. Open the link in a new tab.
The experiment will currently be empty, but keep the window open. You will be able to see the estimator summary there.
This is how the experiment's parameters are logged: you pass them to the create_experiment() method. You can later use them to filter and compare experiments.
When you create an experiment, Neptune looks for the .git directory in your project and saves the last commit information.
Note
If you are using .py scripts for training, Neptune will also log your training script automatically.
Step 3: Log classifier summary¶
Log the classifier summary to Neptune by using log_classifier_summary().
from neptunecontrib.monitoring.sklearn import log_classifier_summary
log_classifier_summary(gbc, X_train, X_test, y_train, y_test)
Step 4: See results in Neptune¶
Once the data is logged, you can switch to the Neptune tab that you opened previously to explore the results. You can check:
classifier parameters logged at the experiment creation,
logged classifier visualizations - look for “charts_sklearn”,
logged metadata including git summary info.

You can go to the reference documentation to learn more. Remember that you can try it out with zero setup.
Log regression summary¶
You can log a regression summary that includes:
regressor parameters logged at the experiment creation,
all regressor parameters as properties,
logged regressor visualizations - look for “charts_sklearn”,
logged metadata including git summary info.
Step 0: Create and fit example regressor¶
Prepare a fitted regressor that will later be used to log its summary. The snippet below shows the idea:
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

parameters = {'n_estimators': 70,
              'max_depth': 7,
              'min_samples_split': 3}

rfr = RandomForestRegressor(**parameters)

X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

rfr.fit(X_train, y_train)
The rfr object will later be used to log various metadata to the experiment.
Step 1: Initialize Neptune¶
Add the following snippet at the top of your script.
import neptune
neptune.init(api_token='ANONYMOUS', project_qualified_name='shared/sklearn-integration')
Tip
You can also use your personal API token. Read more about how to securely set the Neptune API token.
Step 2: Create an experiment¶
Run the code below to create a Neptune experiment:
neptune.create_experiment(params=parameters,
                          name='sklearn-quickstart')
This creates a link to the experiment. Open the link in a new tab.
The experiment will currently be empty, but keep the window open. You will be able to see the estimator summary there.
This is how the experiment's parameters are logged: you pass them to the create_experiment() method. You can later use them to filter and compare experiments.
When you create an experiment, Neptune looks for the .git directory in your project and saves the last commit information.
Note
If you are using .py scripts for training, Neptune will also log your training script automatically.
Step 3: Log regressor summary¶
Log the regressor summary to Neptune by using log_regressor_summary().
from neptunecontrib.monitoring.sklearn import log_regressor_summary
log_regressor_summary(rfr, X_train, X_test, y_train, y_test)
Step 4: See results in Neptune¶
Once the data is logged, you can switch to the Neptune tab that you opened previously to explore the results. You can check:
regressor parameters logged at the experiment creation,
all regressor parameters as properties,
logged regressor visualizations - look for “charts_sklearn”,
logged metadata including git summary info.

You can go to the reference documentation to learn more. Remember that you can try it out with zero setup.
Log K-Means clustering summary¶
You can log a K-Means clustering summary that includes:
KMeans parameters logged at the experiment creation,
all KMeans parameters as properties,
logged metadata including git summary info.
Step 0: Create K-Means clustering object and example data¶
Prepare a K-Means object and example data. These will be used later in this quickstart. The snippet below shows the idea:
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

parameters = {'n_init': 11,
              'max_iter': 270}

km = KMeans(**parameters)

X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)
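If you want to inspect the clustering locally before logging anything, the standard scikit-learn API is enough; this local fit is independent of what the summary function logs:
# Fit KMeans and get cluster assignments for the generated data
labels = km.fit_predict(X)
print('Inertia: {:.1f}'.format(km.inertia_))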
Step 1: Initialize Neptune¶
Add the following snippet at the top of your script.
import neptune
neptune.init(api_token='ANONYMOUS', project_qualified_name='shared/sklearn-integration')
Tip
You can also use your personal API token. Read more about how to securely set the Neptune API token.
Step 2: Create an experiment¶
Run the code below to create a Neptune experiment:
neptune.create_experiment(params=parameters,
                          name='clustering-example')
This creates a link to the experiment. Open the link in a new tab.
The experiment will currently be empty, but keep the window open. You will be able to see the estimator summary there.
This is how the experiment's parameters are logged: you pass them to the create_experiment() method. You can later use them to filter and compare experiments.
When you create an experiment, Neptune looks for the .git directory in your project and saves the last commit information.
Note
If you are using .py scripts for training, Neptune will also log your training script automatically.
Step 3: Log KMeans clustering summary¶
Log the K-Means clustering summary to Neptune by using log_kmeans_clustering_summary().
from neptunecontrib.monitoring.sklearn import log_kmeans_clustering_summary
log_kmeans_clustering_summary(km, X, n_clusters=17)
Step 4: See results in Neptune¶
Once the data is logged, you can switch to the Neptune tab that you opened previously to explore the results. You can check:
KMeans parameters logged at the experiment creation,
all KMeans parameters as properties,
logged metadata including git summary info.

You can go to the reference documentation to learn more. Remember that you can try it out with zero setup.
Log estimator parameters¶
You can choose to log only the estimator parameters.
from neptunecontrib.monitoring.sklearn import log_estimator_params
neptune.create_experiment(name='estimator-params')
log_estimator_params(my_estimator) # log estimator parameters here
This method logs all parameters of my_estimator as Neptune properties. For example, see the classifier parameters.
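For instance, with the classifier from the quickstart (a minimal sketch; it assumes the experiment has already been created as shown above):
from sklearn.ensemble import GradientBoostingClassifier
from neptunecontrib.monitoring.sklearn import log_estimator_params

gbc = GradientBoostingClassifier(n_estimators=120, learning_rate=0.12)
log_estimator_params(gbc)  # every constructor parameter becomes an experiment property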

Log model¶
You can choose to log the fitted model as a pickle file.
from neptunecontrib.monitoring.sklearn import log_pickled_model
neptune.create_experiment(name='pickled-model')
log_pickled_model(my_estimator, 'my_model') # log the pickled model here
This method logs my_estimator to Neptune's artifacts. The path to the file in the Neptune artifacts is model/<my_model>. For example, check this logged pickled model.

Log confusion matrix¶
You can choose to log the confusion matrix chart.
from neptunecontrib.monitoring.sklearn import log_confusion_matrix_chart
neptune.create_experiment(name='confusion-matrix-chart')
log_confusion_matrix_chart(my_estimator, X_train, X_test, y_train, y_test) # log confusion matrix chart
This method logs the confusion matrix chart as an image.
Tip
Check the reference documentation for the full list of available charts, including: learning curve, feature importance, ROC-AUC, precision-recall, silhouette chart, and more.
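For example, the precision-recall chart has its own convenience function; the call below is a sketch that assumes it takes a fitted classifier and test data, so verify the exact signature in the reference documentation:
from neptunecontrib.monitoring.sklearn import log_precision_recall_chart

# Assumed signature (fitted classifier + test data); check the reference docs
log_precision_recall_chart(my_estimator, X_test, y_test)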

You can go to the reference documentation to learn more. Remember that you can try it out with zero setup.
How to ask for help?¶
Please visit the Getting help page. Everything regarding support is there.