Scikit-learn

What will you get with this integration?

scikit-learn is an open-source machine learning framework commonly used for building predictive models. Neptune helps with keeping track of model training metadata.
With the Neptune + scikit-learn integration, you can track your classifiers, regressors, and k-means clustering results. Specifically, you can:
    log classifier and regressor parameters,
    log pickled model,
    log test predictions,
    log test prediction probabilities,
    log test scores,
    log classifier and regressor visualizations, like confusion matrix, precision-recall chart, and feature importance chart,
    log KMeans cluster labels and clustering visualizations,
    log metadata including git summary info.
You can also log many other types of run metadata, such as interactive charts, video, audio, and more. See the full list.

Where to start?

To get started with this integration, follow the quickstart below (recommended as a first step).
You can also go to the demonstration of the functions that log regressor, classifier, or K-Means summary information to Neptune. Such a summary includes parameters, the pickled model, visualizations, and much more.
Finally, if you want to log only specific information to Neptune, you can use the convenience functions, such as:
    get_estimator_params,
    get_pickled_model,
    create_prediction_error_chart.
If you want to try things out and focus only on the code you can either:

Before you start

Make sure that:

Install neptune-client, scikit-learn, and neptune-sklearn

Depending on your operating system, open a terminal or CMD and run one of the commands below. All required libraries are available via pip and conda.
pip:
pip install scikit-learn neptune-client neptune-sklearn
conda:
conda install -c conda-forge neptune-client scikit-learn neptune-sklearn
For more help see installing neptune-client.
This integration has been tested with neptune-client==0.9.18, scikit-learn==0.24.1, and neptune-sklearn==0.9.5.
You also need minimal familiarity with scikit-learn. Have a look at this scikit-learn guide to get started.

Quickstart

This quickstart will show you how to use Neptune with scikit-learn:
    Create the first run in a project,
    Log estimator parameters and scores, and
    Explore results in the Neptune UI.

Step 0: Create and fit example estimator

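Prepare a fitted estimator whose summary will be logged. The snippet below shows the idea:
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

parameters = {'n_estimators': 70,
              'max_depth': 7,
              'min_samples_split': 3}

estimator = RandomForestRegressor(**parameters)
# load_boston works with the tested scikit-learn 0.24.1; it was removed in scikit-learn 1.2.
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
estimator.fit(X_train, y_train)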

Step 1: Initialize Neptune and create a new run

Add to your script (at the top):
import neptune.new as neptune

run = neptune.init(project='common/sklearn-integration',
                   api_token='ANONYMOUS')
This opens a new “run” in Neptune to which you can log various objects.
You need to tell Neptune who you are and where you want to log things. To do that you specify:
    project=my_workspace/my_project: your workspace name and project name,
    api_token=YOUR_API_TOKEN : your Neptune API token.
If you configured your Neptune API token correctly, as described in this docs page, you can skip the api_token argument.
If you are using .py scripts for training, Neptune will also log your training script automatically.
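One way to keep credentials out of the script is to read them from environment variables. The sketch below assumes the NEPTUNE_PROJECT and NEPTUNE_API_TOKEN variables that the Neptune client reads; the helper name neptune_init_kwargs is hypothetical, and the fallback values simply mirror the public example project used in this guide.

```python
import os


def neptune_init_kwargs(env=None):
    """Build keyword arguments for neptune.init() from environment variables.

    Hypothetical helper: falls back to the public example project and the
    anonymous token used throughout this guide when a variable is not set.
    """
    env = os.environ if env is None else env
    return {
        "project": env.get("NEPTUNE_PROJECT", "common/sklearn-integration"),
        "api_token": env.get("NEPTUNE_API_TOKEN", "ANONYMOUS"),
    }


# Example with an explicit mapping instead of the real environment.
kwargs = neptune_init_kwargs({"NEPTUNE_PROJECT": "my_workspace/my_project"})
print(kwargs)
```

The result could then be passed on as neptune.init(**neptune_init_kwargs()).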

Step 2: Log parameters

To log the parameters of your model training run, assign them to the base_namespace of your choice.
run['parameters'] = parameters
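To make the resulting layout easier to picture, the sketch below flattens a parameter dictionary into slash-separated paths. It assumes that Neptune stores a dictionary as one field per key under the chosen namespace; the helper flatten_to_paths is hypothetical, is not part of the Neptune API, and only illustrates the naming.

```python
def flatten_to_paths(values, prefix):
    """Map a (possibly nested) dict to 'prefix/key' style paths."""
    paths = {}
    for key, value in values.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            # Nested dictionaries become deeper namespaces.
            paths.update(flatten_to_paths(value, path))
        else:
            paths[path] = value
    return paths


parameters = {"n_estimators": 70, "max_depth": 7, "min_samples_split": 3}
for path, value in flatten_to_paths(parameters, "parameters").items():
    print(path, value)
```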

Step 3: Log estimator scores

Log scores on the test data under the base_namespace of your choice.
from sklearn.metrics import max_error, mean_absolute_error, r2_score

y_pred = estimator.predict(X_test)

run['scores/max_error'] = max_error(y_test, y_pred)
run['scores/mean_absolute_error'] = mean_absolute_error(y_test, y_pred)
run['scores/r2_score'] = r2_score(y_test, y_pred)
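For readers who want to see what these three scores measure, here is a minimal pure-Python sketch of the standard definitions. The function names and the toy data are illustrative assumptions; in practice the scikit-learn metric functions should be used.

```python
def max_error_py(y_true, y_pred):
    """Largest absolute difference between a target and its prediction."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


def mean_absolute_error_py(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_py(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values, made up for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(max_error_py(y_true, y_pred))
print(mean_absolute_error_py(y_true, y_pred))
print(r2_score_py(y_true, y_pred))
```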

Step 4: Stop logging

Once you are done logging, you should stop tracking the run using the stop() method. This is needed only while logging from a notebook environment. While logging through a script, Neptune automatically stops tracking once the script has completed execution.
run.stop()
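In a notebook, where stop() has to be called explicitly, a try/finally block is one way to make sure the run is closed even if logging fails midway. The sketch below uses a hypothetical StubRun class in place of a real Neptune run so that it can be executed without a Neptune connection; only item assignment and stop() are modeled.

```python
class StubRun:
    """Hypothetical stand-in for a Neptune run (not the real API)."""

    def __init__(self):
        self.fields = {}
        self.stopped = False

    def __setitem__(self, path, value):
        self.fields[path] = value

    def stop(self):
        self.stopped = True


def log_scores(run, scores):
    """Log each score under 'scores/<name>' and always stop the run."""
    try:
        for name, value in scores.items():
            run[f"scores/{name}"] = value
    finally:
        # Runs whether or not the loop above raised an exception.
        run.stop()


run = StubRun()
log_scores(run, {"r2_score": 0.9})
print(run.fields, run.stopped)
```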

More Options

The Neptune + scikit-learn integration also lets you log regressor, classifier, or K-Means summary information to Neptune. Such a summary includes parameters, the pickled model, visualizations, and much more.
Finally, if you want to log only specific information to Neptune, you can use the convenience functions, such as:
    get_estimator_params,
    get_pickled_model,
    create_prediction_error_chart.

Log classification summary

You can log a classification summary that includes:

Step 0: Create and fit example classifier

Prepare a fitted classifier whose summary will be logged. The snippet below shows the idea:
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

parameters = {'n_estimators': 120,
              'learning_rate': 0.12,
              'min_samples_split': 3,
              'min_samples_leaf': 2}

gbc = GradientBoostingClassifier(**parameters)

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

gbc.fit(X_train, y_train)
The gbc object will be used later to log various metadata to the run.

Step 1: Initialize Neptune and create a run

Add the following snippet at the top of your script.
import neptune.new as neptune

run = neptune.init(project='common/sklearn-integration',
                   api_token='ANONYMOUS',
                   name='classification-example',
                   tags=['GradientBoostingClassifier', 'classification'])
    This creates a link to the run. Open the link in a new tab.
    The run will currently be empty, but keep the window open. You will be able to see the estimator summary there.
    When you create a run, Neptune will look for the .git directory in your project and get the last commit information saved.
If you are using .py scripts for training, Neptune will also log your training script automatically.

Step 2: Log classifier summary

Log classifier summary under the base_namespace of your choice.
import neptune.new.integrations.sklearn as npt_utils

run['cls_summary'] = npt_utils.create_classifier_summary(gbc, X_train, X_test, y_train, y_test)

Step 3: Stop logging

Once you are done logging, you should stop tracking the run using the stop() method. This is needed only while logging from a notebook environment. While logging through a script, Neptune automatically stops tracking once the script has completed execution.
run.stop()

Step 4: See results in Neptune

Once data is logged you can switch to the Neptune tab which you had opened previously to explore results. You can check:
Remember that you can try it out with zero setup:

Log regression summary

You can log a regression summary that includes:

Step 0: Create and fit example regressor

Prepare a fitted regressor whose summary will be logged. The snippet below shows the idea:
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

parameters = {'n_estimators': 70,
              'max_depth': 7,
              'min_samples_split': 3}

rfr = RandomForestRegressor(**parameters)

# load_boston works with the tested scikit-learn 0.24.1; it was removed in scikit-learn 1.2.
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

rfr.fit(X_train, y_train)
The rfr object will be used later to log various metadata to the run.

Step 1: Initialize Neptune and create a run

Add the following snippet at the top of your script.
import neptune.new as neptune

run = neptune.init(project='common/sklearn-integration',
                   api_token='ANONYMOUS',
                   name='regression-example',
                   tags=['RandomForestRegressor', 'regression'])
    This creates a link to the run. Open the link in a new tab.
    The run will currently be empty, but keep the window open. You will be able to see the estimator summary there.
    When you create a run, Neptune will look for the .git directory in your project and get the last commit information saved.
If you are using .py scripts for training, Neptune will also log your training script automatically.

Step 2: Log regressor summary

Log regressor summary under the base_namespace of your choice.
import neptune.new.integrations.sklearn as npt_utils

run['rfr_summary'] = npt_utils.create_regressor_summary(rfr, X_train, X_test, y_train, y_test)

Step 3: Stop logging

Once you are done logging, you should stop tracking the run using the stop() method. This is needed only while logging from a notebook environment. While logging through a script, Neptune automatically stops tracking once the script has completed execution.
run.stop()

Step 4: See results in Neptune

Once data is logged you can switch to the Neptune tab which you had opened previously to explore results. You can check:
Remember that you can try it out with zero setup:

Log K-Means clustering summary

You can log a K-Means clustering summary that includes:

Step 0: Create K-Means clustering object and example data

Prepare a K-Means object and example data that will be used to log the clustering summary. The snippet below shows the idea:
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

parameters = {'n_init': 11,
              'max_iter': 270}

km = KMeans(**parameters)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)

Step 1: Initialize Neptune and create a run

Add the following snippet at the top of your script.
import neptune.new as neptune

run = neptune.init(project='common/sklearn-integration',
                   api_token='ANONYMOUS',
                   name='clustering-example',
                   tags=['KMeans', 'clustering'])
    This creates a link to the run. Open the link in a new tab.
    The run will currently be empty, but keep the window open. You will be able to see the estimator summary there.
    When you create a run, Neptune will look for the .git directory in your project and get the last commit information saved.
If you are using .py scripts for training, Neptune will also log your training script automatically.

Step 2: Log KMeans clustering summary

Log K-Means clustering summary under the base_namespace of your choice.
import neptune.new.integrations.sklearn as npt_utils

run['kmeans_summary'] = npt_utils.create_kmeans_summary(km, X, n_clusters=17)

Step 3: Stop logging

Once you are done logging, you should stop tracking the run using the stop() method. This is needed only while logging from a notebook environment. While logging through a script, Neptune automatically stops tracking once the script has completed execution.
run.stop()

Step 4: See results in Neptune

Once data is logged you can switch to the Neptune tab which you had opened previously to explore results. You can check:
Remember that you can try it out with zero setup:

Other logging options

Log estimator parameters

You can choose to only log estimator parameters.
import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier()

run = neptune.init(project='common/sklearn-integration',
                   api_token='ANONYMOUS',
                   name='other-options')

run['estimator/parameters'] = npt_utils.get_estimator_params(rfc)

run.stop()

Log model

You can choose to log a fitted model as a pickled file.
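import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

# Example data; any training set works here.
X, y = load_digits(return_X_y=True)

rfc = RandomForestClassifier()
rfc.fit(X, y)

run = neptune.init(project='common/sklearn-integration',
                   api_token='ANONYMOUS',
                   name='other-options')

run['estimator/pickled-model'] = npt_utils.get_pickled_model(rfc)

run.stop()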
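As a reminder of what "pickled" means here, the sketch below round-trips an object through Python's pickle module. The model_state dictionary is a hypothetical stand-in for a fitted estimator so that the example runs without scikit-learn; a real estimator object would be serialized in the same way.

```python
import pickle

# Hypothetical stand-in for the state of a fitted estimator.
model_state = {"n_estimators": 70, "feature_importances": [0.7, 0.2, 0.1]}

# Serialize to bytes, which is the form that can be stored as a file.
payload = pickle.dumps(model_state)

# Deserialize again; only unpickle data that comes from a trusted source.
restored = pickle.loads(payload)

print(type(payload).__name__, restored == model_state)
```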

Log confusion matrix

You can choose to log a confusion matrix chart.
import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rfc = RandomForestClassifier()
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=28743)
rfc.fit(X_train, y_train)

run = neptune.init(project='common/sklearn-integration',
                   api_token='ANONYMOUS',
                   name='other-options')

run['confusion-matrix'] = npt_utils.create_confusion_matrix_chart(rfc, X_train, X_test, y_train, y_test)

run.stop()
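To clarify what the chart displays, here is a minimal pure-Python sketch of how confusion matrix counts are tallied, with rows for true labels and columns for predicted labels (the convention scikit-learn uses). The function name and the toy labels are illustrative assumptions, not part of the integration.

```python
def confusion_matrix_counts(y_true, y_pred, labels):
    """Count (true label, predicted label) pairs; rows are true labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


# Toy labels, made up for illustration only.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]
for row in confusion_matrix_counts(y_true, y_pred, labels=[0, 1, 2]):
    print(row)
```

Entries on the diagonal are correct predictions; everything off the diagonal is a misclassification.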
Remember that you can try it out with zero setup:

How to ask for help?

Please visit the Getting help page. Everything regarding support is there.

Other integrations you may like

You may also like these two integrations: