API reference: scikit-learn integration#

You can use the Neptune integration with scikit-learn to track your classifiers, regressors, and k-means clustering results.


create_regressor_summary()#

Returns a scikit-learn regressor summary that includes:

  • All regressor parameters
  • Pickled estimator (model)
  • Test predictions
  • Test scores
  • Model performance visualizations

The regressor should be fitted before calling this function.

Parameters

Name     Type Default Description
regressor regressor - Fitted scikit-learn regressor object.
X_train ndarray - Training data matrix.
X_test ndarray - Testing data matrix.
y_train ndarray - The regression target for training.
y_test ndarray - The regression target for testing.
nrows int, optional 1000 Log first nrows rows of test predictions.
log_charts bool, optional True If True, calculate and log chart visualizations.

This is equivalent to calling the create_learning_curve_chart(), create_feature_importance_chart(), create_residuals_chart(), create_prediction_error_chart(), and create_cooks_distance_chart() functions from this module.

Note: Calculating visualizations is potentially expensive depending on input data and regressor, and may take some time to finish.

Returns

dict with all metadata, which can be assigned to a run namespace:

run["summary"] = create_regressor_summary(...)
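To illustrate what assigning a nested dict to a namespace means, here is a minimal plain-Python sketch that flattens a nested dict into slash-separated field paths. The helper `flatten_to_paths` and the keys in `summary` are hypothetical and only mimic the shape of a summary; the real keys and the actual assignment logic are determined by the integration and by Neptune.

```python
def flatten_to_paths(nested, prefix=""):
    """Flatten a nested dict into {"a/b/c": value} pairs (illustrative helper)."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dicts, extending the path with the current key.
            flat.update(flatten_to_paths(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical summary-like structure; the real keys may differ.
summary = {
    "all_params": {"n_estimators": 100, "max_depth": "None"},
    "test": {"scores": {"r2": 0.91}},
}

paths = flatten_to_paths(summary, prefix="random_forest/summary")
for path, value in sorted(paths.items()):
    print(path, "=", value)
```

Under this assumption, each leaf value ends up as its own field below the namespace you assign to.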

Example

# Create a run
import neptune
run = neptune.init_run()

# Log random forest regressor summary
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["random_forest/summary"] = npt_utils.create_regressor_summary(
    rfr, X_train, X_test, y_train, y_test
)

create_classifier_summary()#

Returns a scikit-learn classifier summary that includes:

  • All classifier parameters
  • Pickled estimator (model)
  • Test predictions
  • Test prediction probabilities
  • Test scores
  • Model performance visualizations

The classifier should be fitted before calling this function.

Parameters

Name     Type Default Description
classifier classifier - Fitted scikit-learn classifier object.
X_train ndarray - Training data matrix.
X_test ndarray - Testing data matrix.
y_train ndarray - The classification target for training.
y_test ndarray - The classification target for testing.
nrows int, optional 1000 Log first nrows rows of test predictions and prediction probabilities.
log_charts bool, optional True If True, calculate and log chart visualizations.

This is equivalent to calling the create_classification_report_chart(), create_confusion_matrix_chart(), create_roc_auc_chart(), create_precision_recall_chart(), and create_class_prediction_error_chart() functions from this module.

Note: Calculating visualizations is potentially expensive depending on input data and classifier, and may take some time to finish.

Returns

dict with all metadata, which can be assigned to a run namespace:

run["summary"] = create_classifier_summary(...)

Example

# Create a run
import neptune

run = neptune.init_run()

# Log random forest classifier summary
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils

run["random_forest/summary"] = npt_utils.create_classifier_summary(
    rfc, X_train, X_test, y_train, y_test
)

create_kmeans_summary()#

Returns a scikit-learn k-means summary.

This method fits the k-means model to data and logs:

  • All KMeans parameters
  • Cluster labels
  • Clustering visualizations: k-means elbow chart and silhouette coefficients chart

Parameters

Name Type Default Description
model KMeans - KMeans object
X ndarray - Training instances to cluster
nrows int, optional 1000 Number of rows to log in the cluster labels
kwargs - - KMeans parameters

Returns

dict with all metadata, which can be assigned to a run namespace:

run["summary"] = create_kmeans_summary(...)

Example

# Create a run
import neptune
run = neptune.init_run()

# Log k-means clustering summary
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

km = KMeans(n_init=11, max_iter=270)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)

import neptune.integrations.sklearn as npt_utils
run["kmeans/summary"] = npt_utils.create_kmeans_summary(km, X)

get_estimator_params()#

Get estimator parameters.

Parameters

Name Type Description
estimator estimator Scikit-learn estimator to log parameters for.

Returns

dict with all parameters mapped to their values.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log estimator parameters
rfr = RandomForestRegressor()

import neptune.integrations.sklearn as npt_utils
from neptune.utils import stringify_unsupported

run["estimator/params"] = stringify_unsupported(npt_utils.get_estimator_params(rfr))

get_pickled_model()#

Get pickled estimator.

Parameters

Name Type Description
estimator estimator Scikit-learn estimator to pickle.

Returns

File value object with a pickled model that you can log to the run.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log pickled model
rfr = RandomForestRegressor()

import neptune.integrations.sklearn as npt_utils
run["estimator/pickled_model"] = npt_utils.get_pickled_model(rfr)

get_test_preds()#

Get test predictions as a table.

If you pass y_pred, predictions are not computed from X_test data.

The estimator should be fitted before calling this function.

Parameters

Name Type Default Description
estimator estimator - scikit-learn estimator to compute predictions.
X_test ndarray - Testing data matrix.
y_test ndarray - Target for testing.
y_pred ndarray, optional None Estimator predictions on test data.
nrows int, optional 1000 Number of rows to log.

Returns

File value object with test predictions as a table that you can log to the run.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log test predictions as a table
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["estimator/test_preds"] = npt_utils.get_test_preds(rfr, X_test, y_test)

get_test_preds_proba()#

Get test prediction probabilities.

  • If you pass X_test, prediction probabilities are computed from data.
  • If you pass y_pred_proba, prediction probabilities are not computed from X_test data.

The estimator should be fitted before calling this function.

Parameters

Name Type Default Description
classifier classifier - scikit-learn classifier to compute prediction probabilities.
X_test ndarray - Testing data matrix.
y_pred_proba ndarray, optional None Classifier prediction probabilities on test data.
nrows int, optional 1000 Number of rows to log.

Returns

File value object with test prediction probabilities as a table that you can log to the run.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log classifier test prediction probabilities
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["estimator/test_preds_proba"] = npt_utils.get_test_preds_proba(rfc, X_test)

get_scores()#

Get estimator scores on X.

  • If you pass y_pred, predictions are not computed from X and y data.

The estimator should be fitted before calling this function.

Estimator Logged scores
Single output regressors Explained variance, max error, mean absolute error, \(r^2\)
Multi output regressors \(r^2\)
Classifiers Precision, recall, F-beta score, support
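As a reminder of what the single-output regression scores measure, the following plain-Python sketch computes them from their standard definitions on a tiny made-up example. The function `regression_scores` and its dict keys are hypothetical and are not the integration's implementation or its actual key names.

```python
def regression_scores(y_true, y_pred):
    """Compute four common regression scores from their textbook definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_error = sum(errors) / n

    # Variances (population form) of the targets and of the errors.
    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    var_error = sum((e - mean_error) ** 2 for e in errors) / n

    # Residual and total sums of squares for r^2.
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    return {
        "explained_variance": 1 - var_error / var_true,
        "max_error": max(abs(e) for e in errors),
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }


# Made-up targets and predictions for illustration.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

scores = regression_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Explained variance and \(r^2\) differ here because the errors have a nonzero mean; the two coincide only when the predictions are unbiased on the evaluated data.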

Parameters

Name Type Default Description
estimator estimator - scikit-learn estimator to compute scores.
X ndarray - Data matrix.
y ndarray - Target for testing.
y_pred ndarray, optional None Estimator predictions on data.

Returns

dict with scores.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log estimator scores
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["estimator/scores"] = npt_utils.get_scores(rfc, X, y)

create_learning_curve_chart()#

Returns a learning curve chart.

Parameters

Name Type Default Description
regressor regressor - Fitted scikit-learn regressor object
X_train ndarray - Training data matrix
y_train ndarray - The regression target for training

Returns

File value object with a learning curve chart that you can log to the run.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log a learning curve chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["visuals/learning_curve"] = npt_utils.create_learning_curve_chart(
    rfr, X_train, y_train
)

create_feature_importance_chart()#

Returns a feature importance chart.

Parameters

Name Type Default Description
regressor regressor - Fitted scikit-learn regressor object
X_train ndarray - Training data matrix
y_train ndarray - The regression target for training

Returns

File value object with a feature importance chart that you can log to the run.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log a feature importance chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["visuals/feature_importance"] = npt_utils.create_feature_importance_chart(
    rfr, X_train, y_train
)

create_residuals_chart()#

Returns a residuals chart.

Parameters

Name Type Default Description
regressor regressor - Fitted scikit-learn regressor object
X_train ndarray - Training data matrix
X_test ndarray - Testing data matrix
y_train ndarray - The regression target for training
y_test ndarray - The regression target for testing

Returns

File value object with a residuals chart that you can log to the run.
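For orientation, a residual is the observed target minus the predicted value. The sketch below computes residuals in plain Python on made-up numbers; the helper `residuals` is hypothetical and says nothing about how the chart itself is drawn.

```python
def residuals(y_true, y_pred):
    """Return observed minus predicted values, one per sample."""
    return [t - p for t, p in zip(y_true, y_pred)]


# Made-up targets and predictions for illustration.
y_test = [3.0, 5.0, 7.5]
y_pred = [2.5, 5.5, 7.0]

res = residuals(y_test, y_pred)
print(res)  # [0.5, -0.5, 0.5]
```

Residuals scattered evenly around zero suggest that the model has no systematic bias across the range of predictions.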

Example

# Create a run
import neptune
run = neptune.init_run()

# Log a residuals chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["visuals/residuals"] = npt_utils.create_residuals_chart(
    rfr, X_train, X_test, y_train, y_test
)

create_prediction_error_chart()#

Returns a prediction error chart.

Parameters

Name Type Default Description
regressor regressor - Fitted scikit-learn regressor object
X_train ndarray - Training data matrix
X_test ndarray - Testing data matrix
y_train ndarray - The regression target for training
y_test ndarray - The regression target for testing

Returns

File value object with a prediction error chart that you can log to the run.

Example

# Create a run
import neptune
run = neptune.init_run()

# Log a prediction error chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["visuals/prediction_error"] = npt_utils.create_prediction_error_chart(
    rfr, X_train, X_test, y_train, y_test
)

create_cooks_distance_chart()#

Returns a Cook's distance chart.

Parameters

Name Type Default Description
regressor regressor - Fitted scikit-learn regressor object
X_train ndarray - Training data matrix
y_train ndarray - The regression target for training

Returns

File value object with a Cook's distance chart that you can log to the run.
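For reference, the textbook definition of Cook's distance for observation \(i\) is given below. Whether the chart uses exactly this form depends on the underlying plotting library, which this page does not specify.

```latex
D_i = \frac{\sum_{j=1}^{n} \left( \hat{y}_j - \hat{y}_{j(i)} \right)^2}{p \, s^2}
```

Here \(\hat{y}_j\) is the fitted value for observation \(j\), \(\hat{y}_{j(i)}\) is the fitted value for observation \(j\) when observation \(i\) is left out of the fit, \(p\) is the number of fitted parameters, and \(s^2\) is the mean squared error of the regression. Large values flag observations that strongly influence the fit.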

Example

# Create a run
import neptune
run = neptune.init_run()

# Log a Cook's distance chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["visuals/cooks_distance"] = npt_utils.create_cooks_distance_chart(
    rfr, X_train, y_train
)

create_classification_report_chart()#

Returns a classification report chart.

Parameters

Name Type Default Description
classifier classifier - Fitted scikit-learn classifier object
X_train ndarray - Training data matrix
X_test ndarray - Testing data matrix
y_train ndarray - The classification target for training
y_test ndarray - The classification target for testing

Returns

File value object with a classification report chart that you can log to the run.
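A classification report usually summarizes per-class precision, recall, F1 score, and support. Assuming the chart follows that usual layout, the plain-Python sketch below shows how those numbers are derived from true and predicted labels. The function `per_class_report` and the sample labels are hypothetical.

```python
def per_class_report(y_true, y_pred):
    """Compute precision, recall, F1, and support for each class label."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)

        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,  # number of true samples of this class
        }
    return report


# Made-up labels for illustration.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

report = per_class_report(y_true, y_pred)
for label, row in report.items():
    print(label, row)
```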

Example

# Create a run
import neptune
run = neptune.init_run()

# Log a classification report chart
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils
run["visuals/cls_report"] = npt_utils.create_classification_report_chart(
    rfc, X_train, X_test, y_train, y_test
)

create_confusion_matrix_chart()#

Returns a confusion matrix chart.

Parameters

Name Type Default Description
classifier classifier - Fitted scikit-learn classifier object.
X_train ndarray - Training data matrix.
X_test ndarray - Testing data matrix.
y_train ndarray - The classification target for training.
y_test ndarray - The classification target for testing.

Returns

File value object that you can log to the run.
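As a quick refresher on what a confusion matrix contains, the sketch below counts true/predicted label pairs in plain Python. It assumes the common convention of rows for true labels and columns for predicted labels; the helper `confusion_matrix` and the sample labels are hypothetical and independent of the charting code.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count samples per (true label, predicted label) pair.

    Rows correspond to true labels, columns to predicted labels.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Made-up labels for illustration.
labels = ["cat", "dog"]
y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

The diagonal holds the correctly classified samples; off-diagonal cells show which classes get confused with each other.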

Example

Create a run:

import neptune

run = neptune.init_run()

Log the chart:

rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils

run["visuals/confusion_matrix"] = npt_utils.create_confusion_matrix_chart(
    rfc, X_train, X_test, y_train, y_test
)

create_roc_auc_chart()#

Returns a ROC-AUC chart.

Parameters

Name Type Default Description
classifier classifier - Fitted scikit-learn classifier object.
X_train ndarray - Training data matrix.
X_test ndarray - Testing data matrix.
y_train ndarray - The classification target for training.
y_test ndarray - The classification target for testing.

Returns

File value object that you can log to the run.
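One way to read the area under the ROC curve for a binary problem is as the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one. The plain-Python sketch below computes it that way on made-up scores; the function `roc_auc` is hypothetical and is not how the chart is produced.

```python
def roc_auc(y_true, y_score):
    """Binary ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up binary labels and predicted scores for illustration.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_score)
print(auc)  # 0.75
```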

Example

Create a run:

import neptune

run = neptune.init_run()

Log the chart:

rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils

run["visuals/roc_auc"] = npt_utils.create_roc_auc_chart(
    rfc, X_train, X_test, y_train, y_test
)

create_precision_recall_chart()#

Returns a precision-recall chart.

Parameters

Name Type Default Description
classifier classifier - Fitted scikit-learn classifier object.
X_test ndarray - Testing data matrix.
y_test ndarray - The classification target for testing.
y_pred_proba ndarray, optional None Classifier prediction probabilities on test data.

Returns

File value object that you can log to the run.
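To show how prediction probabilities turn into a precision-recall curve, the sketch below evaluates precision and recall at several decision thresholds for a binary problem. It is plain Python on made-up data; the helper `precision_recall_at` is hypothetical, and the convention of reporting precision 1.0 when nothing is predicted positive is an assumption of this sketch.

```python
def precision_recall_at(y_true, y_score, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    predicted = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up binary labels and positive-class probabilities for illustration.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# Each distinct score is a candidate threshold; each yields one curve point.
curve = [(th, *precision_recall_at(y_true, y_score, th)) for th in sorted(set(y_score))]
for threshold, precision, recall in curve:
    print(f"threshold={threshold:.2f} precision={precision:.3f} recall={recall:.3f}")
```

Raising the threshold generally trades recall for precision, which is the trade-off the chart visualizes.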

Example

Create a run:

import neptune

run = neptune.init_run()

Log the chart:

rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils

run["visuals/precision_recall"] = npt_utils.create_precision_recall_chart(
    rfc, X_test, y_test
)

create_class_prediction_error_chart()#

Returns a class prediction error chart.

Parameters

Name Type Default Description
classifier classifier - Fitted scikit-learn classifier object.
X_train ndarray - Training data matrix.
X_test ndarray - Testing data matrix.
y_train ndarray - The classification target for training.
y_test ndarray - The classification target for testing.

Returns

File value object that you can log to the run.

Example

Create a run:

import neptune

run = neptune.init_run()

Log the chart:

rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

import neptune.integrations.sklearn as npt_utils

run["visuals/class_pred_error"] = npt_utils.create_class_prediction_error_chart(
    rfc, X_train, X_test, y_train, y_test
)

get_cluster_labels()#

Returns the index of the cluster that each sample belongs to.

Parameters

Name Type Default Description
model KMeans - KMeans object.
X ndarray - Training instances to cluster.
nrows int, optional 1000 Number of rows to log.
kwargs - - KMeans parameters.

Returns

File value object that you can log to the run.

Example

Create a run:

import neptune

run = neptune.init_run()

Log the labels:

km = KMeans(n_init=11, max_iter=270)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)

import neptune.integrations.sklearn as npt_utils

run["kmeans/cluster_labels"] = npt_utils.get_cluster_labels(km, X)

create_kelbow_chart()#

Returns the K-elbow chart for the KMeans clusterer.

Parameters

Name Type Default Description
model KMeans - KMeans object.
X ndarray - Training instances to cluster.
kwargs - - KMeans parameters.

Returns

File value object that you can log to the run.
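An elbow chart plots a clustering score against the number of clusters. One common score is the sum of squared distances from each point to its nearest centroid. The plain-Python sketch below computes that quantity for fixed, made-up centroids; the helper `inertia` is hypothetical, and the score actually used by the chart may differ.

```python
def inertia(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for point in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(point, centroid))
            for centroid in centroids
        )
    return total


# Made-up 2D points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

one_cluster = inertia(points, [(5.0, 1.0)])
two_clusters = inertia(points, [(0.0, 1.0), (10.0, 1.0)])
print(one_cluster, two_clusters)  # 104.0 4.0
```

The sharp drop from one to two clusters, followed by much smaller gains for additional clusters, is the "elbow" that suggests a suitable number of clusters.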

Example

Create a run:

import neptune

run = neptune.init_run()

Log the chart:

km = KMeans(n_init=11, max_iter=270)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)

import neptune.integrations.sklearn as npt_utils

run["kmeans/kelbow"] = npt_utils.create_kelbow_chart(km, X)

create_silhouette_chart()#

Returns the silhouette coefficient charts for the KMeans clusterer.

Charts are computed for j = 2, 3, ..., n_clusters.

Parameters

Name Type Default Description
model KMeans - KMeans object.
X ndarray - Training instances to cluster.
kwargs - - KMeans parameters.

Returns

File value object that you can log to the run.
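For reference, the standard definition of the silhouette coefficient of a sample \(i\) is given below. This is the general textbook formula, stated here as background rather than as a description of the integration's internals.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
```

Here \(a(i)\) is the mean distance from sample \(i\) to the other samples in its own cluster, and \(b(i)\) is the mean distance from sample \(i\) to the samples in the nearest other cluster. Values close to 1 indicate well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest that a sample may be assigned to the wrong cluster.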

Example

Create a run:

import neptune

run = neptune.init_run()

Log the charts:

km = KMeans(n_init=11, max_iter=270)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)

import neptune.integrations.sklearn as npt_utils

run["kmeans/silhouette"] = npt_utils.create_silhouette_chart(km, X, n_clusters=12)

See also

neptune-sklearn on GitHub