Scikit-Learn

You can use the Neptune integration with scikit-learn to track your classifiers, regressors, and k-means clustering results.

You can find detailed information on how to install and use the integration in the user guide.

.create_regressor_summary()

Create a scikit-learn regressor summary.

This method creates a regressor summary that includes:

  • all regressor parameters,

  • pickled estimator (model),

  • test predictions,

  • test scores,

  • model performance visualizations.

The regressor should be fitted before calling this function.

Parameters

regressor

(regressor) - Fitted scikit-learn regressor object

X_train

(ndarray) - Training data matrix

X_test

(ndarray) - Testing data matrix

y_train

(ndarray) - The regression target for training

y_test

(ndarray) - The regression target for testing

nrows

(int, optional, default is 1000) - Log first nrows rows of test predictions.

log_charts

(bool, optional, default is True) - If True, calculate and log chart visualizations.

This is equivalent to calling log_learning_curve_chart, log_feature_importance_chart, log_residuals_chart, log_prediction_error_chart, log_cooks_distance_chart functions from this module.

Note: Calculating visualizations is potentially expensive, depending on the input data and regressor, and may take some time to finish.

Returns

dict with all metadata that can be assigned to the run namespace: run["summary"] = create_regressor_summary(...)

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor
# Create run
run = neptune.init(project="WORKSPACE/PROJECT")
# Log random forest regressor summary (X_train, X_test, y_train, y_test are your train/test splits)
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
run["random_forest/summary"] = npt_utils.create_regressor_summary(rfr, X_train, X_test, y_train, y_test)

.create_classifier_summary()

Create a scikit-learn classifier summary.

This method creates a classifier summary that includes:

  • all classifier parameters,

  • pickled estimator (model),

  • test predictions,

  • test predictions probabilities,

  • test scores,

  • model performance visualizations.

The classifier should be fitted before calling this function.

Parameters

classifier

(classifier) - Fitted scikit-learn classifier object

X_train

(ndarray) - Training data matrix

X_test

(ndarray) - Testing data matrix

y_train

(ndarray) - The classification target for training

y_test

(ndarray) - The classification target for testing

nrows

(int, optional, default is 1000) - Log first nrows rows of test predictions and predictions probabilities.

log_charts

(bool, optional, default is True) - If True, calculate and log chart visualizations.

This is equivalent to calling log_classification_report_chart, log_confusion_matrix_chart, log_roc_auc_chart, log_precision_recall_chart, log_class_prediction_error_chart functions from this module.

Note: Calculating visualizations is potentially expensive, depending on the input data and classifier, and may take some time to finish.

Returns

dict with all metadata that can be assigned to the run namespace: run["summary"] = create_classifier_summary(...)

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestClassifier
# Create run
run = neptune.init(project="WORKSPACE/PROJECT")
# Log random forest classifier summary (X_train, X_test, y_train, y_test are your train/test splits)
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
run["random_forest/summary"] = npt_utils.create_classifier_summary(rfc, X_train, X_test, y_train, y_test)

.create_kmeans_summary()

Create a scikit-learn K-Means summary.

This method fits the KMeans model to the data and logs:

  • all kmeans parameters,

  • cluster labels,

  • clustering visualizations: KMeans elbow chart and silhouette coefficients chart.

Parameters

model

(KMeans) - KMeans object

X

(ndarray) - Training instances to cluster

nrows

(int, optional, default is 1000) - Number of rows to log in the cluster labels.

kwargs

KMeans parameters

Returns

dict with all metadata that can be assigned to the run namespace: run["summary"] = create_kmeans_summary(...)

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
# Create run
run = neptune.init(project="WORKSPACE/PROJECT")
# Log K-Means clustering summary
km = KMeans(n_init=11, max_iter=270)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)
run["kmeans/summary"] = npt_utils.create_kmeans_summary(km, X)
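
For intuition about the silhouette coefficients chart listed above, the following is a minimal pure-Python sketch of the standard silhouette coefficient on a tiny one-dimensional dataset. It is not the integration's implementation: the helper name silhouette_coefficients and the sample data are made up for illustration, and the sketch assumes at least two clusters and no duplicate points.

```python
from statistics import mean


def silhouette_coefficients(points, labels):
    """Return the silhouette coefficient of each point (1-D data, absolute distance)."""
    coefficients = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # a: mean distance to the other points in the same cluster
        same = [
            abs(p - q)
            for j, (q, other_label) in enumerate(zip(points, labels))
            if other_label == label and j != i
        ]
        if not same:
            # Convention: a point alone in its cluster gets a coefficient of 0
            coefficients.append(0.0)
            continue
        a = mean(same)
        # b: smallest mean distance to the points of any other cluster
        b = min(
            mean(abs(p - q) for q, l in zip(points, labels) if l == other)
            for other in set(labels)
            if other != label
        )
        coefficients.append((b - a) / max(a, b))
    return coefficients


# Hypothetical data: two well-separated clusters
points = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
scores = silhouette_coefficients(points, labels)
print(scores)
```

Values close to 1 indicate points that sit well inside their own cluster; values near 0 or below suggest overlapping or misassigned clusters.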

.get_estimator_params()

Get estimator parameters.

Parameters

estimator

(estimator) - Scikit-learn estimator from which to log parameters

Returns

dict with all parameters mapped to their values.

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor
# Create run
run = neptune.init(project="WORKSPACE/PROJECT")
# Log estimator parameters
rfr = RandomForestRegressor()
run["estimator/params"] = npt_utils.get_estimator_params(rfr)
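
To make "all parameters mapped to their values" concrete: scikit-learn estimators expose their constructor arguments through get_params(). The sketch below imitates that convention with a hypothetical ToyEstimator class so that it runs without scikit-learn. It is an assumption, not something this page confirms, that the dict returned by get_estimator_params() has a comparable shape.

```python
import inspect


class ToyEstimator:
    """Hypothetical stand-in for a scikit-learn estimator."""

    def __init__(self, n_estimators=100, max_depth=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth

    def get_params(self):
        # Collect the constructor argument names, then read the matching attributes
        names = list(inspect.signature(self.__init__).parameters)
        return {name: getattr(self, name) for name in names}


params = ToyEstimator(n_estimators=10).get_params()
print(params)
```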

.get_pickled_model()

Get pickled estimator.

Parameters

estimator

(estimator) - Scikit-learn estimator to pickle.

Returns

File value object with a pickled model that you can log to the run.

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor
# Create run
run = neptune.init(project="WORKSPACE/PROJECT")
# Log pickled model
rfr = RandomForestRegressor()
run["estimator/pickled_model"] = npt_utils.get_pickled_model(rfr)
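
As background on what a pickled model is, here is a minimal round-trip sketch using only Python's standard pickle module. The model_state dict and the predict helper are hypothetical stand-ins for a fitted estimator; the sketch does not show how Neptune stores or downloads the file.

```python
import pickle

# Hypothetical "fitted model": learned coefficients kept in a plain dict
model_state = {"slope": 2.0, "intercept": 1.0}


def predict(state, xs):
    """Apply the stored linear coefficients to each input value."""
    return [state["slope"] * x + state["intercept"] for x in xs]


payload = pickle.dumps(model_state)   # bytes that can be written to a file
restored = pickle.loads(payload)      # an equal, independent copy of the state
predictions = predict(restored, [0.0, 1.5])
print(predictions)
```

Only unpickle files that come from a source you trust, because loading a pickle can execute arbitrary code.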

.get_test_preds()

Get test predictions as a table.

If you pass y_pred, then predictions are not computed from X_test data.

The estimator should be fitted before calling this function.

Parameters

estimator

(estimator) - Scikit-learn estimator to compute predictions.

X_test

(ndarray) - Testing data matrix

y_test

(ndarray) - Target for testing

y_pred

(ndarray, optional, default is None) - Estimator predictions on test data.

nrows

(int, optional, default is 1000) - Number of rows to log.

Returns

File value object with test predictions as a table that you can log to the run.

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor
# Create run
run = neptune.init(project="WORKSPACE/PROJECT")
# Log test predictions as a table (the estimator must be fitted first)
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
run["estimator/test_preds"] = npt_utils.get_test_preds(rfr, X_test, y_test)
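
The interplay of y_pred and nrows described above can be sketched in plain Python. This is an illustration of the documented behavior under stated assumptions, not the integration's code: ConstantEstimator and build_preds_table are hypothetical names, and the real function returns a File value object rather than a list.

```python
class ConstantEstimator:
    """Hypothetical fitted estimator that always predicts the same value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]


def build_preds_table(estimator, X_test, y_test, y_pred=None, nrows=1000):
    """Build rows of (y_true, y_pred), limited to the first nrows rows."""
    if y_pred is None:
        # No predictions supplied: compute them from X_test
        y_pred = estimator.predict(X_test)
    return list(zip(y_test, y_pred))[:nrows]


est = ConstantEstimator(5.0)
X_test = [[1], [2], [3]]
y_test = [4.0, 5.0, 6.0]

# Predictions computed from X_test, truncated to two rows
computed = build_preds_table(est, X_test, y_test, nrows=2)
# Predictions supplied by the caller, so the estimator is not used
supplied = build_preds_table(est, X_test, y_test, y_pred=[4.1, 5.1, 6.1])
```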

.get_test_preds_proba()

Get test predictions probabilities.

If you pass X_test, then predictions probabilities are computed from data.

If you pass y_pred_proba, then predictions probabilities are not computed from X_test data.

The estimator should be fitted before calling this function.

Parameters

classifier

(classifier) - Scikit-learn classifier to compute predictions probabilities.

X_test

(ndarray) - Testing data matrix

y_pred_proba

(ndarray, optional, default is None) - Classifier predictions probabilities on test data.

nrows

(int, optional, default is 1000) - Number of rows to log.

Returns

File value object with test prediction probabilities as a table that you can log to the run.

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestClassifier
# Create run
run = neptune.init(project="WORKSPACE/PROJECT")
# Log classifier test predictions probabilities (requires a fitted classifier)
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
run["estimator/test_preds_proba"] = npt_utils.get_test_preds_proba(rfc, X_test)

.get_scores()

Get estimator scores on X.

If you pass y_pred, then predictions are not computed from X and y data.

The estimator should be fitted before calling this function.

Logged scores by estimator type:

  • Single output regressors: explained variance, max error, mean absolute error, r2

  • Multi-output regressors: r2

  • Classifiers: precision, recall, f beta score, support

Parameters

estimator

(estimator) - Scikit-learn estimator to compute scores.

X

(ndarray) - Data matrix.

y

(ndarray) - Target for testing.

y_pred

(ndarray, optional, default is None) - Estimator predictions on data.

Returns

dict with scores.

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestClassifier
# Create run
run = neptune.init(project="WORKSPACE/PROJECT")
# Log estimator scores on the test data
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
run["estimator/scores"] = npt_utils.get_scores(rfc, X_test, y_test)
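
For reference, the four single-output regressor scores listed above have standard definitions that can be computed from scratch. The sketch below is an illustration only: the function name regression_scores and the dict keys are assumptions and may not match the keys that get_scores() returns.

```python
from statistics import mean


def regression_scores(y_true, y_pred):
    """Standard single-output regression metrics, computed from scratch."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = mean(y_true)
    mean_error = mean(errors)
    # Population variances of the targets and of the errors
    var_true = mean((t - mean_true) ** 2 for t in y_true)
    var_error = mean((e - mean_error) ** 2 for e in errors)
    return {
        "explained_variance": 1 - var_error / var_true,
        "max_error": max(abs(e) for e in errors),
        "mean_absolute_error": mean(abs(e) for e in errors),
        "r2": 1 - mean(e ** 2 for e in errors) / var_true,
    }


# Hypothetical targets and predictions
scores = regression_scores([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(scores)
```

Explained variance and r2 coincide here because the errors average to zero; they differ when the predictions are systematically biased.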

.create_learning_curve_chart()

Create a learning curve chart.

Parameters

regressor

(regressor) - Fitted scikit-learn regressor object

X_train

(ndarray) - Training data matrix

y_train

(ndarray) - The regression target for training.

Returns

File value object with a learning curve chart that you can log to the run.

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor
# Create run
run = neptune.init(project="WORKSPACE/PROJECT")
# Log a learning curve chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
run["visuals/learning_curve"] = npt_utils.create_learning_curve_chart(rfr, X_train, y_train)

.create_feature_importance_chart()

Create a feature importance chart.

Parameters

regressor

(regressor) - Fitted scikit-learn regressor object

X_train

(ndarray) - Training data matrix

y_train

(ndarray) - The regression target for training.

Returns

File value object with a feature importance chart that you can log to the run.

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor
# Create run
run = neptune.init(project="WORKSPACE/PROJECT")
# Log a feature importance chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
run["visuals/feature_importance"] = npt_utils.create_feature_importance_chart(rfr, X_train, y_train)

.create_residuals_chart()

Create a residuals chart.

Parameters

regressor

(regressor) - Fitted scikit-learn regressor object.

X_train

(ndarray) - Training data matrix.

X_test

(ndarray) - Testing data matrix.

y_train

(ndarray) - The regression target for training.

y_test

(ndarray) - The regression target for testing.

Returns

File value object with a residuals chart that you can log to the run.

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor
# Create run
run = neptune.init(project="WORKSPACE/PROJECT")
# Log a residuals chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
run["visuals/residuals"] = npt_utils.create_residuals_chart(rfr, X_train, X_test, y_train, y_test)

.create_prediction_error_chart()

Create a prediction error chart.

Parameters

regressor

(regressor) - Fitted scikit-learn regressor object.

X_train

(ndarray) - Training data matrix.

X_test

(ndarray) - Testing data matrix.

y_train

(ndarray) - The regression target for training.

y_test

(ndarray) - The regression target for testing.

Returns

File value object with a prediction error chart that you can log to the run.

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor
# Create run
run = neptune.init(project="WORKSPACE/PROJECT")
# Log a prediction error chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
run["visuals/prediction_error"] = npt_utils.create_prediction_error_chart(rfr, X_train, X_test, y_train, y_test)

.create_cooks_distance_chart()

Create a Cook's distance chart.

Parameters

regressor

(regressor) - Fitted scikit-learn regressor object.

X_train

(ndarray) - Training data matrix.

y_train

(ndarray) - The regression target for training.

Returns

File value object with a Cook's distance chart that you can log to the run.

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor
# Create run
run = neptune.init(project="WORKSPACE/PROJECT")
# Log a Cook's distance chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
run["visuals/cooks_distance"] = npt_utils.create_cooks_distance_chart(rfr, X_train, y_train)
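
As background, Cook's distance measures how much a single observation influences a least-squares fit, combining the size of its residual with its leverage. The sketch below computes the textbook quantity for a one-feature linear regression in plain Python. The function name cooks_distances and the data are hypothetical, and this page does not specify how the chart itself derives the values for an arbitrary regressor.

```python
def cooks_distances(xs, ys):
    """Cook's distance of each point for a one-feature least-squares line."""
    n = len(xs)
    p = 2  # fitted parameters: intercept and slope
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance estimate
    distances = []
    for x, e in zip(xs, residuals):
        h = 1 / n + (x - x_mean) ** 2 / sxx  # leverage of this point
        distances.append(e ** 2 / (p * s2) * h / (1 - h) ** 2)
    return distances


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 10.0]  # the last point is an outlier
distances = cooks_distances(xs, ys)
most_influential = distances.index(max(distances))
```

In this example the outlier at the edge of the x range has both a large residual and high leverage, so it receives the largest distance.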

.create_classification_report_chart()

Create a classification report chart.

Parameters

classifier

(classifier) - Fitted scikit-learn classifier object.

X_train

(ndarray) - Training data matrix.

X_test

(ndarray) - Testing data matrix.

y_train

(ndarray) - The classification target for training.

y_test

(ndarray) - The classification target for testing.

Returns

File value object with a classification report chart that you can log to the run.

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestClassifier
# Create run
run = neptune.init(project="WORKSPACE/PROJECT")
# Log a classification report chart
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
run["visuals/classification_report"] = npt_utils.create_classification_report_chart(
    rfc, X_train, X_test, y_train, y_test
)
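
A classification report conventionally summarizes per-class precision, recall, F1 score, and support. The sketch below computes those numbers from scratch for a small hypothetical set of labels; per_class_report is an illustrative helper, and it is an assumption that the chart visualizes exactly these quantities.

```python
def per_class_report(y_true, y_pred):
    """Per-class precision, recall, F1 score and support, from scratch."""
    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        # True positives: samples of this class that were predicted as this class
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        predicted = sum(1 for p in y_pred if p == cls)
        support = sum(1 for t in y_true if t == cls)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[cls] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return report


# Hypothetical true and predicted labels
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1]
report = per_class_report(y_true, y_pred)
print(report)
```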