Scikit-learn API Reference
You can use the Neptune integration with scikit-learn to track your classifiers, regressors, and k-means clustering results.
You can find detailed information on how to install and use the integration in the user guide.

.create_regressor_summary()

Create a scikit-learn regressor summary.
This method creates a regressor summary that includes:
  • all regressor parameters,
  • pickled estimator (model),
  • test predictions,
  • test scores,
  • model performance visualizations.
The regressor should be fitted before calling this function.
Parameters
regressor
(regressor) - Fitted scikit-learn regressor object
X_train
(ndarray) - Training data matrix
X_test
(ndarray) - Testing data matrix
y_train
(ndarray) - The regression target for training
y_test
(ndarray) - The regression target for testing
nrows
(int, optional, default is 1000) - Log first nrows rows of test predictions.
log_charts
(bool, optional, default is True) - If True, calculate and log chart visualizations.
This is equivalent to calling the create_learning_curve_chart, create_feature_importance_chart, create_residuals_chart, create_prediction_error_chart, and create_cooks_distance_chart functions from this module.
Note: Calculating visualizations is potentially expensive, depending on the input data and regressor, and may take some time to finish.

Returns

dict with all metadata that can be assigned to the run namespace: run["summary"] = create_regressor_summary(...)
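To illustrate what assigning a nested dict under a namespace amounts to, the sketch below flattens a small, made-up summary into slash-separated field paths. The `flatten_to_paths` helper and the keys in `summary` are hypothetical; the structure actually returned by create_regressor_summary may differ.

```python
def flatten_to_paths(metadata, prefix=""):
    """Flatten a nested dict into {"a/b/c": value} pairs (hypothetical helper)."""
    flat = {}
    for key, value in metadata.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested namespaces.
            flat.update(flatten_to_paths(value, path))
        else:
            flat[path] = value
    return flat


# Made-up summary for illustration only; a real summary holds more entries.
summary = {
    "all_params": {"n_estimators": 100, "max_depth": None},
    "test": {"scores": {"r2": 0.87}},
}

paths = flatten_to_paths(summary, prefix="random_forest/summary")
for path, value in sorted(paths.items()):
    print(path, "=", value)
```

Each printed path corresponds to one field you would expect to find under the chosen namespace.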

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor

# Create run
run = neptune.init(project="WORKSPACE/PROJECT")

# Log random forest regressor summary
# (X_train, X_test, y_train, y_test are assumed to be prepared beforehand)
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

run["random_forest/summary"] = npt_utils.create_regressor_summary(rfr, X_train, X_test, y_train, y_test)

.create_classifier_summary()

Create a scikit-learn classifier summary.
This method creates a classifier summary that includes:
  • all classifier parameters,
  • pickled estimator (model),
  • test predictions,
  • test prediction probabilities,
  • test scores,
  • model performance visualizations.
The classifier should be fitted before calling this function.
Parameters
classifier
(classifier) - Fitted scikit-learn classifier object
X_train
(ndarray) - Training data matrix
X_test
(ndarray) - Testing data matrix
y_train
(ndarray) - The classification target for training
y_test
(ndarray) - The classification target for testing
nrows
(int, optional, default is 1000) - Log first nrows rows of test predictions and prediction probabilities.
log_charts
(bool, optional, default is True) - If True, calculate and log chart visualizations.
This is equivalent to calling the create_classification_report_chart, create_confusion_matrix_chart, create_roc_auc_chart, create_precision_recall_chart, and create_class_prediction_error_chart functions from this module.
Note: Calculating visualizations is potentially expensive, depending on the input data and classifier, and may take some time to finish.

Returns

dict with all metadata that can be assigned to the run namespace: run["summary"] = create_classifier_summary(...)

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestClassifier

# Create run
run = neptune.init(project="WORKSPACE/PROJECT")

# Log random forest classifier summary
# (X_train, X_test, y_train, y_test are assumed to be prepared beforehand)
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

run["random_forest/summary"] = npt_utils.create_classifier_summary(rfc, X_train, X_test, y_train, y_test)

.create_kmeans_summary()

Create a scikit-learn K-Means summary.
This method fits the KMeans model to the data and logs:
  • all kmeans parameters,
  • cluster labels,
  • clustering visualizations: KMeans elbow chart and silhouette coefficients chart.
Parameters
model
(KMeans) - KMeans object
X
(ndarray) - Training instances to cluster
nrows
(int, optional, default is 1000) - Number of rows to log in the cluster labels.
kwargs
KMeans parameters

Returns

dict with all metadata that can be assigned to the run namespace: run["summary"] = create_kmeans_summary(...)
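The elbow chart mentioned above plots a clustering cost against the number of clusters. As a rough sketch of the underlying numbers (not the integration's own implementation), the code below fits scikit-learn's KMeans for several values of k on synthetic data and collects the inertia of each fit. The data set and the range of k are arbitrary choices for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups.
X, _ = make_blobs(n_samples=300, n_features=4, centers=3, random_state=0)

# Inertia (within-cluster sum of squares) for k = 1..6.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
```

The "elbow" is the value of k after which the inertia stops dropping sharply.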

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Create run
run = neptune.init(project="WORKSPACE/PROJECT")

# Log K-Means summary
km = KMeans(n_init=11, max_iter=270)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)

run["kmeans/summary"] = npt_utils.create_kmeans_summary(km, X)

.get_estimator_params()

Get estimator parameters.
Parameters
estimator
(estimator) - Scikit-learn estimator from which to log parameters

Returns

dict with all parameters mapped to their values.
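scikit-learn estimators expose their constructor parameters through `get_params()`, which returns a dict of the kind described here. The sketch below shows that dict for a small random forest. That get_estimator_params relies on this method is an assumption, not something this reference states.

```python
from sklearn.ensemble import RandomForestRegressor

# The estimator does not need to be fitted to read its parameters.
rfr = RandomForestRegressor(n_estimators=50, max_depth=3)

params = rfr.get_params()
print(params["n_estimators"], params["max_depth"])
```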

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor

# Create run
run = neptune.init(project="WORKSPACE/PROJECT")

# Log estimator parameters
rfr = RandomForestRegressor()

run["estimator/params"] = npt_utils.get_estimator_params(rfr)

.get_pickled_model()

Get pickled estimator.
Parameters
estimator
(estimator) - Scikit-learn estimator to pickle.

Returns

File value object with a pickled model that you can log to the run.
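For readers unfamiliar with pickling, here is a minimal round trip using Python's standard `pickle` module and a small scikit-learn model. It only shows serialization and restoring; how the integration wraps the pickled bytes into a File value is not covered, and the toy data is made up.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data following y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression().fit(X, y)

# Serialize the fitted estimator to bytes and restore it.
payload = pickle.dumps(model)
restored = pickle.loads(payload)

# The restored model should predict exactly like the original.
same = bool(np.allclose(model.predict(X), restored.predict(X)))
print(same)
```

Unpickle only data you trust, and preferably with the same scikit-learn version that produced it.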

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor

# Create run
run = neptune.init(project="WORKSPACE/PROJECT")

# Log pickled model
rfr = RandomForestRegressor()

run["estimator/pickled_model"] = npt_utils.get_pickled_model(rfr)

.get_test_preds()

Get test predictions as a table.
If you pass y_pred, then predictions are not computed from X_test data.
The estimator should be fitted before calling this function.
Parameters
estimator
(estimator) - Scikit-learn estimator to compute predictions.
X_test
(ndarray) - Testing data matrix
y_test
(ndarray) - The regression target for testing
y_pred
(ndarray, optional, default is None) - Estimator predictions on test data.
nrows
(int, optional, default is 1000) - Number of rows to log.

Returns

File value object with test predictions as a table that you can log to the run.
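The effect of nrows can be pictured as pairing each target with its prediction and keeping only the first rows. The sketch below does this in plain Python. The `build_preds_table` helper and the column names `y_true` and `y_pred` are hypothetical and do not describe the integration's actual table layout.

```python
def build_preds_table(y_true, y_pred, nrows=1000):
    """Pair targets with predictions and keep the first nrows rows (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    rows = [{"y_true": t, "y_pred": p} for t, p in zip(y_true, y_pred)]
    return rows[:nrows]


# Made-up targets and predictions.
y_test = [3.0, 5.0, 7.5, 9.0]
y_pred = [2.8, 5.4, 7.0, 9.1]

table = build_preds_table(y_test, y_pred, nrows=3)
for row in table:
    print(row)
```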

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor

# Create run
run = neptune.init(project="WORKSPACE/PROJECT")

# Log test predictions as a table
# (the estimator must be fitted; X_train, y_train, X_test, y_test are assumed to exist)
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

run["estimator/test_preds"] = npt_utils.get_test_preds(rfr, X_test, y_test)

.get_test_preds_proba()

Get test prediction probabilities.
If you pass X_test, prediction probabilities are computed from the data.
If you pass y_pred_proba, prediction probabilities are not computed from the X_test data.
The estimator should be fitted before calling this function.
Parameters
classifier
(classifier) - Scikit-learn classifier to compute prediction probabilities.
X_test
(ndarray) - Testing data matrix
y_pred_proba
(ndarray, optional, default is None) - Classifier prediction probabilities on test data.
nrows
(int, optional, default is 1000) - Number of rows to log.

Returns

File value object with test prediction probabilities as a table that you can log to the run.
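Prediction probabilities come from a classifier's `predict_proba` method, which returns one row per sample and one column per class. The sketch below computes them on synthetic data and keeps the first few rows, mirroring the role of nrows. The data set and model settings are arbitrary, and this is not the integration's own code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rfc = RandomForestClassifier(n_estimators=20, random_state=0)
rfc.fit(X_train, y_train)

# One row per test sample, one column per class; each row sums to 1.
proba = rfc.predict_proba(X_test)

nrows = 5
first_rows = proba[:nrows]
print(first_rows)
```

Regressors do not provide `predict_proba`, so this only applies to classifiers.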

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestClassifier

# Create run
run = neptune.init(project="WORKSPACE/PROJECT")

# Log classifier test prediction probabilities
# (a fitted classifier is required; X_train, y_train, X_test are assumed to exist)
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

run["estimator/test_preds_proba"] = npt_utils.get_test_preds_proba(rfc, X_test)

.get_scores()

Get estimator scores on X.
If you pass y_pred, then predictions are not computed from X and y data.
The estimator should be fitted before calling this function.
Logged scores by estimator type:
  • Single-output regressors: explained variance, max error, mean absolute error, r2
  • Multi-output regressors: r2
  • Classifiers: precision, recall, F-beta score, support
Parameters
estimator
(estimator) - Scikit-learn estimator to compute scores.
X
(ndarray) - Data matrix.
y
(ndarray) - Target for testing.
y_pred
(ndarray, optional, default is None) - Estimator predictions on data.

Returns

dict with scores.
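For reference, the four single-output regression scores listed above can be computed by hand as shown in this sketch. The `regression_scores` helper and its key names are hypothetical, and the numbers are toy values. The integration presumably uses scikit-learn's metric functions, but that is an assumption.

```python
def regression_scores(y_true, y_pred):
    """Compute common regression scores in plain Python (hypothetical helper)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_error = sum(errors) / n

    ss_tot = sum((t - mean_true) ** 2 for t in y_true)     # total sum of squares
    ss_res = sum(e ** 2 for e in errors)                   # residual sum of squares
    ss_err = sum((e - mean_error) ** 2 for e in errors)    # spread of the errors

    return {
        "explained_variance": 1 - ss_err / ss_tot,
        "max_error": max(abs(e) for e in errors),
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }


# Toy targets and predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

scores = regression_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Explained variance and r2 differ only when the errors have a non-zero mean, as they do in this toy example.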

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestClassifier

# Create run
run = neptune.init(project="WORKSPACE/PROJECT")

# Log estimator scores
# (X_train, y_train, X, y are assumed to be prepared beforehand)
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

run["estimator/scores"] = npt_utils.get_scores(rfc, X, y)

.create_learning_curve_chart()

Create a learning curve chart.
Parameters
regressor
(regressor) - Fitted scikit-learn regressor object
X_train
(ndarray) - Training data matrix
y_train
(ndarray) - The regression target for training.

Returns

File value object with a learning curve chart that you can log to the run.
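A learning curve shows how training and validation scores change as the training set grows. The sketch below computes the underlying numbers with scikit-learn's `learning_curve` on synthetic data. It illustrates the concept only; the data set, forest size, and fold count are arbitrary, and the integration's chart may be produced differently.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

# Synthetic regression data.
X_train, y_train = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
rfr = RandomForestRegressor(n_estimators=20, random_state=0)

# Score the model on 20%, 50%, and 100% of the available training folds.
train_sizes, train_scores, valid_scores = learning_curve(
    rfr, X_train, y_train, train_sizes=[0.2, 0.5, 1.0], cv=3
)

# Average over the cross-validation folds for each training size.
for size, train, valid in zip(train_sizes, train_scores.mean(axis=1), valid_scores.mean(axis=1)):
    print(f"{size} samples: train={train:.3f}, validation={valid:.3f}")
```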

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor

# Create run
run = neptune.init(project="WORKSPACE/PROJECT")

# Log a learning curve chart
# (X_train and y_train are assumed to be prepared beforehand)
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

run["visuals/learning_curve"] = npt_utils.create_learning_curve_chart(rfr, X_train, y_train)

.create_feature_importance_chart()

Create a feature importance chart.
Parameters
regressor
(regressor) - Fitted scikit-learn regressor object
X_train
(ndarray) - Training data matrix
y_train
(ndarray) - The regression target for training.

Returns

File value object with a feature importance chart that you can log to the run.
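For tree-based models such as random forests, the values behind a feature importance chart are available on the fitted estimator as `feature_importances_`. The sketch below ranks features of a synthetic data set by that attribute. Whether the chart uses exactly these values for every regressor type is an assumption.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data where only two of the five features carry signal.
X_train, y_train = make_regression(
    n_samples=200, n_features=5, n_informative=2, random_state=0
)

rfr = RandomForestRegressor(n_estimators=20, random_state=0)
rfr.fit(X_train, y_train)

# Impurity-based importances, normalized to sum to 1.
importances = rfr.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important feature first

for index in ranking:
    print(f"feature {index}: {importances[index]:.3f}")
```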

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor

# Create run
run = neptune.init(project="WORKSPACE/PROJECT")

# Log a feature importance chart
# (X_train and y_train are assumed to be prepared beforehand)
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

run["visuals/feature_importance"] = npt_utils.create_feature_importance_chart(rfr, X_train, y_train)

.create_residuals_chart()

Create a residuals chart.
Parameters
regressor
(regressor) - Fitted scikit-learn regressor object.
X_train
(ndarray) - Training data matrix.
X_test
(ndarray) - Testing data matrix.
y_train
(ndarray) - The regression target for training.
y_test
(ndarray) - The regression target for testing.

Returns

File value object with a residuals chart that you can log to the run.

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor

# Create run
run = neptune.init(project="WORKSPACE/PROJECT")

# Log a residuals chart
# (X_train, X_test, y_train, y_test are assumed to be prepared beforehand)
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

run["visuals/residuals"] = npt_utils.create_residuals_chart(rfr, X_train, X_test, y_train, y_test)

.create_prediction_error_chart()

Create a prediction error chart.
Parameters
regressor
(regressor) - Fitted scikit-learn regressor object.
X_train
(ndarray) - Training data matrix.
X_test
(ndarray) - Testing data matrix.
y_train
(ndarray) - The regression target for training.
y_test
(ndarray) - The regression target for testing.

Returns

File value object with a prediction error chart that you can log to the run.

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor

# Create run
run = neptune.init(project="WORKSPACE/PROJECT")

# Log a prediction error chart
# (X_train, X_test, y_train, y_test are assumed to be prepared beforehand)
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

run["visuals/prediction_error"] = npt_utils.create_prediction_error_chart(rfr, X_train, X_test, y_train, y_test)

.create_cooks_distance_chart()

Create a Cook's distance chart.
Parameters
regressor
(regressor) - Fitted scikit-learn regressor object.
X_train
(ndarray) - Training data matrix.
y_train
(ndarray) - The regression target for training.

Returns

File value object with a Cook's distance chart that you can log to the run.
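Cook's distance measures how much each observation influences a least-squares fit, combining its residual with its leverage. The sketch below computes the textbook formula for an ordinary linear regression with NumPy. The `cooks_distance` helper is hypothetical, and the integration's chart may use a different implementation or normalization.

```python
import numpy as np


def cooks_distance(X, y):
    """Textbook Cook's distance for an OLS fit with intercept (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Design matrix with an intercept column.
    design = np.column_stack([np.ones(len(X)), X])
    n, p = design.shape

    # Hat matrix: its diagonal holds the leverage of each observation.
    hat = design @ np.linalg.inv(design.T @ design) @ design.T
    leverage = np.diag(hat)

    residuals = y - hat @ y
    mse = residuals @ residuals / (n - p)

    return residuals**2 / (p * mse) * leverage / (1 - leverage) ** 2


# A straight line with one outlier injected at the last (high-leverage) point.
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0
y[-1] += 15.0

distances = cooks_distance(x, y)
print(int(np.argmax(distances)))
```

Observations with a distance far above the rest are candidates for closer inspection.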

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor

# Create run
run = neptune.init(project="WORKSPACE/PROJECT")

# Log a Cook's distance chart
# (X_train and y_train are assumed to be prepared beforehand)
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

run["visuals/cooks_distance"] = npt_utils.create_cooks_distance_chart(rfr, X_train, y_train)

.create_classification_report_chart()

Create a classification report chart.
Parameters
classifier
(classifier) - Fitted scikit-learn classifier object.
X_train
(ndarray) - Training data matrix.
X_test
(ndarray) - Testing data matrix.
y_train
(ndarray) - The classification target for training.
y_test
(ndarray) - The classification target for testing.

Returns

File value object with a classification report chart that you can log to the run.
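A classification report summarizes precision, recall, F1 score, and support for each class. The sketch below computes those numbers in plain Python on made-up labels. The `classification_report_rows` helper is hypothetical and only illustrates what the chart visualizes.

```python
def classification_report_rows(y_true, y_pred):
    """Per-class precision, recall, F1, and support (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)

        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

        # Support is the number of true samples of this class.
        report[label] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}
    return report


# Made-up labels and predictions.
y_test = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird"]

report = classification_report_rows(y_test, y_pred)
for label, row in report.items():
    print(label, row)
```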

Examples

import neptune.new as neptune
import neptune.new.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestClassifier

# Create run
run = neptune.init(project="WORKSPACE/PROJECT")

# Log a classification report chart
# (X_train, X_test, y_train, y_test are assumed to be prepared beforehand)
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

run["visuals/classification_report"] = npt_utils.create_classification_report_chart(
    rfc, X_train, X_test, y_train, y_test
)