API reference: scikit-learn integration
You can use the Neptune integration with scikit-learn to track your classifiers, regressors, and k-means clustering results.
create_regressor_summary()
Returns a scikit-learn regressor summary that includes:
- All regressor parameters
- Pickled estimator (model)
- Test predictions
- Test scores
- Model performance visualizations
The regressor should be fitted before calling this function.
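The examples on this page assume that X_train, X_test, y_train, and y_test already exist. One possible way to create them is sketched below with a synthetic dataset. The dataset, split ratio, estimator settings, and random seeds are illustrative assumptions and not part of the integration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumption: any array-like feature matrix works the same way).
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# Hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the regressor before passing it to a summary or chart function.
rfr = RandomForestRegressor(n_estimators=10, random_state=0)
rfr.fit(X_train, y_train)
```

The fitted rfr and the four arrays can then be passed to the functions described on this page.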
Parameters
Name | Type | Default | Description |
---|---|---|---|
regressor | regressor | - | Fitted scikit-learn regressor object. |
X_train | ndarray | - | Training data matrix. |
X_test | ndarray | - | Testing data matrix. |
y_train | ndarray | - | The regression target for training. |
y_test | ndarray | - | The regression target for testing. |
nrows | int, optional | 1000 | Log first nrows rows of test predictions. |
log_charts | bool, optional | True | If True, calculate and log chart visualizations. This is equivalent to calling the regressor chart functions documented on this page. Note: Calculating visualizations is potentially expensive depending on input data and regressor, and may take some time to finish. |
Returns
dict with all metadata, which can be assigned to a run namespace: run["summary"] = create_regressor_summary(...)
Example
# Create a run
import neptune
import neptune.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestRegressor

run = neptune.init_run()

# Fit the regressor (X_train, X_test, y_train, and y_test are assumed to exist)
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)

# Log random forest regressor summary
run["random_forest/summary"] = npt_utils.create_regressor_summary(
    rfr, X_train, X_test, y_train, y_test
)
create_classifier_summary()
Returns a scikit-learn classifier summary that includes:
- All classifier parameters
- Pickled estimator (model)
- Test predictions
- Test prediction probabilities
- Test scores
- Model performance visualizations
The classifier should be fitted before calling this function.
Parameters
Name | Type | Default | Description |
---|---|---|---|
classifier | classifier | - | Fitted scikit-learn classifier object. |
X_train | ndarray | - | Training data matrix. |
X_test | ndarray | - | Testing data matrix. |
y_train | ndarray | - | The classification target for training. |
y_test | ndarray | - | The classification target for testing. |
nrows | int, optional | 1000 | Log first nrows rows of test predictions and prediction probabilities. |
log_charts | bool, optional | True | If True, calculate and log chart visualizations. This is equivalent to calling the classifier chart functions documented on this page. Note: Calculating visualizations is potentially expensive depending on input data and classifier, and may take some time to finish. |
Returns
dict with all metadata, which can be assigned to a run namespace: run["summary"] = create_classifier_summary(...)
Example
# Create a run
import neptune
import neptune.integrations.sklearn as npt_utils
from sklearn.ensemble import RandomForestClassifier

run = neptune.init_run()

# Fit the classifier (X_train, X_test, y_train, and y_test are assumed to exist)
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)

# Log random forest classifier summary
run["random_forest/summary"] = npt_utils.create_classifier_summary(
    rfc, X_train, X_test, y_train, y_test
)
create_kmeans_summary()
Returns a scikit-learn k-means summary.
This method fits the k-means model to data and logs:
- All KMeans parameters
- Cluster labels
- Clustering visualizations: k-means elbow chart and silhouette coefficients chart
Parameters
Name | Type | Default | Description |
---|---|---|---|
model | KMeans | - | KMeans object |
X | ndarray | - | Training instances to cluster |
nrows | int, optional | 1000 | Number of rows to log in the cluster labels |
kwargs | - | - | KMeans parameters |
Returns
dict with all metadata, which can be assigned to a run namespace: run["summary"] = create_kmeans_summary(...)
Example
# Create a run
import neptune
import neptune.integrations.sklearn as npt_utils
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

run = neptune.init_run()

# Log k-means summary
km = KMeans(n_init=11, max_iter=270)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)
run["kmeans/summary"] = npt_utils.create_kmeans_summary(km, X)
get_estimator_params()
Get estimator parameters.
Parameters
Name | Type | Description |
---|---|---|
estimator | estimator | Scikit-learn estimator to log parameters for. |
Returns
dict with all parameters mapped to their values.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log estimator parameters
rfr = RandomForestRegressor()
import neptune.integrations.sklearn as npt_utils
from neptune.utils import stringify_unsupported
run["estimator/params"] = stringify_unsupported(npt_utils.get_estimator_params(rfr))
get_pickled_model()
Get pickled estimator.
Parameters
Name | Type | Description |
---|---|---|
estimator | estimator | Scikit-learn estimator to pickle. |
Returns
File value object with a pickled model that you can log to the run.
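To illustrate what pickling means here, the sketch below serializes an object to bytes with Python's pickle module and restores it. The NormalDist object is only a stand-in for a fitted estimator, chosen because it ships with the standard library. How get_pickled_model() builds its File object internally is not shown and is not implied by this example.

```python
import pickle
from statistics import NormalDist

# Stand-in for a fitted estimator: a normal distribution estimated from samples.
model = NormalDist.from_samples([2.0, 4.0, 4.0, 6.0])

# Serialize the object to bytes, then restore an equivalent object from them.
payload = pickle.dumps(model)
restored = pickle.loads(payload)

print(type(payload).__name__, restored == model)
```

A scikit-learn estimator can be round-tripped the same way, provided the unpickling environment has compatible library versions.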
Example
# Create a run
import neptune
run = neptune.init_run()
# Log pickled model
rfr = RandomForestRegressor()
import neptune.integrations.sklearn as npt_utils
run["estimator/pickled_model"] = npt_utils.get_pickled_model(rfr)
get_test_preds()
Get test predictions as a table.
If you pass y_pred, predictions are not computed from X_test data.
The estimator should be fitted before calling this function.
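The behavior described above (use supplied predictions when given, otherwise compute them, then keep the first nrows rows) can be sketched in plain Python. The helper build_preds_rows, the DoublingEstimator class, and the y_true and y_pred column names are hypothetical and only illustrate the pattern. This is not the integration's implementation.

```python
def build_preds_rows(estimator, X_test, y_test, y_pred=None, nrows=1000):
    """Hypothetical helper: pair true and predicted values, keeping the first nrows rows."""
    if y_pred is None:
        # No predictions were supplied, so compute them from the test data.
        y_pred = estimator.predict(X_test)
    rows = [{"y_true": t, "y_pred": p} for t, p in zip(y_test, y_pred)]
    return rows[:nrows]


class DoublingEstimator:
    """Toy estimator used only for this illustration."""

    def predict(self, X):
        return [2 * row[0] for row in X]


X_test = [[1.0], [2.0], [3.0]]
y_test = [2.0, 4.5, 6.0]

# Predictions are computed from X_test and truncated to two rows.
computed = build_preds_rows(DoublingEstimator(), X_test, y_test, nrows=2)

# Predictions are supplied, so the estimator is never called.
supplied = build_preds_rows(None, X_test, y_test, y_pred=[0.0, 0.0, 0.0])
```

Passing precomputed predictions avoids a second call to predict() when the same predictions are reused across several logging calls.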
Parameters
Name | Type | Default | Description |
---|---|---|---|
estimator | estimator | - | scikit-learn estimator to compute predictions. |
X_test | ndarray | - | Testing data matrix. |
y_test | ndarray | - | The regression target for testing. |
y_pred | ndarray, optional | None | Estimator predictions on test data. |
nrows | int, optional | 1000 | Number of rows to log. |
Returns
File value object with test predictions as a table that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log test predictions as a table
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
import neptune.integrations.sklearn as npt_utils
run["estimator/test_preds"] = npt_utils.get_test_preds(rfr, X_test, y_test)
get_test_preds_proba()
Get test prediction probabilities.
- If you pass X_test, prediction probabilities are computed from data.
- If you pass y_pred_proba, prediction probabilities are not computed from X_test data.
The estimator should be fitted before calling this function.
Parameters
Name | Type | Default | Description |
---|---|---|---|
classifier | classifier | - | scikit-learn classifier to compute prediction probabilities. |
X_test | ndarray | - | Testing data matrix. |
y_pred_proba | ndarray, optional | None | Classifier prediction probabilities on test data. |
nrows | int, optional | 1000 | Number of rows to log. |
Returns
File value object with test prediction probabilities as a table that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log classifier test prediction probabilities
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
import neptune.integrations.sklearn as npt_utils
run["estimator/test_preds_proba"] = npt_utils.get_test_preds_proba(rfc, X_test)
get_scores()
Get estimator scores on X.
If you pass y_pred, predictions are not computed from X and y data.
The estimator should be fitted before calling this function.
Estimator | Logged scores |
---|---|
Single output regressors | Explained variance, max error, mean absolute error, \(r^2\) |
Multi output regressors | \(r^2\) |
Classifiers | Precision, recall, F-beta score, support |
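As a reminder of what the single-output regression scores in the table measure, the following sketch computes them from their standard definitions in plain Python. The function name and dictionary keys are illustrative assumptions and do not necessarily match the keys of the dict that get_scores() returns.

```python
from statistics import fmean, pvariance


def regression_scores(y_true, y_pred):
    """Standard single-output regression metrics (illustrative key names)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = fmean(y_true)
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    return {
        "explained_variance": 1 - pvariance(residuals) / pvariance(y_true),
        "max_error": max(abs(r) for r in residuals),
        "mean_absolute_error": fmean([abs(r) for r in residuals]),
        "r2": 1 - ss_res / ss_tot,
    }


scores = regression_scores([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Explained variance and \(r^2\) differ only when the residuals have a nonzero mean, as they do in this sample.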
Parameters
Name | Type | Default | Description |
---|---|---|---|
estimator | estimator | - | scikit-learn estimator to compute scores. |
X | ndarray | - | Data matrix. |
y | ndarray | - | Target for testing. |
y_pred | ndarray, optional | None | Estimator predictions on data. |
Returns
dict with scores.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log estimator scores
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
import neptune.integrations.sklearn as npt_utils
run["estimator/scores"] = npt_utils.get_scores(rfc, X_test, y_test)
create_learning_curve_chart()
Returns a learning curve chart.
Parameters
Name | Type | Default | Description |
---|---|---|---|
regressor | regressor | - | Fitted scikit-learn regressor object |
X_train | ndarray | - | Training data matrix |
y_train | ndarray | - | The regression target for training |
Returns
File value object with a learning curve chart that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log a learning curve chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
import neptune.integrations.sklearn as npt_utils
run["visuals/learning_curve"] = npt_utils.create_learning_curve_chart(
rfr, X_train, y_train
)
create_feature_importance_chart()
Returns a feature importance chart.
Parameters
Name | Type | Default | Description |
---|---|---|---|
regressor | regressor | - | Fitted scikit-learn regressor object |
X_train | ndarray | - | Training data matrix |
y_train | ndarray | - | The regression target for training |
Returns
File value object with a feature importance chart that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log a feature importance chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
import neptune.integrations.sklearn as npt_utils
run["visuals/feature_importance"] = npt_utils.create_feature_importance_chart(
rfr, X_train, y_train
)
create_residuals_chart()
Returns a residuals chart.
Parameters
Name | Type | Default | Description |
---|---|---|---|
regressor | regressor | - | Fitted scikit-learn regressor object |
X_train | ndarray | - | Training data matrix |
X_test | ndarray | - | Testing data matrix |
y_train | ndarray | - | The regression target for training |
y_test | ndarray | - | The regression target for testing |
Returns
File value object with a residuals chart that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log a residuals chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
import neptune.integrations.sklearn as npt_utils
run["visuals/residuals"] = npt_utils.create_residuals_chart(
rfr, X_train, X_test, y_train, y_test
)
create_prediction_error_chart()
Returns a prediction error chart.
Parameters
Name | Type | Default | Description |
---|---|---|---|
regressor | regressor | - | Fitted scikit-learn regressor object |
X_train | ndarray | - | Training data matrix |
X_test | ndarray | - | Testing data matrix |
y_train | ndarray | - | The regression target for training |
y_test | ndarray | - | The regression target for testing |
Returns
File value object with a prediction error chart that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log a prediction error chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
import neptune.integrations.sklearn as npt_utils
run["visuals/prediction_error"] = npt_utils.create_prediction_error_chart(
rfr, X_train, X_test, y_train, y_test
)
create_cooks_distance_chart()
Returns a Cook's distance chart.
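For reference, the textbook definition of Cook's distance for observation \(i\) in a least-squares fit with \(p\) parameters is shown below, where \(e_i\) is the residual, \(h_{ii}\) is the leverage of the observation, and \(s^2\) is the mean squared error of the fit. This is the standard formula only. The exact computation behind the chart is not specified in this reference.

```latex
D_i = \frac{e_i^2}{p \, s^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}
```

Larger values of \(D_i\) indicate observations with more influence on the fitted model.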
Parameters
Name | Type | Default | Description |
---|---|---|---|
regressor | regressor | - | Fitted scikit-learn regressor object |
X_train | ndarray | - | Training data matrix |
y_train | ndarray | - | The regression target for training |
Returns
File value object with a Cook's distance chart that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log a Cook's distance chart
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
import neptune.integrations.sklearn as npt_utils
run["visuals/cooks_distance"] = npt_utils.create_cooks_distance_chart(
rfr, X_train, y_train
)
create_classification_report_chart()
Returns a classification report chart.
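A classification report typically shows precision, recall, F1 score, and support per class. The plain-Python sketch below computes those quantities from their standard definitions. The function name, key names, and sample labels are illustrative assumptions, and the sketch says nothing about how the chart itself is rendered.

```python
def classification_report_rows(y_true, y_pred):
    """Per-class precision, recall, F1, and support (illustrative key names)."""
    rows = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        # Support is the number of true samples of this class.
        rows[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,
        }
    return rows


report = classification_report_rows(
    ["cat", "cat", "dog", "dog", "dog"],
    ["cat", "dog", "dog", "dog", "cat"],
)
```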
Parameters
Name | Type | Default | Description |
---|---|---|---|
classifier | classifier | - | Fitted scikit-learn classifier object |
X_train | ndarray | - | Training data matrix |
X_test | ndarray | - | Testing data matrix |
y_train | ndarray | - | The classification target for training |
y_test | ndarray | - | The classification target for testing |
Returns
File value object with a classification report chart that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log a classification report chart
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
import neptune.integrations.sklearn as npt_utils
run["visuals/cls_report"] = npt_utils.create_classification_report_chart(
rfc, X_train, X_test, y_train, y_test
)
create_confusion_matrix_chart()
Returns a confusion matrix chart.
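As a quick illustration of the data behind such a chart, the sketch below counts (true label, predicted label) pairs in plain Python. Rows correspond to true labels and columns to predicted labels, which is a common convention assumed here. The function and sample data are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Count predictions per (true, predicted) pair; rows are true labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


matrix = confusion_matrix(
    y_true=[0, 0, 1, 1, 1],
    y_pred=[0, 1, 1, 1, 0],
    labels=[0, 1],
)
for row in matrix:
    print(row)
```

The diagonal holds the correctly classified samples, and off-diagonal cells show which classes are confused with each other.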
Parameters
Name | Type | Default | Description |
---|---|---|---|
classifier | classifier | - | Fitted scikit-learn classifier object. |
X_train | ndarray | - | Training data matrix. |
X_test | ndarray | - | Testing data matrix. |
y_train | ndarray | - | The classification target for training. |
y_test | ndarray | - | The classification target for testing. |
Returns
File value object that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log the chart
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
import neptune.integrations.sklearn as npt_utils
run["visuals/confusion_matrix"] = npt_utils.create_confusion_matrix_chart(
rfc, X_train, X_test, y_train, y_test
)
create_roc_auc_chart()
Returns a ROC-AUC chart.
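The area under the ROC curve for a binary problem equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one. The sketch below computes it that way in plain Python, assuming labels 0 and 1 and at least one sample of each class. It illustrates the metric only, not how the chart is produced.

```python
def roc_auc(y_true, y_score):
    """Binary ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations rather than large test sets.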
Parameters
Name | Type | Default | Description |
---|---|---|---|
classifier | classifier | - | Fitted scikit-learn classifier object. |
X_train | ndarray | - | Training data matrix. |
X_test | ndarray | - | Testing data matrix. |
y_train | ndarray | - | The classification target for training. |
y_test | ndarray | - | The classification target for testing. |
Returns
File value object that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log the chart
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
import neptune.integrations.sklearn as npt_utils
run["visuals/roc_auc"] = npt_utils.create_roc_auc_chart(
rfc, X_train, X_test, y_train, y_test
)
create_precision_recall_chart()
Returns a precision-recall chart.
Parameters
Name | Type | Default | Description |
---|---|---|---|
classifier | classifier | - | Fitted scikit-learn classifier object. |
X_test | ndarray | - | Testing data matrix. |
y_test | ndarray | - | The classification target for testing. |
y_pred_proba | ndarray | - | Classifier prediction probabilities on test data. |
Returns
File value object that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log the chart
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
import neptune.integrations.sklearn as npt_utils
run["visuals/precision_recall"] = npt_utils.create_precision_recall_chart(
rfc, X_test, y_test
)
create_class_prediction_error_chart()
Returns a class prediction error chart.
Parameters
Name | Type | Default | Description |
---|---|---|---|
classifier | classifier | - | Fitted scikit-learn classifier object. |
X_train | ndarray | - | Training data matrix. |
X_test | ndarray | - | Testing data matrix. |
y_train | ndarray | - | The classification target for training. |
y_test | ndarray | - | The classification target for testing. |
Returns
File value object that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log the chart
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
import neptune.integrations.sklearn as npt_utils
run["visuals/class_pred_error"] = npt_utils.create_class_prediction_error_chart(
rfc, X_train, X_test, y_train, y_test
)
get_cluster_labels()
Returns the index of the cluster that each sample belongs to.
Parameters
Name | Type | Default | Description |
---|---|---|---|
model | KMeans | - | KMeans object. |
X | ndarray | - | Training instances to cluster. |
nrows | int, optional | 1000 | Number of rows to log. |
kwargs | - | - | KMeans parameters. |
Returns
File value object that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log the labels
km = KMeans(n_init=11, max_iter=270)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)
import neptune.integrations.sklearn as npt_utils
run["kmeans/cluster_labels"] = npt_utils.get_cluster_labels(km, X)
create_kelbow_chart()
Returns the K-elbow chart for the KMeans clusterer.
Parameters
Name | Type | Default | Description |
---|---|---|---|
model | KMeans | - | KMeans object. |
X | ndarray | - | Training instances to cluster. |
kwargs | - | - | KMeans parameters. |
Returns
File value object that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log the chart
km = KMeans(n_init=11, max_iter=270)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)
import neptune.integrations.sklearn as npt_utils
run["kmeans/kelbow"] = npt_utils.create_kelbow_chart(km, X)
create_silhouette_chart()
Returns the silhouette coefficient charts for the KMeans clusterer.
Charts are computed for j = 2, 3, ..., n_clusters.
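For context, the silhouette coefficient of a sample compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), as (b - a) / max(a, b). The sketch below computes the mean coefficient for one-dimensional points with a given labeling, assuming at least two clusters. The function name and data are hypothetical, and the chart function itself is not reimplemented here.

```python
from statistics import fmean


def silhouette_score_1d(points, labels):
    """Mean silhouette coefficient for 1-D points, using absolute difference as distance."""
    clusters = {}
    for x, label in zip(points, labels):
        clusters.setdefault(label, []).append(x)

    coefficients = []
    for x, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # By convention, a sample alone in its cluster scores 0.
            coefficients.append(0.0)
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(abs(x - other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            fmean([abs(x - other) for other in members])
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        coefficients.append((b - a) / denom if denom else 0.0)
    return fmean(coefficients)


score = silhouette_score_1d([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1])
print(round(score, 3))
```

Values close to 1 indicate compact, well-separated clusters, which is what the charts for each j help compare.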
Parameters
Name | Type | Default | Description |
---|---|---|---|
model | KMeans | - | KMeans object. |
X | ndarray | - | Training instances to cluster. |
kwargs | - | - | KMeans parameters. |
Returns
File value object that you can log to the run.
Example
# Create a run
import neptune
run = neptune.init_run()
# Log the charts
km = KMeans(n_init=11, max_iter=270)
X, y = make_blobs(n_samples=579, n_features=17, centers=7, random_state=28743)
import neptune.integrations.sklearn as npt_utils
run["kmeans/silhouette"] = npt_utils.create_silhouette_chart(km, X, n_clusters=12)
See also
neptune-sklearn on GitHub