sklearn.metrics.roc_curve only shows 5 fprs, tprs, thresholds

Accepted Answer

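This might be caused by the drop_intermediate parameter of roc_curve(), which defaults to True. According to the roc_curve documentation, it drops suboptimal thresholds that would not appear on a plotted ROC curve, which makes the returned curve lighter: a point is kept only where the curve changes direction, while points lying in the middle of a straight segment are discarded. You can prevent this behaviour by passing drop_intermediate=False instead.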
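To make "suboptimal" concrete, here is a minimal sketch of the pruning rule as I read it in scikit-learn's source, assuming binary 0/1 labels and no sample weights. The helpers count_curve and roc_points are hypothetical names for illustration, not scikit-learn API. A point is kept if the second difference of the cumulative false-positive or true-positive counts at that point is non-zero (a corner of the curve); points in the middle of a straight segment are dropped. The sketch checks its output against roc_curve on random data with tied scores.

```python
import numpy as np
from sklearn.metrics import roc_curve


def count_curve(y_true, y_score):
    """Cumulative FP/TP counts at each distinct score, highest score first.

    Hypothetical helper; assumes y_true holds 0/1 labels and no sample weights.
    """
    order = np.argsort(y_score, kind="mergesort")[::-1]
    y_true, y_score = y_true[order], y_score[order]
    # Last index of each run of tied scores, plus the final sample.
    idx = np.r_[np.where(np.diff(y_score))[0], y_true.size - 1]
    tps = np.cumsum(y_true)[idx]
    fps = 1 + idx - tps
    return fps, tps, y_score[idx]


def roc_points(y_true, y_score, drop_intermediate=True):
    """Hypothetical re-implementation of the pruning rule, for illustration only."""
    fps, tps, thr = count_curve(y_true, y_score)
    if drop_intermediate and len(fps) > 2:
        # Keep both end points and every point where the slope changes, i.e.
        # where the second difference of the FP or TP counts is non-zero.
        keep = np.r_[True, (np.diff(fps, 2) != 0) | (np.diff(tps, 2) != 0), True]
        fps, tps, thr = fps[keep], tps[keep], thr[keep]
    # Prepend a (0, 0) point so the curve starts at the origin; np.inf is used
    # here as its threshold.
    fps, tps, thr = np.r_[0, fps], np.r_[0, tps], np.r_[np.inf, thr]
    return fps / fps[-1], tps / tps[-1], thr


rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
# Rounded scores create ties, so several points fall on straight segments.
y_score = np.round(y_true + rng.normal(size=200), 1)

for drop in (True, False):
    fpr, tpr, _ = roc_points(y_true, y_score, drop_intermediate=drop)
    fpr_sk, tpr_sk, _ = roc_curve(y_true, y_score, drop_intermediate=drop)
    # FPR/TPR should match roc_curve; thresholds are not compared because the
    # value used for the extra starting point differs across versions.
    assert np.allclose(fpr, fpr_sk) and np.allclose(tpr, tpr_sk)
    print(drop, len(fpr))
```

With drop_intermediate=False the sketch returns one point per distinct score plus the starting point; with pruning enabled it returns only the corners.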

Here’s an example:

import numpy as np
from sklearn.datasets import fetch_openml

# fetch_mldata was removed in scikit-learn 0.22 (mldata.org is offline), so load MNIST from OpenML.
# as_frame=False returns NumPy arrays, which the positional indexing below relies on.
mnist = fetch_openml('mnist_784', version=1, cache=True, as_frame=False)
mnist["target"] = mnist["target"].astype(np.int8)  # labels come back as strings

from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict

X, y = mnist["data"], mnist["target"]
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
shuffle_index = np.random.permutation(60000)
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]

y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

sgd_clf = SGDClassifier(random_state=42, verbose=0)
sgd_clf.fit(X_train, y_train_5)

# Out-of-fold decision scores (cross_val_predict fits its own clones of sgd_clf)
y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3, method='decision_function')

# ROC Curves

from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)

print((len(thresholds), len(fpr), len(tpr)))
# (3472, 3472, 3472)

# Unlike precision_recall_curve, roc_curve drops suboptimal thresholds by default
# (drop_intermediate=True), so the lengths of thresholds, fpr and tpr depend on that option.

fpr_, tpr_, thrs = roc_curve(y_train_5, y_scores, drop_intermediate=False)
print((len(fpr_), len(tpr_), len(thrs)))
# (60001, 60001, 60001)
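If you want to check this without downloading MNIST, here is a small sketch on synthetic data (the random labels and scores are made up for illustration). It relies on two properties of roc_curve: it returns at most one point per distinct score value plus an extra starting point at (0, 0), which presumably explains the 60001 entries above for 60000 (apparently all distinct) training scores; and the points removed by drop_intermediate lie on straight segments, so the area under the curve does not change. It also shows that when y_score has only a few distinct values (here hard 0/1 predictions instead of decision_function output), the arrays stay short even with drop_intermediate=False.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=1_000)
# Continuous scores: practically every score is distinct.
y_score = y_true + rng.normal(scale=1.0, size=1_000)

fpr, tpr, thr = roc_curve(y_true, y_score)  # drop_intermediate=True (default)
fpr_all, tpr_all, thr_all = roc_curve(y_true, y_score, drop_intermediate=False)

print(len(thr), len(thr_all))  # pruned curve vs. full curve
# The dropped points lie on straight segments, so the area is unchanged.
print(auc(fpr, tpr), auc(fpr_all, tpr_all))

# Hard 0/1 predictions have only two distinct values, so the curve has
# 2 + 1 points whatever drop_intermediate is set to.
y_pred = (y_score > 0.5).astype(int)
fpr_p, tpr_p, thr_p = roc_curve(y_true, y_pred, drop_intermediate=False)
print(len(thr_p))  # 3
```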