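If `roc_curve()` returns fewer false positive rates, true positive rates and thresholds than you expect, this might be due to its `drop_intermediate` parameter, which defaults to `True`. According to the `roc_curve` documentation, it drops suboptimal thresholds, i.e. those that would not appear on a plotted ROC curve, so that the resulting curve is lighter. You can prevent this behaviour by passing `drop_intermediate=False` instead.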
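To see what `drop_intermediate` actually removes, here is a minimal sketch on a tiny, made-up dataset (the labels and scores below are illustrative only). With perfectly separated scores, the intermediate points lie on the vertical and horizontal segments of the ROC curve, so dropping them keeps only the corners and leaves the curve, and therefore its AUC, unchanged.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Made-up toy data: three positives all scored above three negatives
y_true = np.array([1, 1, 1, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.1])

# Default (drop_intermediate=True): collinear points are dropped
fpr_d, tpr_d, thr_d = roc_curve(y_true, y_score)
# drop_intermediate=False: one point per distinct score, plus the start (0, 0)
fpr_f, tpr_f, thr_f = roc_curve(y_true, y_score, drop_intermediate=False)

print(len(thr_d), len(thr_f))  # 4 7
# Remaining corner points: (0, 0), (0, 1/3), (0, 1), (1, 1)
print(np.column_stack([fpr_d, tpr_d]))
# Both versions trace the same curve, so the areas agree
print(np.isclose(auc(fpr_d, tpr_d), auc(fpr_f, tpr_f)))  # True
```

The dropped points only matter if you need the exact threshold for every operating point; for plotting or computing the AUC the default is usually sufficient.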
Here’s an example:
```python
import numpy as np

try:
    from sklearn.datasets import fetch_openml
    # as_frame=False (scikit-learn >= 0.22) keeps data and target as NumPy
    # arrays, so the positional indexing below works on newer versions too
    mnist = fetch_openml('mnist_784', version=1, cache=True, as_frame=False)
    mnist["target"] = mnist["target"].astype(np.int8)
except ImportError:
    # fetch_mldata only exists in old scikit-learn releases
    from sklearn.datasets import fetch_mldata
    mnist = fetch_mldata('MNIST original')

from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict

X, y = mnist["data"], mnist["target"]
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

# Shuffle the training set
shuffle_index = np.random.permutation(60000)
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]

# Binary task: "is this digit a 5?"
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

sgd_clf = SGDClassifier(random_state=42, verbose=0)
sgd_clf.fit(X_train, y_train_5)
# Out-of-fold decision scores, one per training sample
y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3, method='decision_function')

# ROC Curves
from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)
print((len(thresholds), len(fpr), len(tpr)))
# (3472, 3472, 3472) in the original run; exact counts vary with the unseeded shuffle

# Unlike precision_recall_curve, roc_curve drops suboptimal thresholds by default,
# so the length of thresholds and of the other outputs depends on drop_intermediate
fpr_, tpr_, thrs = roc_curve(y_train_5, y_scores, drop_intermediate=False)
print((len(fpr_), len(tpr_), len(thrs)))
# (60001, 60001, 60001)
```
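As a rough sanity check on the 60001 figure: with `drop_intermediate=False`, `roc_curve` returns one point per distinct score plus an extra first point at (0, 0), so 60000 decision scores that are all distinct give 60001 values. Exactly tied scores collapse into a single threshold. The sketch below checks that relationship on synthetic integer scores with deliberate ties (random data, not MNIST).

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
# Synthetic data (made up): random labels and integer scores with many ties
y_true = rng.integers(0, 2, size=200)
y_score = rng.integers(0, 20, size=200)

fpr, tpr, thresholds = roc_curve(y_true, y_score, drop_intermediate=False)

# Expect one threshold per distinct score, plus the extra starting point (0, 0)
n_distinct = len(np.unique(y_score))
print(len(thresholds), n_distinct + 1)
```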