I have already computed X_train, X_test, y_train, and y_test, but I cannot work out how to compute y_train_true, y_train_prob, y_test_true, and y_test_prob.
How can I compute y_train_true, y_train_prob, y_test_true, and y_test_prob from the code below?
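Note:
y_train_true: the true binary labels (0 or 1) in the training dataset
y_train_prob: the probabilities predicted by the model for the training dataset, in the range [0, 1]
y_test_true: the true binary labels (0 or 1) in the test dataset
y_test_prob: the probabilities predicted by the model for the test dataset, in the range [0, 1]
The code is as follows: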
# Split the data into training and test sets
import numpy as np
from sklearn.model_selection import train_test_split
# .ix was removed in pandas 1.0; .iloc selects columns 1 to 9 by position
X = np.array(dataset.iloc[:, 1:10])
y = np.array(dataset['benign_malignant'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Define and fit the classifier
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
# knn = KNeighborsClassifier(n_neighbors=11)
knn.fit(X_train, y_train)
# Predict the test set results
y_pred = knn.predict(X_test)
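Answer:
In your case, y_train and y_test already are y_train_true and y_test_true. To get y_train_prob and y_test_prob, you need a fitted model that can output probabilities. I don't know which dataset you are using, but this looks like a binary classification problem, so you can call predict_proba on the KNN classifier you already have: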
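from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
knn.fit(X_train, y_train)
# predict_proba returns one column per class (ordered as in knn.classes_);
# column 1 is the probability of the positive class when the labels are 0 and 1
y_train_prob = knn.predict_proba(X_train)[:, 1]
y_test_prob = knn.predict_proba(X_test)[:, 1]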
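As a rough illustration of where these probabilities come from: with the default uniform weights, the KNN probability for class 1 is simply the fraction of the k nearest training points that are labeled 1, so with n_neighbors = 5 it can only take the values 0, 0.2, 0.4, 0.6, 0.8, or 1. The sketch below is plain Python and is not scikit-learn's actual implementation; knn_positive_prob and the toy data are hypothetical names made up for this example.

```python
import math

def knn_positive_prob(train_points, train_labels, query, k=5):
    """Fraction of the k nearest training points whose label is 1."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance only and keep the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    return sum(label for _, label in nearest) / k

# Hypothetical toy data: two 2-D clusters with labels 0 and 1.
toy_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]

prob_near_cluster_1 = knn_positive_prob(toy_X, toy_y, (5.5, 5.5), k=5)
prob_near_cluster_0 = knn_positive_prob(toy_X, toy_y, (0.5, 0.5), k=5)
```

Keep in mind that when you call predict_proba on X_train, each training point counts itself among its own neighbours, so y_train_prob will usually look more confident than y_test_prob.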