Suppose I have the following scenario:
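import numpy as np
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression

# x_inputs and y_response are my own feature matrix and target column.
# random_state only takes effect when shuffle=True; recent scikit-learn
# versions raise an error if it is set without shuffling.
kfold = model_selection.KFold(n_splits=5, shuffle=True, random_state=7)
acc_per_fold = model_selection.cross_val_score(
    LogisticRegression(), x_inputs, np.ravel(y_response),
    cv=kfold, scoring='accuracy')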
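Apart from these per-fold accuracy scores, what else can I get out of model_selection.cross_val_score()? Is there a way to see what actually happened inside each fold? Can I get precision and recall for each fold? What about the predicted values? And how can I use the model trained on one fold to predict on unseen data?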
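Answer:
You can use the cross_validate function to see what happens in each fold. Unlike cross_val_score, it accepts several metrics at once and returns a dictionary of per-fold arrays.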
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.linear_model import LogisticRegression

# Imbalanced binary problem: roughly 90% negatives, 10% positives.
X, y = make_classification(
    n_classes=2, class_sep=1.5, weights=[0.9, 0.1],
    n_features=20, n_samples=1000, random_state=10)

clf = LogisticRegression(class_weight="balanced")

# Keys are the names used in the result; values are scikit-learn scorer names.
scoring = {'accuracy': 'accuracy',
           'recall': 'recall',
           'precision': 'precision',
           'roc_auc': 'roc_auc'}

# return_train_score=True is needed for the train_* entries shown in the
# output; it defaults to False in current scikit-learn versions.
cross_val_scores = cross_validate(clf, X, y, cv=3, scoring=scoring,
                                  return_train_score=True)
The output looks like this:
{'fit_time': array([ 0.        ,  0.        ,  0.01559997]),
 'score_time': array([ 0.01559997,  0.        ,  0.        ]),
 'test_accuracy': array([ 0.9251497 ,  0.95808383,  0.93674699]),
 'test_precision': array([ 0.59183673,  0.70833333,  0.63636364]),
 'test_recall': array([ 0.85294118,  1.        ,  0.84848485]),
 'test_roc_auc': array([ 0.96401961,  0.99343137,  0.96787271]),
 'train_accuracy': array([ 0.96096096,  0.93693694,  0.95209581]),
 'train_precision': array([ 0.73033708,  0.62376238,  0.69148936]),
 'train_recall': array([ 0.97014925,  0.94029851,  0.95588235]),
 'train_roc_auc': array([ 0.99426906,  0.98509954,  0.99223039])}
So what happened in the first fold?
FOLD, METRIC = (0, 'test_precision')
cross_val_scores[METRIC][FOLD]
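The question also asks how to reuse the model trained in one fold on unseen data, which a score lookup does not cover. The sketch below is one possible way to do that, not the only one: it passes return_estimator=True to cross_validate so the fitted model of each fold is kept, and it uses an explicit, unshuffled StratifiedKFold so the same fold indices can be regenerated afterwards. The held-out split that stands in for "unseen data" and the names X_dev, X_unseen, fold_model, and fold_cm are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score
from sklearn.model_selection import (StratifiedKFold, cross_validate,
                                     train_test_split)

X, y = make_classification(
    n_classes=2, class_sep=1.5, weights=[0.9, 0.1],
    n_features=20, n_samples=1000, random_state=10)

# Hold back some rows to play the role of unseen data (illustrative choice).
X_dev, X_unseen, y_dev, y_unseen = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# An explicit, unshuffled splitter so the same folds can be regenerated below.
cv = StratifiedKFold(n_splits=3)
results = cross_validate(
    LogisticRegression(class_weight="balanced"), X_dev, y_dev,
    cv=cv, scoring="precision", return_estimator=True)

FOLD = 0
# The estimator that was fitted on the training part of fold 0.
fold_model = results["estimator"][FOLD]
train_idx, test_idx = list(cv.split(X_dev, y_dev))[FOLD]

# Predictions and confusion matrix on fold 0's own test part.
fold_pred = fold_model.predict(X_dev[test_idx])
fold_cm = confusion_matrix(y_dev[test_idx], fold_pred)
fold_precision = precision_score(y_dev[test_idx], fold_pred)

# The same fitted model applied to rows it never saw during cross-validation.
unseen_pred = fold_model.predict(X_unseen)
```

Because the splitter is deterministic, fold_precision should match results["test_score"][FOLD]. For a final model, refitting on all available training data is the more common choice; a single fold's estimator has seen only part of it.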
Is the precision score stable across folds?
np.std(cross_val_scores[METRIC])
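As for the predicted values themselves, cross_validate does not return them. A minimal sketch of one way to get them, assuming the same synthetic data as above, is cross_val_predict: it returns one prediction per sample, each made by the model whose training folds did not contain that sample. The variable names y_pred, y_proba, pooled_precision, and pooled_recall are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import cross_val_predict

X, y = make_classification(
    n_classes=2, class_sep=1.5, weights=[0.9, 0.1],
    n_features=20, n_samples=1000, random_state=10)
clf = LogisticRegression(class_weight="balanced")

# Hard class labels, one out-of-fold prediction per sample.
y_pred = cross_val_predict(clf, X, y, cv=3)

# Class probabilities instead of labels; column 1 is the positive class.
y_proba = cross_val_predict(clf, X, y, cv=3, method="predict_proba")

# Metrics computed once over all out-of-fold predictions pooled together.
pooled_precision = precision_score(y, y_pred)
pooled_recall = recall_score(y, y_pred)
```

Note that a metric computed on the pooled predictions is generally not equal to the mean of the per-fold values returned by cross_validate, so the two should not be mixed when reporting results.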