Scikit-learn (Python): different metric results (F1 score) with StratifiedKFold

I want to find the best split of my StratifiedKFold and build my model on that split. The code is as follows:

from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

def best_classifier(clf, k, x, y):
    skf = StratifiedKFold(n_splits=k, shuffle=True)
    bestclf = None
    bestf1 = 0
    bestsplit = []
    cnt = 1
    totalf1 = 0
    for train_index, test_index in skf.split(x, y):
        x_train, x_test = x[train_index], x[test_index]
        y_train, y_test = y[train_index], y[test_index]
        clf.fit(x_train, y_train)
        predicted_y = clf.predict(x_test)
        f1 = f1_score(y_test, predicted_y)
        totalf1 = totalf1 + f1
        print(y_test.shape)
        print(cnt, " iteration f1 score", f1)
        if cnt == 10:
            # after the last of the 10 folds, print the average F1
            avg = totalf1 / 10
            print(avg)
        if f1 > bestf1:
            bestf1 = f1
            bestclf = clf
            bestsplit = [train_index, test_index]
        cnt = cnt + 1
    return [bestclf, bestf1, bestsplit]

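This function returns a list containing my classifier (fitted on the best split), the best F1 score, and the indices of the best split.

I call it like this: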

best_of_best = best_classifier(sgd,10,x_selected,y)

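Now that I have captured the best split and my classifier, I test it again on the same split, just to check whether I get the same result as inside the function. Apparently I do not. The code is as follows: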

bestclf = best_of_best[0]
test_index = best_of_best[2][1]
x_cv = x_selected[test_index]
y_cv = y[test_index]
pred_cv = bestclf.predict(x_cv)
f1_score(y_cv, pred_cv)

Output when calling the best_classifier function:

(679,)
1  iteration f1 score 0.643298969072
(679,)
2  iteration f1 score 0.761750405186
(678,)
3  iteration f1 score 0.732773109244
(678,)
4  iteration f1 score 0.632911392405
(678,)
5  iteration f1 score 0.74179743224
(678,)
6  iteration f1 score 0.749140893471
(677,)
7  iteration f1 score 0.750830564784
(677,)
8  iteration f1 score 0.756756756757
(677,)
9  iteration f1 score 0.682170542636
(677,)
10  iteration f1 score 0.63813229572
0.708956236151

Result when I predict on the best split of StratifiedKFold outside the function:

0.86181818181818182

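As we can see, this F1 score does not appear in any of the 10 folds. Why is that? Am I doing something wrong? Is there a flaw in the logic of my approach?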

Answer:

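Problem solved. The issue was that I did not deep-copy the clf object into bestclf. The assignment bestclf = clf only stores a reference to the same object, so every time a later fold refits clf, bestclf changes with it and ends up being the model from the last fold. That last model was trained on data that includes the test samples of the best split, which likely explains the inflated score of 0.86.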
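The aliasing described above is ordinary Python behavior and can be reproduced without scikit-learn. The sketch below uses a hypothetical toy estimator, `MajorityClassifier`, which is not part of any library and exists only to show how a plain assignment differs from `copy.deepcopy` when the original object is refit.

```python
import copy


class MajorityClassifier:
    """Hypothetical toy estimator: predicts the most common training label."""

    def fit(self, y):
        # Store the label that occurs most often in the training labels.
        self.label_ = max(set(y), key=list(y).count)
        return self

    def predict(self, n):
        return [self.label_] * n


clf = MajorityClassifier().fit([1, 1, 0])   # "fold 1": majority label is 1

alias = clf                     # plain assignment: a second name for the same object
snapshot = copy.deepcopy(clf)   # independent copy of the fitted state

clf.fit([0, 0, 1])              # "fold 2": refitting overwrites the fitted state

print(alias is clf)       # True
print(alias.label_)       # 0, the alias followed the refit
print(snapshot.label_)    # 1, the deep copy kept the fold-1 fit
```

In the question's function, bestclf plays the role of `alias`, so after the loop it always holds whatever the final fold produced.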

The fix is to store a deep copy of the fitted classifier instead (this requires import copy):

bestclf = copy.deepcopy(clf)
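For completeness, here is a minimal sketch of the whole function with the deep copy applied. It is an illustration under assumptions, not the asker's exact code: the data comes from `make_classification` as a stand-in for x_selected and y, `SGDClassifier(random_state=0)` stands in for sgd, and `random_state=0` is added to `StratifiedKFold` only so the run is repeatable.

```python
import copy

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold


def best_classifier(clf, k, x, y):
    """Return the classifier, F1 score and indices of the best-scoring fold."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    best_clf, best_f1, best_split = None, -1.0, None
    scores = []
    for train_index, test_index in skf.split(x, y):
        clf.fit(x[train_index], y[train_index])
        f1 = f1_score(y[test_index], clf.predict(x[test_index]))
        scores.append(f1)
        if f1 > best_f1:
            best_f1 = f1
            best_clf = copy.deepcopy(clf)   # snapshot, unaffected by later fits
            best_split = (train_index, test_index)
    print("mean f1:", np.mean(scores))
    return best_clf, best_f1, best_split


# Synthetic stand-in for x_selected and y (assumption, not the asker's data).
x_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
sgd = SGDClassifier(random_state=0)

best_clf, best_f1, (train_idx, test_idx) = best_classifier(sgd, 10, x_demo, y_demo)

# Re-scoring the stored model on the stored test indices reproduces best_f1.
rescored = f1_score(y_demo[test_idx], best_clf.predict(x_demo[test_idx]))
print(best_f1, rescored)
```

Note that `sklearn.base.clone` would not help here, because it returns a new unfitted estimator with the same parameters, whereas `copy.deepcopy` keeps the fitted state.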
