Getting zero scores when using cross-validation scoring

I'm trying to use cross_val_score on my dataset, but I always get scores of zero.

Here is my code:

    df = pd.read_csv("Flaveria.csv")
    df = pd.get_dummies(df, columns=["N level", "species"], drop_first=True)

    # Extracting the target value from the dataset
    X = df.iloc[:, df.columns != "Plant Weight(g)"]
    y = np.array(df.iloc[:, 0], dtype="S6")

    logreg = LogisticRegression()
    loo = LeaveOneOut()

    scores = cross_val_score(logreg, X, y, cv=loo)
    print(scores)

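The features are categorical and the target is a float. I'm not sure why I only get zeros.

Before creating the dummy variables, the data looks like this: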

    N level,species,Plant Weight(g)
    L,brownii,0.3008
    L,brownii,0.3288
    M,brownii,0.3304
    M,brownii,0.388
    M,brownii,0.406
    H,brownii,0.3955
    H,brownii,0.3797
    H,brownii,0.2962

Updated code, which still gives me zeros:

    from sklearn.model_selection import LeaveOneOut
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import RandomForestRegressor
    import numpy as np
    import pandas as pd

    # Creating dummies for the non numerical features in the dataset
    df = pd.read_csv("Flaveria.csv")
    df = pd.get_dummies(df, columns=["N level", "species"], drop_first=True)

    # Extracting the target value from the dataset
    X = df.iloc[:, df.columns != "Plant Weight(g)"]
    y = df.iloc[:, 0]

    forest = RandomForestRegressor()
    loo = LeaveOneOut()

    scores = cross_val_score(forest, X, y, cv=loo)
    print(scores)

Answer:

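In general, cross_val_score uses the given iterator to split the data into training and test sets, fits the model on the training data, and scores it on the test fold. When no scoring argument is passed, it falls back to the estimator's own score method, which for regressors in scikit-learn is R2 (r2_score).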
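To make that procedure concrete, here is a minimal sketch of the same loop written by hand and compared against cross_val_score. The synthetic data, the LinearRegression model, and the 3-fold KFold are assumptions chosen for illustration; they are not the Flaveria dataset or the estimator from the question.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (hypothetical): y is roughly linear in two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=30)

model = LinearRegression()
cv = KFold(n_splits=3)

# Hand-written version: split, fit on the training part, score on the test fold.
manual_scores = []
for train_idx, test_idx in cv.split(X, y):
    fold_model = clone(model)  # fresh, unfitted copy for every fold
    fold_model.fit(X[train_idx], y[train_idx])
    # For a regressor, .score() returns R2 on the held-out fold.
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

# Library version with the default scorer.
library_scores = cross_val_score(model, X, y, cv=cv)

print(manual_scores)
print(library_scores)
```

Both arrays should agree, which shows that each entry returned by cross_val_score is just the R2 computed on one test fold.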

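You specified LeaveOneOut() as your cv iterator, so every test fold contains exactly one sample. In that situation R2 cannot be computed properly, and the score reported for each fold is 0.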

Look at the formula for R2 on Wikipedia:

R2 = 1 - (SS_res/SS_tot)

and

SS_tot = sum((y_i - y_mean)^2)

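Here, for a single sample, y_mean is equal to the y value itself, so SS_tot, the denominator, is 0. The whole R2 is therefore undefined (NaN). In this case scikit-learn sets the value to 0 instead of NaN. (Newer scikit-learn releases may instead emit a warning and return nan when a fold has fewer than two samples.)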
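The arithmetic can be checked by hand. The following sketch computes the two sums from the formula in plain Python; the helper name r2_terms and the predicted values are made up for illustration, while the y values are taken from the sample rows in the question.

```python
def r2_terms(y_true, y_pred):
    """Return (SS_res, SS_tot) as used in R2 = 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return ss_res, ss_tot

# One held-out sample, as in a LeaveOneOut fold (the prediction 0.35 is hypothetical).
ss_res_single, ss_tot_single = r2_terms([0.3008], [0.35])
print(ss_tot_single)  # the mean of one value is the value itself, so this is 0

# Three held-out samples: SS_tot is positive, so R2 is well defined.
ss_res_multi, ss_tot_multi = r2_terms([0.3008, 0.3288, 0.3304], [0.31, 0.32, 0.34])
r2_multi = 1 - ss_res_multi / ss_tot_multi
print(r2_multi)
```

With a single sample the division SS_res / SS_tot has a zero denominator regardless of how good the prediction is, which is why the per-fold score carries no information.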

Changing LeaveOneOut() to another CV iterator such as KFold will give you non-zero results, as you have already observed.
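As a rough sketch of the two ways out, the example below uses only the eight sample rows shown in the question. It is an assumption-laden illustration: the one-hot encoding is done by hand with NumPy to mimic get_dummies with drop_first=True, and the n_estimators, n_splits, and random_state values are arbitrary choices. Keeping LeaveOneOut but switching to a per-sample error metric is a suggestion beyond the original answer.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# The eight sample rows from the question (all species are "brownii").
levels = np.array(["L", "L", "M", "M", "M", "H", "H", "H"])
y = np.array([0.3008, 0.3288, 0.3304, 0.388, 0.406, 0.3955, 0.3797, 0.2962])

# Hand-made dummy columns for "N level"; "H" is the dropped baseline category.
X = np.column_stack([levels == "L", levels == "M"]).astype(float)

forest = RandomForestRegressor(n_estimators=50, random_state=0)

# Option 1: KFold with several samples per test fold, so R2 is defined.
kfold = KFold(n_splits=4, shuffle=True, random_state=0)
kfold_r2 = cross_val_score(forest, X, y, cv=kfold, scoring="r2")

# Option 2: keep LeaveOneOut, but score with an error metric that is
# meaningful for a single sample. The scorer is negated, so flip the sign.
loo_mae = -cross_val_score(
    forest, X, y, cv=LeaveOneOut(), scoring="neg_mean_absolute_error"
)

print(kfold_r2)
print(loo_mae.mean())
```

With so few rows the R2 values will be noisy and can be negative; the point is only that they are computed from folds with more than one sample.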
