I have a question about logistic regression: I am getting a ValueError.
This is my dataset:
            sub1      sub2  sub3      sub4
    pol_1   0.000000  0.000000  0.0  0.000000
    pol_2   0.000000  0.000000  0.0  0.000000
    pol_3   0.050000  0.000000  0.0  0.000000
    pol_4   0.000000  0.000000  0.0  0.000000
    pol_5   0.000000  0.000000  0.0  0.000000
    pol_6   0.000000  0.000000  0.0  0.000000
    pol_7   0.000000  0.000000  0.0  0.000000
    pol_8   0.000000  0.000000  0.0  0.000000
    pol_9   0.000000  0.000000  0.0  0.000000
    pol_10  0.000000  0.000000  0.0  0.032423
    pol_11  0.000000  0.000000  0.0  0.000000
    pol_12  0.000000  0.000000  0.0  0.000000
    pol_13  0.000000  0.000000  0.0  0.000000
    pol_14  0.000000  0.053543  0.0  0.000000
    pol_15  0.000000  0.000000  0.0  0.000000
    pol_16  0.000000  0.000000  0.0  0.000000
    pol_17  0.000000  0.000000  0.0  0.000000
    pol_18  0.000000  0.000000  0.0  0.053453
    pol_19  0.000000  0.058344  0.0  0.000000
    pol_20  0.054677  0.000000  0.0  0.000000
This is my code:
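    from sklearn import model_selection
    from sklearn.linear_model import LogisticRegression

    # df is the data frame shown above
    array = df.values
    X = array[:, 0:3]  # first three columns are the features
    Y = array[:, 3]    # fourth column is the target
    validation_size = 0.20
    seed = 7
    X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(
        X, Y, test_size=validation_size, random_state=seed)
    scoring = 'accuracy'
    # random_state only takes effect when shuffle=True
    kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
    cv_results = model_selection.cross_val_score(
        LogisticRegression(), X_train, Y_train, cv=kfold, scoring=scoring)
    print(cv_results)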
This results in the following error:
ValueError: Unknown label type: 'continuous'
How can I fix this?
I also looked at a few links and found that the problem might be related to the data types, which in my case are:
    print(df.dtypes)
    print(X_train.dtype)

    pol_1     float64
    pol_2     float64
    pol_3     float64
    pol_4     float64
    pol_5     float64
    pol_6     float64
    pol_7     float64
    pol_8     float64
    pol_9     float64
    pol_10    float64
    pol_11    float64
    pol_12    float64
    pol_13    float64
    pol_14    float64
    pol_15    float64
    pol_16    float64
    pol_17    float64
    pol_18    float64
    pol_19    float64
    pol_20    float64
    Length: 20, dtype: object
    float64
I tried converting the data type of X_train and Y_train to string, but I got the same error.
Thanks!
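Answer:
Y should be of type int. That is, it should consist of integers that represent class labels. LogisticRegression is a classifier, so it rejects a continuous target. In your data frame, however, the Y column consists of floats, which is why you get this error.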
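As a rough illustration of the point above, here is a minimal sketch on made-up data (not your data frame). It assumes that the float target can sensibly be turned into two classes, "zero" versus "non-zero"; that threshold is purely an assumption for the example, and if your target is genuinely continuous a regression model may be the better fit. The names `y_continuous` and `y_labels` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in data: 40 samples, 3 float features.
rng = np.random.default_rng(7)
X = rng.random((40, 3))

# A float target that is mostly zero, with some small positive values.
y_continuous = np.where(X[:, 0] > np.median(X[:, 0]), 0.05, 0.0)

# scikit-learn inspects the target before fitting a classifier.
print(type_of_target(y_continuous))

# Fitting a classifier on the float target raises the ValueError.
try:
    LogisticRegression().fit(X, y_continuous)
except ValueError as exc:
    print("ValueError:", exc)

# Assumption for this sketch: any non-zero value belongs to class 1.
y_labels = (y_continuous > 0).astype(int)
print(type_of_target(y_labels))

# With integer class labels, cross-validation runs normally.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
scores = cross_val_score(LogisticRegression(), X, y_labels,
                         cv=cv, scoring="accuracy")
print(scores)
```

`StratifiedKFold` is used here instead of `KFold` only so that each fold contains both classes; with a small or imbalanced dataset like the one in the question, a fold with a single class would cause a different error.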