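I am having trouble applying K-fold cross-validation and would appreciate some help. Everything works when I use train_test_split, but with K-fold cross-validation I run into an indexing problem.
How can I apply K-fold cross-validation to my dataset?
My code is as follows: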
    import pandas as pd
    from sklearn.model_selection import KFold

    df = pd.read_csv('CD.TXT', delimiter=',')
    df.head()

    X = df[['A', 'B', 'C', 'D']].values
    Y = df['Label'].values
    X = pd.DataFrame(X)
    Y = pd.DataFrame(Y)

    cv = KFold(n_splits=10, random_state=42, shuffle=False)
    for train_index, test_index in cv.split(X):
        print("Train Index: ", train_index, "\n")
        print("Test Index: ", test_index)

    # The next line is the one that raises the KeyError
    X_train, X_test, Y_train, Y_test = X[train_index], X[test_index], Y[train_index], Y[test_index]
    print(X_train)
    print(Y_train)
My dataset is as follows:
    A,B,C,D,Label
    10,20,30,40,1
    20,20,15,60,0
    10,20,30,40,1
    10,20,30,40,1
    10,20,39,40,1
    10,20,30,40,1
    10,20,30,40,1
    10,20,32,40,1
    10,20,30,40,1
    10,20,30,40,1
    10,20,3,40,1
    20,20,15,60,0
    20,20,15,60,0
    20,20,12,60,0
    20,20,15,60,0
    20,20,15,60,0
    20,20,12,60,0
    20,20,15,60,0
The error I get is as follows:
    Test Index: [18]
    Traceback (most recent call last):
      File "<ipython-input-11-10016b897261>", line 1, in <module>
        runfile('D:/experiments/untitled0.py', wdir='D:/experiments')
      File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
        execfile(filename, namespace)
      File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
        exec(compile(f.read(), filename, 'exec'), namespace)
      File "D:/experiments/untitled0.py", line 61, in <module>
        X_train, X_test, Y_train, Y_test = X[train_index], X[test_index], Y[train_index], Y[test_index]
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2934, in __getitem__
        raise_missing=True)
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1354, in _convert_to_indexer
        return self._get_listlike_indexer(obj, axis, **kwargs)[1]
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1161, in _get_listlike_indexer
        raise_missing=raise_missing)
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1252, in _validate_read_indexer
        raise KeyError("{} not in index".format(not_found))
    KeyError: '[4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17] not in index'
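Answer:
The error occurs because you are indexing a DataFrame as if it were a NumPy array. On a DataFrame, X[train_index] looks the integers up as column labels, not as row positions. X only has the columns 0 to 3, so the labels 4 to 17 are reported as "not in index".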
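The difference between the two indexing styles can be seen on a small example. This is only a sketch: the 6x4 array and the names arr, frame and rows are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical 6x4 array standing in for the feature matrix.
arr = np.arange(24).reshape(6, 4)
frame = pd.DataFrame(arr)  # columns are labelled 0, 1, 2, 3

# Row positions of the kind that KFold.split() returns.
rows = np.array([2, 3, 4, 5])

# NumPy: an integer array selects rows by position.
print(arr[rows].shape)  # (4, 4)

# pandas: frame[rows] treats the integers as column labels,
# and the labels 4 and 5 do not exist, so a KeyError is raised.
try:
    frame[rows]
    lookup_failed = False
except KeyError as err:
    lookup_failed = True
    print("KeyError:", err)

# .iloc selects rows by position, which matches the NumPy behaviour.
print(frame.iloc[rows].shape)  # (4, 4)
```

The same reasoning applies to the question's data: the DataFrame has four columns, so every index of 4 or above is rejected.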
Try commenting out X=pd.DataFrame(X) and Y=pd.DataFrame(Y), so that X and Y stay NumPy arrays:
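    import pandas as pd
    from sklearn.model_selection import KFold

    df = pd.read_csv('CD.TXT', delimiter=',')
    print(df.head())

    X = df[['A', 'B', 'C', 'D']].values  # NumPy array
    Y = df['Label'].values               # NumPy array
    # X = pd.DataFrame(X)
    # Y = pd.DataFrame(Y)

    # random_state is left out: it has no effect when shuffle=False,
    # and recent scikit-learn versions reject that combination.
    cv = KFold(n_splits=10, shuffle=False)
    for train_index, test_index in cv.split(X):
        print("Train Index: ", train_index, "\n")
        print("Test Index: ", test_index)
        # Slice inside the loop so that every fold is used, not only the last one.
        X_train, X_test = X[train_index], X[test_index]
        Y_train, Y_test = Y[train_index], Y[test_index]
        print(X_train)
        print(Y_train)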
Alternatively, keep the DataFrames and select rows by position with .iloc:
    import pandas as pd
    from sklearn.model_selection import KFold

    df = pd.read_csv('CD.TXT', delimiter=',')
    print(df.head())

    X = df[['A', 'B', 'C', 'D']].values
    Y = df['Label'].values
    X = pd.DataFrame(X)
    Y = pd.DataFrame(Y)

    # random_state is left out: it has no effect when shuffle=False,
    # and recent scikit-learn versions reject that combination.
    cv = KFold(n_splits=10, shuffle=False)
    for train_index, test_index in cv.split(X):
        print("Train Index: ", train_index, "\n")
        print("Test Index: ", test_index)
        # .iloc selects rows by position; slice inside the loop so every fold is used.
        X_train, X_test = X.iloc[train_index, :], X.iloc[test_index, :]
        Y_train, Y_test = Y.iloc[train_index], Y.iloc[test_index]
        print(X_train)
        print(Y_train)
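For reference, here is a self-contained sketch of the whole loop that runs without the CD.TXT file. It inlines the 18 rows shown in the question through io.StringIO. The names CSV_TEXT and fold_sizes are illustrative only, and shuffle=True with random_state=42 is an assumption (the original code did not shuffle); it is used here so that the seed actually has an effect.

```python
import io

import pandas as pd
from sklearn.model_selection import KFold

# The 18 rows from the question, inlined so the sketch is self-contained.
CSV_TEXT = """A,B,C,D,Label
10,20,30,40,1
20,20,15,60,0
10,20,30,40,1
10,20,30,40,1
10,20,39,40,1
10,20,30,40,1
10,20,30,40,1
10,20,32,40,1
10,20,30,40,1
10,20,30,40,1
10,20,3,40,1
20,20,15,60,0
20,20,15,60,0
20,20,12,60,0
20,20,15,60,0
20,20,15,60,0
20,20,12,60,0
20,20,15,60,0
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
X = df[["A", "B", "C", "D"]]  # stays a DataFrame
Y = df["Label"]               # stays a Series

# Assumption: shuffle the rows before splitting, with a fixed seed.
cv = KFold(n_splits=10, shuffle=True, random_state=42)

fold_sizes = []  # (train rows, test rows) for each fold
for fold, (train_index, test_index) in enumerate(cv.split(X)):
    # split() yields row positions, so use .iloc on pandas objects.
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    Y_train, Y_test = Y.iloc[train_index], Y.iloc[test_index]
    fold_sizes.append((len(X_train), len(X_test)))
    print(f"Fold {fold}: train={len(X_train)} rows, test={len(X_test)} rows")
```

Any model fitting and scoring would go inside the loop body, once per fold, using X_train/Y_train for training and X_test/Y_test for evaluation.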