I get the following error when I try to run this machine learning platform:
ValueError: X.shape[1] = 574 should be equal to 11, the number of features at training time
My code:
from pylab import *
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
import numpy as np

X = list()
Y = list()
validationX = list()
validationY = list()

file = open('C:\\Users\\User\\Desktop\\csci4113\\project1\\whitewineTraining.txt', 'r')
for eachline in file:
    strArray = eachline.split(";")
    row = list()
    for i in range(len(strArray) - 1):
        row.append(float(strArray[i]))
    X.append(row)
    if (int(strArray[-1]) > 6):
        Y.append(1)
    else:
        Y.append(0)

file2 = open('C:\\Users\\User\\Desktop\\csci4113\\project1\\whitewineValidation.txt', 'r')
for eachline in file2:
    strArray = eachline.split(";")
    row2 = list()
    for i in range(len(strArray) - 1):
        row2.append(float(strArray[i]))
    validationX.append(row2)
    if (int(strArray[-1]) > 6):
        validationY.append(1)
    else:
        validationY.append(0)

X = np.array(X)
print(X)
Y = np.array(Y)
print(Y)
validationX = np.array(validationX)
validationY = np.array(validationY)

clf = svm.SVC()
clf.fit(X, Y)
result = clf.predict(validationX)
clf.score(result, validationY)
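The goal of the program is to build a model with fit() that we can check against the validation labels in validationY, to see how well our machine learning model performs. Here is the rest of the console output. Note that X is, confusingly, an 11×574 array!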
[[ 7. 0.27 0.36 ..., 3. 0.45 8.8 ]
 [ 6.3 0.3 0.34 ..., 3.3 0.49 9.5 ]
 [ 8.1 0.28 0.4 ..., 3.26 0.44 10.1 ]
 ...,
 [ 6.3 0.28 0.22 ..., 3. 0.33 10.6 ]
 [ 7.4 0.16 0.33 ..., 3.04 0.68 10.5 ]
 [ 8.4 0.27 0.3 ..., 2.89 0.3 11.46666667]]
[0 0 0 ..., 0 1 0]
C:\Users\User\Anaconda3\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
Traceback (most recent call last):
  File "<ipython-input-68-31c649fe24b3>", line 1, in <module>
    runfile('C:/Users/User/Desktop/csci4113/project1/program1.py', wdir='C:/Users/User/Desktop/csci4113/project1')
  File "C:\Users\User\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
    execfile(filename, namespace)
  File "C:\Users\User\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/User/Desktop/csci4113/project1/program1.py", line 43, in <module>
    clf.score(result, validationY)
  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\base.py", line 310, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\svm\base.py", line 568, in predict
    y = super(BaseSVC, self).predict(X)
  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\svm\base.py", line 305, in predict
    X = self._validate_for_predict(X)
  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\svm\base.py", line 474, in _validate_for_predict
    (n_features, self.shape_fit_[1]))
ValueError: X.shape[1] = 574 should be equal to 11, the number of features at training time
Answer:
You are simply passing the wrong object to the score function. The documentation states clearly:
score(X, y, sample_weight=None)
X : array-like, shape = (n_samples, n_features) Test samples.
whereas you are passing the predictions, so
result = clf.predict(validationX)
clf.score(result, validationY)
is invalid and should be replaced with
clf.score(validationX, validationY)
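What you tried would be fine if you were calling a scorer (a standalone metric function) rather than a classifier's own method. A classifier's .score method calls .predict itself, so you should pass it the raw feature data as the first argument. That also explains the 574 in the error message: the 1-D array of predictions was treated as a single sample with 574 features, one per validation row.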
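To make the difference concrete, here is a minimal sketch of both routes. It uses randomly generated stand-in data with 11 features, because the wine files from the question are not available here, so the array sizes and the labeling rule are assumptions made purely for illustration. accuracy_score from sklearn.metrics plays the role of the scorer mentioned above.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Hypothetical stand-in data: 11 features per sample, as in the question.
rng = np.random.RandomState(0)
X = rng.rand(200, 11)
Y = (X[:, 0] > 0.5).astype(int)
validationX = rng.rand(50, 11)
validationY = (validationX[:, 0] > 0.5).astype(int)

clf = SVC()
clf.fit(X, Y)

# Route 1: the classifier's own .score takes the raw features
# and calls .predict internally.
score_from_clf = clf.score(validationX, validationY)

# Route 2: predict first, then hand the predictions to a metric function.
result = clf.predict(validationX)
score_from_metric = accuracy_score(validationY, result)

print(result.shape)        # one prediction per validation row
print(score_from_clf == score_from_metric)
```

Both routes should report the same accuracy. The error in the question came from mixing them, that is, giving the predictions from route 2 to the method from route 1.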