Can't get the SVC score function to work

I get the following error when I try to run this machine learning program:

ValueError: X.shape[1] = 574 should be equal to 11, the number of features at training time

My code:

from pylab import *
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
import numpy as np

X = list()
Y = list()
validationX = list()
validationY = list()

# Training set: every column except the last is a feature;
# the label is 1 when the quality in the last column is above 6.
file = open('C:\\Users\\User\\Desktop\\csci4113\\project1\\whitewineTraining.txt', 'r')
for eachline in file:
    strArray = eachline.split(";")
    row = list()
    for i in range(len(strArray) - 1):
        row.append(float(strArray[i]))
    X.append(row)
    if (int(strArray[-1]) > 6):
        Y.append(1)
    else:
        Y.append(0)

# Validation set, parsed the same way.
file2 = open('C:\\Users\\User\\Desktop\\csci4113\\project1\\whitewineValidation.txt', 'r')
for eachline in file2:
    strArray = eachline.split(";")
    row2 = list()
    for i in range(len(strArray) - 1):
        row2.append(float(strArray[i]))
    validationX.append(row2)
    if (int(strArray[-1]) > 6):
        validationY.append(1)
    else:
        validationY.append(0)

X = np.array(X)
print(X)
Y = np.array(Y)
print(Y)
validationX = np.array(validationX)
validationY = np.array(validationY)

clf = SVC()
clf.fit(X, Y)
result = clf.predict(validationX)
clf.score(result, validationY)

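The goal of the program is to build a model with fit() and then compare its output against the validation labels validationY, to assess how well the model performs. Here is the rest of the console output. Note that X, confusingly, is an 11×574 array!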

[[  7.           0.27         0.36       ...,   3.           0.45         8.8       ]
 [  6.3          0.3          0.34       ...,   3.3          0.49         9.5       ]
 [  8.1          0.28         0.4        ...,   3.26         0.44        10.1       ]
 ...,
 [  6.3          0.28         0.22       ...,   3.           0.33        10.6       ]
 [  7.4          0.16         0.33       ...,   3.04         0.68        10.5       ]
 [  8.4          0.27         0.3        ...,   2.89         0.3   11.46666667]]
[0 0 0 ..., 0 1 0]
C:\Users\User\Anaconda3\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
Traceback (most recent call last):
  File "<ipython-input-68-31c649fe24b3>", line 1, in <module>
    runfile('C:/Users/User/Desktop/csci4113/project1/program1.py', wdir='C:/Users/User/Desktop/csci4113/project1')
  File "C:\Users\User\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
    execfile(filename, namespace)
  File "C:\Users\User\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/User/Desktop/csci4113/project1/program1.py", line 43, in <module>
    clf.score(result, validationY)
  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\base.py", line 310, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\svm\base.py", line 568, in predict
    y = super(BaseSVC, self).predict(X)
  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\svm\base.py", line 305, in predict
    X = self._validate_for_predict(X)
  File "C:\Users\User\Anaconda3\lib\site-packages\sklearn\svm\base.py", line 474, in _validate_for_predict
    (n_features, self.shape_fit_[1]))
ValueError: X.shape[1] = 574 should be equal to 11, the number of features at training time

Answer:

You are simply passing the wrong object to the score function. The documentation states clearly:

score(X, y, sample_weight=None)

X : array-like, shape = (n_samples, n_features)
    Test samples.

whereas you are passing predictions, so

result = clf.predict(validationX)
clf.score(result, validationY)

is invalid and should be replaced with

clf.score(validationX, validationY)

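What you tried would be fine if you were using some kind of scorer rather than a classifier. A classifier's .score method calls .predict itself, so you should pass the raw data as the argument. This also explains the numbers in the error: the 574 predictions in result were treated as a single sample with 574 features (hence the DeprecationWarning about 1d arrays), while the model was trained on 11 features.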
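To make the difference concrete, here is a minimal sketch on synthetic data; it assumes a reasonably recent scikit-learn and NumPy. The random arrays and the labelling rule are made up for illustration and merely stand in for the wine files, with 11 features and 574 validation rows to mirror the error message. It shows that clf.score on the raw features gives the same number as accuracy_score on predictions you computed yourself, which is the "scorer" route mentioned above.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the wine data (assumption: not the real files).
rng = np.random.RandomState(0)
X = rng.rand(200, 11)                      # 200 training samples, 11 features
Y = (X[:, 0] > 0.5).astype(int)            # made-up binary labels
validationX = rng.rand(574, 11)            # 574 validation samples, 11 features
validationY = (validationX[:, 0] > 0.5).astype(int)

clf = SVC()
clf.fit(X, Y)

# Option 1: let the classifier predict internally.
score_direct = clf.score(validationX, validationY)

# Option 2: predict yourself, then hand the predictions to a metric function.
result = clf.predict(validationX)
score_manual = accuracy_score(validationY, result)

print(score_direct, score_manual)

# Passing predictions to .score fails, because .score calls .predict on them.
raised = False
try:
    clf.score(result, validationY)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

Both options report the same accuracy, so keep result and use accuracy_score if you need the predictions anyway; otherwise clf.score(validationX, validationY) is enough.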
