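Here is what I did; the code is below. I have a music.csv dataset. The error is "Found input variables with inconsistent numbers of samples: [4, 1]". The full error details follow the code.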
    # Import the data
    import pandas as pd
    music_data = pd.read_csv('music.csv')
    music_data

    # Split into training and test sets - there is no data to clean
    # genre = the value to predict
    # Inputs are age and gender, output is genre
    # Method = drop
    X = music_data.drop(columns=['genre'])  # everything except genre
    # X = input
    Y = music_data['genre']  # genre only
    # Y = output

    # Now choose an algorithm
    from sklearn.tree import DecisionTreeClassifier
    model = DecisionTreeClassifier()  # the model
    model.fit(X, Y)
    prediction = model.predict([[21, 1]])

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)  # 20% of the data is used for testing
    # the first two are inputs, the last two are outputs
    model.fit(X_train, y_train)
    from sklearn.metrics import accuracy_score
    score = accuracy_score(y_test, predictions)
Then I got this error. It is a ValueError:
    ValueError                                Traceback (most recent call last)
    ~\AppData\Local\Temp/ipykernel_28312/3992581865.py in <module>
          5 model.fit(X_train, y_train)
          6 from sklearn.metrics import accuracy_score
    ----> 7 score = accuracy_score(y_test, predictions)

    c:\users\shrey\appdata\local\programs\python\python39\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
         61             extra_args = len(args) - len(all_args)
         62             if extra_args <= 0:
    ---> 63                 return f(*args, **kwargs)
         64
         65             # extra_args > 0

    c:\users\shrey\appdata\local\programs\python\python39\lib\site-packages\sklearn\metrics\_classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
        200
        201     # Compute accuracy for each possible representation
    --> 202     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
        203     check_consistent_length(y_true, y_pred, sample_weight)
        204     if y_type.startswith('multilabel'):

    c:\users\shrey\appdata\local\programs\python\python39\lib\site-packages\sklearn\metrics\_classification.py in _check_targets(y_true, y_pred)
         81     y_pred : array or indicator matrix
         82     """
    ---> 83     check_consistent_length(y_true, y_pred)
         84     type_true = type_of_target(y_true)
         85     type_pred = type_of_target(y_pred)

    c:\users\shrey\appdata\local\programs\python\python39\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
        317     uniques = np.unique(lengths)
        318     if len(uniques) > 1:
    --> 319         raise ValueError("Found input variables with inconsistent numbers of"
        320                          " samples: %r" % [int(l) for l in lengths])
        321

    ValueError: Found input variables with inconsistent numbers of samples: [4, 1]
Please help. I don't know what is going on, but I think it has to do with score = accuracy_score(y_test, predictions).
Answer:
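After the split, your test data has four entries (rows), which means y_test has length 4.
When you predict on [21, 1], however, you are predicting for a single row only, so prediction has length 1.
accuracy_score compares the true and predicted labels one by one, so both must have the same length. That is why you get the inconsistent-number-of-samples error.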
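The mismatch can be reproduced without the dataset. The sketch below is only an illustration: the genre labels are made-up placeholders, not values taken from music.csv. It passes four true labels and one predicted label to accuracy_score and captures the resulting ValueError.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels: four "true" test labels versus a single prediction.
y_test = ["HipHop", "Jazz", "Dance", "Classical"]
prediction = ["HipHop"]

try:
    accuracy_score(y_test, prediction)
    error_message = None
except ValueError as exc:
    # accuracy_score refuses inputs whose lengths differ.
    error_message = str(exc)

print(error_message)
```

The lengths reported in the message are those of the two arguments, in the order they were passed.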
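You can fix this in one of two ways:
- Predict on X_test, so that there is one prediction per test row:

    prediction = model.predict(X_test)

- If you want to score a prediction on new data, you need the true target for that data as well as its input features, and both must have the same number of rows. For example, if the target for [21, 1] is [2]:

    prediction = model.predict([[21, 1]])
    y_test = [2]  # note: this depends on what the corresponding target label actually is
    score = accuracy_score(y_test, prediction)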
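Putting the first option together end to end might look like the sketch below. Since music.csv is not shown in the question, the DataFrame here is a hypothetical stand-in with 18 made-up rows (which gives 4 test rows at test_size=0.2); the random_state values and the "HipHop" label for the new sample are likewise assumptions added for reproducibility and illustration.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for music.csv: 18 made-up rows (age, gender, genre).
music_data = pd.DataFrame({
    "age":    [20, 23, 25, 26, 29, 30, 31, 33, 37,
               20, 21, 25, 26, 27, 30, 31, 34, 35],
    "gender": [1] * 9 + [0] * 9,
    "genre":  ["HipHop"] * 3 + ["Jazz"] * 3 + ["Classical"] * 3
              + ["Dance"] * 3 + ["Acoustic"] * 3 + ["Classical"] * 3,
})

X = music_data.drop(columns=["genre"])  # input features
Y = music_data["genre"]                 # target

# Split first, then fit only on the training portion.
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# One prediction per test row, so the lengths match.
predictions = model.predict(X_test)
score = accuracy_score(y_test, predictions)
print(len(y_test), len(predictions), score)

# Second option: score a single new sample against its (assumed) true label.
new_X = pd.DataFrame({"age": [21], "gender": [1]})
new_y = ["HipHop"]  # assumption: the real label depends on your data
new_score = accuracy_score(new_y, model.predict(new_X))
```

With real data the accuracy will vary from run to run unless random_state is fixed, because train_test_split shuffles the rows before splitting.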