I'm getting a ValueError: "ValueError: Number of labels=16512 does not match number of samples=16339"

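I'm new to machine learning and just trying it out, so I don't understand why I get this error: ValueError: Number of labels=16512 does not match number of samples=16339. I've searched for this problem but haven't found a solution. Can anyone help me fix it? I have no idea why this happens; as far as I can tell I did everything correctly. I'm trying to use this model to predict house prices.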

    import pandas as pd
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split

    train = pd.read_csv('housing.csv')
    X = train.drop(columns=["median_house_value", "ocean_proximity"])
    y = train["median_house_value"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    model = DecisionTreeClassifier()
    X_train = X_train.dropna()
    y_train = y_train.dropna()
    model.fit(X_train, y_train)

Here is my error message:

    ValueError                                Traceback (most recent call last)
    <ipython-input-43-4691a6b66d80> in <module>
         17 y_train = y_train.dropna()
         18
    ---> 19 model.fit(X_train, y_train)

    c:\users\zhang\appdata\local\programs\python\python38\lib\site-packages\sklearn\tree\_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
        888         """
        889
    --> 890         super().fit(
        891             X, y,
        892             sample_weight=sample_weight,

    c:\users\zhang\appdata\local\programs\python\python38\lib\site-packages\sklearn\tree\_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
        270
        271         if len(y) != n_samples:
    --> 272             raise ValueError("Number of labels=%d does not match "
        273                              "number of samples=%d" % (len(y), n_samples))
        274         if not 0 <= self.min_weight_fraction_leaf <= 0.5:

    ValueError: Number of labels=16512 does not match number of samples=16339

Answer:

Could you try the following? It ran without problems for me:

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier
    import numpy as np
    from sklearn.model_selection import train_test_split

    data = pd.read_csv('housing.csv')
    prices = data['median_house_value']
    features = data.drop(['median_house_value', 'ocean_proximity'], axis=1)

    prices.shape
    (20640,)
    features.shape
    (20640, 8)

    X_train, X_test, y_train, y_test = train_test_split(features, prices, test_size=0.2, random_state=42)
    X_train = X_train.dropna()
    y_train = y_train.dropna()

    y_train.shape
    (16512,)
    X_train.shape
    (16512, 8)

    model = DecisionTreeClassifier()  # not shown in the original snippet; needed before calling fit
    model.fit(X_train, y_train)
    DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                           max_features=None, max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, presort=False,
                           random_state=None, splitter='best')
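
A note on the likely cause of the error in the question: X_train.dropna() removes every row that has a missing feature value, while y_train.dropna() removes nothing, because y_train still has all 16512 rows (80% of 20640). The two objects then differ in length; in the traceback the gap is 16512 - 16339 = 173 rows. Below is a minimal sketch of one way to keep features and labels aligned. It uses a small synthetic DataFrame as a stand-in for housing.csv; the generated values, the injected NaNs, and the names X_clean and y_clean are assumptions for illustration only, not the real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for housing.csv (hypothetical values, not the real file).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "total_rooms": rng.integers(100, 5000, size=200).astype(float),
    "total_bedrooms": rng.integers(10, 1000, size=200).astype(float),
    "median_house_value": rng.integers(1, 6, size=200) * 100000,
})
# Blank out every tenth feature value to mimic gaps in the real data.
data.loc[data.index[::10], "total_bedrooms"] = np.nan

X = data.drop(columns=["median_house_value"])
y = data["median_house_value"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Dropping NaN rows separately only shrinks X_train,
# because y_train has no missing values to drop.
X_dropped = X_train.dropna()
y_dropped = y_train.dropna()
print(len(X_dropped), len(y_dropped))  # the two lengths no longer match

# One fix: drop rows from the features, then pick the labels
# that share the surviving row index.
X_clean = X_train.dropna()
y_clean = y_train.loc[X_clean.index]

model = DecisionTreeClassifier(random_state=0)
model.fit(X_clean, y_clean)
```

Another option is to call dropna() on the whole DataFrame before separating features and target, so that train_test_split only ever sees complete rows.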
