How do I train and predict on a single feature of a dataset with sklearn's KNeighborsClassifier?

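I read a CSV dataset into a pandas DataFrame and split it into training and test sets. I am trying to train and predict with one feature at a time and measure the accuracy, so that I can later work out which of the 4 features is the best predictor. I am new to Python and machine learning, so please bear with me; this is actually my first attempt at both. On the line my_knn_for_cs4661.fit(X_train[col], y_train) I get an error that mentions array.reshape(-1,1). I tried X_train[col].reshape(-1,1), but that produced a different error. I am using Python 3 in a Jupyter Notebook, with sklearn, numpy, and pandas.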

Here are my code and the error:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_df = pd.read_csv('https://raw.githubusercontent.com/mpourhoma/CS4661/master/iris.csv')
feature_cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
X = iris_df[feature_cols]
y = iris_df['species']
predictions = {}

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=6)

k = 3
my_knn_for_cs4661 = KNeighborsClassifier(n_neighbors=k)

for col in feature_cols:
    # X_train[col] is a 1D Series; this is the line that raises the ValueError
    my_knn_for_cs4661.fit(X_train[col], y_train)
    y_predict = my_knn_for_cs4661.predict(X_test)
    predictions[col] = y_predict

My error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-933eb8b496d8> in <module>()
     13 for col in feature_cols:
     14
---> 15     my_knn_for_cs4661.fit(X_train[col], y_train)
     16     y_predict = my_knn_for_cs4661.predict(X_test)
     17     predictions[col] = y_predict

~\Anaconda3\lib\site-packages\sklearn\neighbors\base.py in fit(self, X, y)
    763         """
    764         if not isinstance(X, (KDTree, BallTree)):
--> 765             X, y = check_X_y(X, y, "csr", multi_output=True)
    766
    767         if y.ndim == 1 or y.ndim == 2 and y.shape[1] == 1:

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    571     X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
    572                     ensure_2d, allow_nd, ensure_min_samples,
--> 573                     ensure_min_features, warn_on_dtype, estimator)
    574     if multi_output:
    575         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    439                     "Reshape your data either using array.reshape(-1, 1) if "
    440                     "your data has a single feature or array.reshape(1, -1) "
--> 441                     "if it contains a single sample.".format(array))
    442             array = np.atleast_2d(array)
    443             # To ensure that array flags are maintained

ValueError: Expected 2D array, got 1D array instead:
array=[6.  5.  5.7 6.3 5.6 5.6 4.6 5.8 5.8 4.7 5.5 5.4 5.8 6.4 6.5 6.7 6.1 6.9
 7.2 6.2 5.1 4.9 6.5 6.8 5.1 4.6 5.7 7.9 6.1 6.3 6.8 5.5 6.3 6.7 5.5 5.
 7.3 4.4 5.3 4.8 4.5 4.6 5.  5.8 6.9 4.8 7.7 5.8 5.4 6.7 5.5 6.7 5.9 5.6
 5.  6.  5.9 7.  5.4 4.9 5.  5.2 6.  5.1 6.1 6.2 5.6 6.7 6.8 5.8 6.7 5.7
 7.2 5.4 7.4 4.4 6.2 6.5 5.  6.7 6.6 4.9 5.  6.  5.5 6.2 5.7 7.2 4.9 6. ].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Answer:

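I found a solution, although it looks a bit hacky and I am not sure whether it is the Pythonic way. The idea is to take the column's underlying NumPy array with .values and reshape it to (n_samples, 1) for both fit and predict, then store the accuracy score for each feature.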

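import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_df = pd.read_csv('https://raw.githubusercontent.com/mpourhoma/CS4661/master/iris.csv')
feature_cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
X = iris_df[feature_cols]
y = iris_df['species']
predictions = {}

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=6)

k = 3
my_knn_for_cs4661 = KNeighborsClassifier(n_neighbors=k)

for col in feature_cols:
    # .values gives a 1D NumPy array; reshape(-1, 1) turns it into one column
    my_knn_for_cs4661.fit(X_train[col].values.reshape(-1, 1), y_train)
    # Predict on the same single feature, reshaped the same way
    y_predict = my_knn_for_cs4661.predict(X_test[col].values.reshape(-1, 1))
    predictions[col] = accuracy_score(y_test, y_predict)

print(predictions)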
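As a possible alternative to .values.reshape(-1, 1), selecting the column with double brackets, X_train[[col]], returns a one-column DataFrame that is already 2D, which is the shape scikit-learn expects. The sketch below is only an illustration under a few assumptions: it loads the iris data bundled with scikit-learn via load_iris() instead of the CSV from the question (so it runs offline), assigns the question's column names to it, and uses the names scores and best_feature, which are not in the original post. It also creates a fresh classifier for each feature, which is a style choice rather than a requirement.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: bundled iris data stands in for the CSV used in the question.
iris = load_iris()
feature_cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
X = pd.DataFrame(iris.data, columns=feature_cols)
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=6)

# Single brackets give a 1D Series; double brackets give a 2D DataFrame.
series_shape = X_train['petal_width'].shape    # 1D: (n_samples,)
frame_shape = X_train[['petal_width']].shape   # 2D: (n_samples, 1)

scores = {}  # hypothetical name: accuracy per single feature
for col in feature_cols:
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train[[col]], y_train)           # 2D input, no reshape needed
    y_predict = knn.predict(X_test[[col]])     # same single feature at predict time
    scores[col] = accuracy_score(y_test, y_predict)

# Feature with the highest test accuracy on this particular split
best_feature = max(scores, key=scores.get)
print(series_shape, frame_shape)
print(scores)
print(best_feature)
```

Note that the question's loop also called predict(X_test) with all four columns after fitting on one; the model has to be given the same single feature at predict time, which both the answer above and this sketch do. The ranking of features depends on the split and on k, so it is worth checking with more than one random_state before drawing conclusions.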
