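Train-test split in Python does not seem to work properly?

I am trying to run the kNN (k-nearest neighbors) algorithm in Python.

The dataset I am using to try out this algorithm is available in the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/wine

This is the code I am using: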

#1. LIBRARIES
import os
import pandas as pd
import numpy as np
print os.getcwd() # Prints the working directory
os.chdir('C:\\file_path') # Provide the path here

#2. VARIABLES
variables = pd.read_csv('wines.csv')
winery = variables['winery']
alcohol = variables['alcohol']
malic = variables['malic']
ash = variables['ash']
ash_alcalinity = variables['ash_alcalinity']
magnesium = variables['magnesium']
phenols = variables['phenols']
flavanoids = variables['flavanoids']
nonflavanoids = variables['nonflavanoids']
proanthocyanins = variables['proanthocyanins']
color_intensity = variables['color_intensity']
hue = variables['hue']
od280 = variables['od280']
proline = variables['proline']

#3. MAX-MIN NORMALIZATION
alcoholscaled=(alcohol-min(alcohol))/(max(alcohol)-min(alcohol))
malicscaled=(malic-min(malic))/(max(malic)-min(malic))
ashscaled=(ash-min(ash))/(max(ash)-min(ash))
ash_alcalinity_scaled=(ash_alcalinity-min(ash_alcalinity))/(max(ash_alcalinity)-min(ash_alcalinity))
magnesiumscaled=(magnesium-min(magnesium))/(max(magnesium)-min(magnesium))
phenolsscaled=(phenols-min(phenols))/(max(phenols)-min(phenols))
flavanoidsscaled=(flavanoids-min(flavanoids))/(max(flavanoids)-min(flavanoids))
nonflavanoidsscaled=(nonflavanoids-min(nonflavanoids))/(max(nonflavanoids)-min(nonflavanoids))
proanthocyaninsscaled=(proanthocyanins-min(proanthocyanins))/(max(proanthocyanins)-min(proanthocyanins))
color_intensity_scaled=(color_intensity-min(color_intensity))/(max(color_intensity)-min(color_intensity))
huescaled=(hue-min(hue))/(max(hue)-min(hue))
od280scaled=(od280-min(od280))/(max(od280)-min(od280))
prolinescaled=(proline-min(proline))/(max(proline)-min(proline))
alcoholscaled.mean()
alcoholscaled.median()
alcoholscaled.min()
alcoholscaled.max()

#4. DATA FRAME
d = {'alcoholscaled' : pd.Series([alcoholscaled]),
     'malicscaled' : pd.Series([malicscaled]),
     'ashscaled' : pd.Series([ashscaled]),
     'ash_alcalinity_scaled' : pd.Series([ash_alcalinity_scaled]),
     'magnesiumscaled' : pd.Series([magnesiumscaled]),
     'phenolsscaled' : pd.Series([phenolsscaled]),
     'flavanoidsscaled' : pd.Series([flavanoidsscaled]),
     'nonflavanoidsscaled' : pd.Series([nonflavanoidsscaled]),
     'proanthocyaninsscaled' : pd.Series([proanthocyaninsscaled]),
     'color_intensity_scaled' : pd.Series([color_intensity_scaled]),
     'hue_scaled' : pd.Series([huescaled]),
     'od280scaled' : pd.Series([od280scaled]),
     'prolinescaled' : pd.Series([prolinescaled])}
df = pd.DataFrame(d)

#5. TRAIN-TEST SPLIT
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(np.matrix(df),np.matrix(winery),test_size=0.3)
print X_train.shape, y_train.shape
print X_test.shape, y_test.shape

#6. K-NEAREST NEIGHBOUR ALGORITHM
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X_train, y_train)
print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))

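In section 5, when I import train_test_split from sklearn.model_selection and run it, the split does not seem to work correctly, because it gives the following shapes: (0,13) (0,178) (1,13) (1,178)

Then, when I try to run knn, I get the error message: Found array with 0 sample(s) (shape=(0,13)) while a minimum of 1 is required. This is not caused by the max-min normalization, because I still get the same error message when the variables are not normalized.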


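Answer:

I am not sure what is wrong with your code; your approach differs somewhat from the one in the sklearn documentation. However, I can show you a different way to do a train-test split on the wine dataset.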

from sklearn.datasets import load_wine
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
X_scaled = MinMaxScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3)

knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X_train, y_train)
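For what it is worth, the shapes reported in the question point to one likely cause. Wrapping each scaled column as pd.Series([column]) builds a one-element Series whose single value is the whole column, so the resulting frame has 1 row instead of 178, and np.matrix turns the 1-D winery column into a single row as well. With only one "sample", a 0.3 test split can leave the training set empty. The sketch below reproduces this on a small synthetic stand-in for wines.csv (the data and the two feature columns are made up for illustration, not taken from the real file) and then shows one way to keep one row per wine.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for wines.csv: 178 rows, a label and two features.
rng = np.random.default_rng(0)
variables = pd.DataFrame({
    "winery": rng.integers(1, 4, size=178),
    "alcohol": rng.normal(13.0, 0.8, size=178),
    "malic": rng.normal(2.3, 1.1, size=178),
})

# Pattern from the question: a list around a Series yields a Series of
# length 1, so the frame ends up with a single row.
wrapped = pd.DataFrame({
    "alcohol": pd.Series([variables["alcohol"]]),
    "malic": pd.Series([variables["malic"]]),
})
print(wrapped.shape)  # (1, 2)

# np.matrix promotes a 1-D column to one row of 178 values.
labels_as_matrix = np.matrix(variables["winery"])
print(labels_as_matrix.shape)  # (1, 178)

# One possible fix: scale all feature columns at once and pass plain
# arrays with one row per sample.
features = variables.drop(columns="winery")
scaled = (features - features.min()) / (features.max() - features.min())

X_train, X_test, y_train, y_test = train_test_split(
    scaled.to_numpy(),
    variables["winery"].to_numpy(),
    test_size=0.3,
    random_state=0,  # fixed seed only so the sketch is repeatable
)
print(X_train.shape, X_test.shape)  # (124, 2) (54, 2)
```

With the real file, the same column-wise scaling should apply to all 13 feature columns, assuming winery is the label column as in the question.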
