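Writing a machine learning classification algorithm

I am trying to write a classification algorithm for a machine learning model, but I am getting an error. Can anyone help? Thanks in advance.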

import pandas as pd
from sklearn.metrics import accuracy_score
from scipy.spatial import distance

def euc(a, b):
    return distance.euclidean(a, b)

class classifierKN():
    def fit(self, X_train, Y_train):
        self.X_train = X_train
        self.Y_train = Y_train

    def predict(self, X_test):
        predictions = []
        for row in X_test:
            label = self.closest(row)
            predictions.append(label)
        return predictions

    def closest(self, row):
        best_dist = euc(row, self.X_train[0])
        best_index = 0
        for i in range(1, len(self.X_train)):
            dist = euc(row, self.X_train[i])
            if dist < best_dist:
                best_dist = dist
                best_index = i
        return self.Y_train[best_index]

#Load the dataset
diabetdata = pd.read_csv("diabetes.csv")

#set features and target
features = ["PlasmaGlucose", "DiastolicBloodPressure", "TricepsThickness", "SerumInsulin"]
X = diabetdata[features]
print("FEATURES: ", X.head())
Y = diabetdata.Diabetic
print("TARGET: ", Y.head())
print("")

from sklearn.model_selection import train_test_split  #No module named 'sklearn.cross_validation' so I replace it with model_selection
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

#predict
model = classifierKN()
model.fit(X_train, Y_train)
predictKN = model.predict(X)
print("Predict result with KNeighborsClassifier")
print(predictKN)

#accuracy
print("Accuracy")
print(accuracy_score(Y, predictKN))

Result

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Vlad\Desktop\Machine learning\Machine Learning\coursework\test2.py", line 63, in <module>
    predictKN = model.predict(X)
  File "C:\Users\Vlad\Desktop\Machine learning\Machine Learning\coursework\test2.py", line 26, in predict
    label = self.closest(row)
  File "C:\Users\Vlad\Desktop\Machine learning\Machine Learning\coursework\test2.py", line 30, in closest
    best_dist = euc(row, self.X_train[0])
  File "E:\Anaconda\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "E:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

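Answer:

Your code has several problems and is somewhat hard to follow. The main one seems to be a misunderstanding of how pandas DataFrames and Series work, since you are evidently trying to iterate over the rows of a DataFrame like this: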

def predict(self, X_test):
    predictions = []
    for row in X_test:
        label = self.closest(row)
        predictions.append(label)
    return predictions

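This does not work in pandas: iterating over a DataFrame directly yields its column labels, not its rows. To iterate over the row values, you need something like this: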

def predict(self, X_test):
    predictions = []
    for row in X_test.iterrows():
        label = self.closest(list(row[1]))
        predictions.append(label)
    return predictions

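This version really does iterate over the rows of the DataFrame. iterrows() yields (index, row) pairs, which is why the code takes row[1], and each row's values are passed on to closest().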
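To make the difference concrete, here is a minimal sketch of the three iteration styles side by side. The two-row frame and its values are made up for illustration and are not taken from diabetes.csv; only the column names are borrowed from the question.

```python
import pandas as pd

# Hypothetical two-row frame, invented for this illustration only.
df = pd.DataFrame({"PlasmaGlucose": [171, 92], "SerumInsulin": [23, 36]})

# Iterating a DataFrame directly yields its column labels, not its rows.
column_labels = [item for item in df]

# iterrows() yields (index_label, row_as_Series) pairs.
rows = [list(row) for _, row in df.iterrows()]

# itertuples(index=False) yields one tuple of values per row.
tuples = [tuple(row) for row in df.itertuples(index=False)]

print(column_labels)
print(rows)
print(tuples)
```

With the original loop, closest() was therefore being handed a column name such as "PlasmaGlucose" rather than a row of numbers. itertuples() is shown only as an alternative; the answer's iterrows() approach is enough for this case.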

def closest(self, row):
    best_dist = euc(row, self.X_train[0])
    best_index = 0
    for i in range(1, len(self.X_train)):
        dist = euc(row, self.X_train[i])
        if dist < best_dist:
            best_dist = dist
            best_index = i
    return self.Y_train[best_index]

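However, closest() as written above does not work either, because best_dist = euc(row, self.X_train[0]) asks the DataFrame for a column labelled 0. That raises the KeyError, since X_train is a DataFrame with no column 0 (and you don't want to index a column anyway). What you want as the initial best_dist is the distance between the input row and the first row of the DataFrame, which you get with best_dist = euc(row, self.X_train.iloc[0]). You then need to go through the rows of X_train, where the function has the same problem, and the final label lookup on Y_train should be positional as well, because train_test_split shuffles the index labels. So change it to something like: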

def closest(self, row):
    best_dist = euc(row, self.X_train.iloc[0])
    best_index = 0
    for i in range(1, len(self.X_train.index)):
        dist = euc(row, list(self.X_train.iloc[i]))
        if dist < best_dist:
            best_dist = dist
            best_index = i
    return self.Y_train.iloc[best_index]

This at least works. I can't guarantee that it gives the output you want or adequate accuracy, but it does solve your immediate problem.
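For comparison, here is one possible way to sidestep the label-versus-position issue altogether: convert the inputs to NumPy arrays in fit(), so that [i] always means "the i-th row". This is a sketch under assumptions, not the asker's exact program. The class name NearestNeighbour is hypothetical, the data is synthetic (two well-separated groups standing in for diabetes.csv), and only two of the question's feature columns are used. It also scores on X_test and Y_test rather than on the full X and Y, since a 1-nearest-neighbour model trivially matches rows it was trained on.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


class NearestNeighbour:
    """1-nearest-neighbour classifier using Euclidean distance (illustrative)."""

    def fit(self, X_train, Y_train):
        # Store plain arrays so that [i] always means "i-th row by position".
        self.X_train = np.asarray(X_train, dtype=float)
        self.Y_train = np.asarray(Y_train)
        return self

    def predict(self, X_test):
        X_test = np.asarray(X_test, dtype=float)
        return [self._closest(row) for row in X_test]

    def _closest(self, row):
        # Distance from `row` to every training row in one vectorised step.
        dists = np.linalg.norm(self.X_train - row, axis=1)
        return self.Y_train[np.argmin(dists)]


# Synthetic stand-in for diabetes.csv (assumption: two well-separated groups).
rng = np.random.default_rng(0)
n = 100
X = pd.DataFrame({
    "PlasmaGlucose": np.concatenate([rng.normal(90, 5, n), rng.normal(160, 5, n)]),
    "SerumInsulin": np.concatenate([rng.normal(40, 5, n), rng.normal(200, 5, n)]),
})
Y = pd.Series([0] * n + [1] * n, name="Diabetic")

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

model = NearestNeighbour().fit(X_train, Y_train)
predictions = model.predict(X_test)

# Score only on rows the model has not seen during fit().
accuracy = accuracy_score(Y_test, predictions)
print("Held-out accuracy:", accuracy)
```

On real data the features have very different scales, so standardising them before computing distances is usually worth trying; that step is left out here to keep the sketch close to the question.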
