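I am trying to rebuild a simple multilayer perceptron artificial neural network in Python, trained with gradient-descent backpropagation. My goal is to reproduce the accuracy of MATLAB's ANN, but my results are not even close. I use the same parameters as MATLAB: the same number of hidden nodes (20), 1000 epochs, a learning rate (alpha) of 0.01, and (obviously) the same data. Yet my code makes no progress in improving the results, while MATLAB reaches about 98% accuracy.
I tried stepping through MATLAB to see what it does, with little success. I believe MATLAB scales the input data to between 0 and 1 and adds a bias to the input; I do both in my Python code.
What is MATLAB doing that gives such high results? Or, more likely, what am I doing in my Python code that gives such poor ones? All I can think of is bad initial weights, reading the data incorrectly, processing the data incorrectly, or a wrong/poor activation function (I also tried tanh, with the same result).
My attempt is below; it is based on code I found online, tweaked slightly to read my data. The MATLAB script (only 11 lines of code) follows it. At the bottom is a link to the dataset (which I also obtained through MATLAB):
Any help is appreciated.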
Main.py
    import numpy as np
    import Process
    import matplotlib.pyplot as plt
    from sklearn.metrics import confusion_matrix, classification_report
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelBinarizer
    import warnings


    def sigmoid(x):
        return 1.0/(1.0 + np.exp(-x))


    def sigmoid_prime(x):
        return sigmoid(x)*(1.0-sigmoid(x))


    class NeuralNetwork:

        def __init__(self, layers):
            self.activation = sigmoid
            self.activation_prime = sigmoid_prime

            # Set weights
            self.weights = []
            # layers = [2,2,1]
            # range of weight values (-1,1)
            # input and hidden layers - random((2+1, 2+1)) : 3 x 3
            for i in range(1, len(layers) - 1):
                r = 2*np.random.random((layers[i-1] + 1, layers[i] + 1)) - 1
                self.weights.append(r)
            # output layer - random((2+1, 1)) : 3 x 1
            r = 2*np.random.random((layers[i] + 1, layers[i+1])) - 1
            self.weights.append(r)

        def fit(self, X, y, learning_rate, epochs):
            # Add column of ones to X
            # This is to add the bias unit to the input layer
            ones = np.atleast_2d(np.ones(X.shape[0]))
            X = np.concatenate((ones.T, X), axis=1)

            for k in range(epochs):
                i = np.random.randint(X.shape[0])
                a = [X[i]]

                for l in range(len(self.weights)):
                    dot_value = np.dot(a[l], self.weights[l])
                    activation = self.activation(dot_value)
                    a.append(activation)
                # output layer
                error = y[i] - a[-1]
                deltas = [error * self.activation_prime(a[-1])]

                # we need to begin at the second to last layer
                # (a layer before the output layer)
                for l in range(len(a) - 2, 0, -1):
                    deltas.append(deltas[-1].dot(self.weights[l].T)*self.activation_prime(a[l]))

                # reverse
                # [level3(output)->level2(hidden)] => [level2(hidden)->level3(output)]
                deltas.reverse()

                # backpropagation
                # 1. Multiply its output delta and input activation
                #    to get the gradient of the weight.
                # 2. Subtract a ratio (percentage) of the gradient from the weight.
                for i in range(len(self.weights)):
                    layer = np.atleast_2d(a[i])
                    delta = np.atleast_2d(deltas[i])
                    self.weights[i] += learning_rate * layer.T.dot(delta)

        def predict(self, x):
            a = np.concatenate((np.ones(1).T, np.array(x)))
            for l in range(0, len(self.weights)):
                a = self.activation(np.dot(a, self.weights[l]))
            return a


    # Create neural net, 13 inputs, 20 hidden nodes, 3 outputs
    nn = NeuralNetwork([13, 20, 3])
    data = Process.readdata('wine')

    # Split data out into input and output
    X = data[0]
    y = data[1]

    # Normalise input data between 0 and 1.
    X -= X.min()
    X /= X.max()

    # Split data into training and test sets (15% testing)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)

    # Create binary output form
    y_ = LabelBinarizer().fit_transform(y_train)

    # Train data
    lrate = 0.01
    epoch = 1000
    nn.fit(X_train, y_, lrate, epoch)

    # Test data
    err = []
    for e in X_test:
        # Create array of output data (argmax to get classification)
        err.append(np.argmax(nn.predict(e)))

    # Hide warnings. UndefinedMetricWarning thrown when confusion matrix returns 0 in any one of the classifiers.
    warnings.filterwarnings('ignore')

    # Produce confusion matrix and classification report
    print(confusion_matrix(y_test, err))
    print(classification_report(y_test, err))

    # Plot actual and predicted data
    plt.figure(figsize=(10, 8))
    target, = plt.plot(y_test, color='b', linestyle='-', lw=1, label='Target')
    estimated, = plt.plot(err, color='r', linestyle='--', lw=3, label='Estimated')
    plt.legend(handles=[target, estimated])
    plt.xlabel('# Samples')
    plt.ylabel('Classification Value')
    plt.grid()
    plt.show()
Process.py
    import csv
    import numpy as np


    # Add constant column of 1's
    def addones(arrayvar):
        return np.hstack((np.ones((arrayvar.shape[0], 1)), arrayvar))


    def readdata(loc):
        # Open file and calculate the number of columns and the number of rows. The number of rows has a +1 as the 'next'
        # operator in num_cols has already passed over the first row.
        with open(loc + '.input.csv') as f:
            file = csv.reader(f, delimiter=',', skipinitialspace=True)
            num_cols = len(next(file))
            num_rows = len(list(file))+1

        # Create a zero'd array based on the number of column and rows previously found.
        x = np.zeros((num_rows, num_cols))
        y = np.zeros(num_rows)

        # INPUT #
        # Loop through the input file and put each row into a new row of 'samples'
        with open(loc + '.input.csv', newline='') as csvfile:
            file = csv.reader(csvfile, delimiter=',')
            count = 0
            for row in file:
                x[count] = row
                count += 1

        # OUTPUT #
        # Do the same and loop through the output file.
        with open(loc + '.output.csv', newline='') as csvfile:
            file = csv.reader(csvfile, delimiter=',')
            count = 0
            for row in file:
                y[count] = row[0]
                count += 1

        # Set data type
        x = np.array(x).astype(float)
        y = np.array(y).astype(int)

        return x, y
MATLAB script
    %% LOAD DATA
    [x1,t1] = wine_dataset;

    %% SET UP NN
    net = patternnet(20);
    net.trainFcn = 'traingd';
    net.layers{2}.transferFcn = 'logsig';
    net.derivFcn = 'logsig';

    %% TRAIN AND TEST
    [net,tr] = train(net,x1,t1);
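Answer:
I think I have found the problem. It comes down to the dataset itself (not every dataset triggers it) combined with the way I scaled the data. My original scaling, which maps the values to between 0 and 1, did not help; it is what produced the poor results seen above: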
    # Normalise input data between 0 and 1.
    X -= X.min()
    X /= X.max()
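To make the effect of that snippet concrete, here is a minimal sketch on made-up numbers (not the wine data). It assumes one feature with a large range and one with a small range, and compares the global min/max used above with a per-column min/max. With a single global min and max, the small-range column ends up squeezed into a tiny interval, so it carries almost no usable signal into the network.

```python
import numpy as np

# Toy data (made-up values): column 0 has a large range, column 1 a small one.
X = np.array([[300.0, 0.5],
              [900.0, 1.0],
              [1500.0, 1.5]])

# Global min-max, equivalent to `X -= X.min(); X /= X.max()`:
# one min and one max are taken over the whole array.
X_global = (X - X.min()) / (X.max() - X.min())

# Per-column min-max: each feature is scaled with its own min and max.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_per_col = (X - col_min) / (col_max - col_min)

# Spread (max - min) of each column after scaling.
global_spread = X_global.max(axis=0) - X_global.min(axis=0)
per_col_spread = X_per_col.max(axis=0) - X_per_col.min(axis=0)

print("global scaling, spread per column:    ", global_spread)
print("per-column scaling, spread per column:", per_col_spread)
```

In this toy case the second column keeps less than a thousandth of the [0, 1] interval under global scaling, while per-column scaling gives both columns the full interval.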
I found another scaling method, provided by sklearn's preprocessing package:
    from sklearn import preprocessing
    X = preprocessing.scale(X)
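As far as I understand its defaults, `preprocessing.scale` standardises each column separately to zero mean and unit variance. The sketch below does the same thing by hand in NumPy. The helper name `standardize` and the toy values are mine, purely for illustration, and the handling of constant columns is an assumption of this sketch.

```python
import numpy as np

def standardize(X):
    """Per-column standardisation: subtract the column mean, divide by the column std."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)        # population standard deviation (ddof=0)
    std[std == 0.0] = 1.0      # assumption: constant columns are only centred, not divided
    return (X - mean) / std

# Toy data (made-up values) with very different column ranges.
X = np.array([[300.0, 0.5],
              [900.0, 1.0],
              [1500.0, 1.5]])

X_std = standardize(X)
print("column means:", X_std.mean(axis=0))
print("column stds: ", X_std.std(axis=0))
```

The key point is `axis=0`: every feature is put on a comparable scale regardless of its original units.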
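This method does not scale to between 0 and 1, and I still need to investigate why it helps so much, but the accuracy is now between 96% and 100%. That is very close to MATLAB's results, so I think MATLAB uses a similar (or the same) preprocessing scaling.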
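I have not verified what MATLAB does internally. My assumption is that `patternnet` applies a per-feature min/max mapping to [-1, 1] by default (I believe via `mapminmax`). If that is right, a rough Python counterpart could look like the sketch below. The helper name `minmax_per_feature` and its `lo`/`hi` parameters are hypothetical, not part of any library.

```python
import numpy as np

def minmax_per_feature(X, lo=-1.0, hi=1.0):
    """Map each column of X linearly onto [lo, hi] using that column's own min and max."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = col_max - col_min
    span[span == 0.0] = 1.0    # assumption: constant columns are mapped to lo
    return lo + (hi - lo) * (X - col_min) / span

# Toy data (made-up values) with very different column ranges.
X = np.array([[300.0, 0.5],
              [900.0, 1.0],
              [1500.0, 1.5]])

X_scaled = minmax_per_feature(X)
print(X_scaled)
```

Like `preprocessing.scale`, this treats every column independently, which is the property the global min/max scaling lacked.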
As I said above, this is not the case for every dataset. sklearn's built-in iris or digits datasets seem to give good results without any scaling.
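One rough way to guess in advance whether a dataset is sensitive to this is to compare the ranges of its columns. This is only a heuristic sketch; the helper name `range_ratio` and the toy arrays are made up for illustration. A ratio near 1 means the features already live on similar scales, while a large ratio suggests per-feature scaling is worth trying.

```python
import numpy as np

def range_ratio(X):
    """Ratio of the largest to the smallest non-zero column range (max - min)."""
    X = np.asarray(X, dtype=float)
    ranges = X.max(axis=0) - X.min(axis=0)
    ranges = ranges[ranges > 0]            # ignore constant columns
    return ranges.max() / ranges.min()

# Made-up examples: features on similar scales vs. very different scales.
similar = np.array([[0.0, 1.0], [4.0, 5.0], [8.0, 9.0]])
mixed = np.array([[300.0, 0.5], [900.0, 1.0], [1500.0, 1.5]])

ratio_similar = range_ratio(similar)
ratio_mixed = range_ratio(mixed)
print("similar scales:", ratio_similar)
print("mixed scales:  ", ratio_mixed)
```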