I've been trying to implement a basic back-propagation neural network in Python, and have finished the code for initializing and training the weight set. However, on every data set I train on, the error (mean squared error) always converges to an odd number: it keeps decreasing with further iterations, but never gets anywhere near zero.

Any help would be much appreciated.
    import csv
    import numpy as np

    class NeuralNetwork:
        layers = 0
        shape = None
        weights = []
        layerIn = []
        layerOut = []

        def __init__(self, shape):
            self.shape = shape
            self.layers = len(shape) - 1
            for i in range(0, self.layers):
                n = shape[i]
                m = shape[i + 1]
                self.weights.append(np.random.normal(scale=0.2, size=(m, n + 1)))

        def sgm(self, x):
            return 1 / (1 + np.exp(-x))

        def dersgm(self, x):
            y = self.sgm(x)
            return y * (y - 1)

        def run(self, input):
            self.layerIn = []
            self.layerOut = []
            for i in range(self.layers):
                if i == 0:
                    layer = self.weights[0].dot(
                        np.vstack((input.transpose(), np.ones([1, input.shape[0]]))))
                else:
                    layer = self.weights[i].dot(
                        np.vstack((self.layerOut[-1], np.ones([1, input.shape[0]]))))
                self.layerIn.append(layer)
                self.layerOut.append(self.sgm(layer))
            return self.layerOut[-1].T

        def backpropogate(self, input, y, learning_rate):
            deltas = []
            y_hat = self.run(input)
            # Calculate deltas
            for i in reversed(range(self.layers)):
                if i == self.layers - 1:
                    # for last layer
                    error = y_hat - y
                    msq_error = sum(.5 * ((error) ** 2))
                    # returns delta, k rows for k inputs, m columns for m nodes
                    deltas.append(error * self.dersgm(y_hat))
                else:
                    error = deltas[-1].dot(self.weights[i + 1][:, :-1])
                    deltas.append(self.dersgm(self.layerOut[i]).T * error)
            # Calculate weight-deltas
            wdelta = []
            ordered_deltas = list(reversed(deltas))  # reverse order because created backwards
            # returns weight deltas, k rows for k nodes, m columns for m next layer nodes
            for i in range(self.layers):
                if i == 0:
                    # add bias
                    input_with_bias = np.vstack((input.T, np.ones(input.shape[0])))
                    # sum over n rows of deltas for n training examples
                    # to get one delta for all examples, for all nodes
                    wdelta.append(ordered_deltas[i].T.dot(input_with_bias.T))
                else:
                    with_bias = np.vstack((self.layerOut[i - 1], np.ones(input.shape[0])))
                    wdelta.append(ordered_deltas[i].T.dot(with_bias.T))
            # update weights
            self.update_weights(wdelta, learning_rate)
            return msq_error
            # end backpropogate

        def update_weights(self, weight_deltas, learning_rate):
            for i in range(self.layers):
                self.weights[i] = self.weights[i] + \
                    (learning_rate * weight_deltas[i])

        def train(self, input, target, lr, run_iter):
            for i in range(run_iter):
                if i % 100000 == 0:
                    print(self.backpropogate(input, target, lr))
Answer:
In a scenario like this, an error of 0 is simply not achievable, because reaching it would require the fitted curve to pass exactly through every data point.
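To see why the error bottoms out above zero, here is a minimal sketch (not from the original post) fitting a straight line to noisy data: even the least-squares-optimal line is left with a strictly positive mean squared error, because the noise cannot be matched by the curve.

```python
import numpy as np

# Hypothetical illustration: noisy targets around the line y = 2x.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2 * x + rng.normal(scale=0.1, size=x.shape)  # noise the model cannot fit

# Best-possible straight-line fit y ≈ a*x + b
a, b = np.polyfit(x, y, 1)
mse = np.mean((a * x + b - y) ** 2)
print(mse)  # small, but strictly positive: the noise floor
```

The same thing happens with your network: training drives the error down toward this noise floor, not toward zero.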
Increasing the number of neurons will certainly reduce the error, since the function can take on a more complex and precise shape. But when you fit the data too well, you run into a problem called overfitting. In the classic three-panel illustration of this, the leftmost curve underfits the data set, the middle one fits it almost correctly, and the rightmost one overfits it.

The rightmost scenario can drive the error to 0, but that is not what we want, and you need to avoid it. How?
The simplest way to determine whether the number of neurons in the network is ideal (giving a good fit) is trial and error. Split your data into training data (80%, used to train the network) and test data (20%, used only to evaluate the trained network). While training on the training data alone, you can plot the network's performance on the test set.
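The 80/20 split above can be sketched as follows; the `train_test_split` helper and the toy `inputs`/`targets` arrays are my own illustration, not part of the original code.

```python
import numpy as np

def train_test_split(inputs, targets, test_fraction=0.2, seed=0):
    """Shuffle the examples and hold out a fraction for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(inputs))
    n_test = int(len(inputs) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (inputs[train_idx], targets[train_idx],
            inputs[test_idx], targets[test_idx])

# Toy data: 10 examples with 2 features each
inputs = np.arange(20).reshape(10, 2)
targets = np.arange(10)
x_tr, y_tr, x_te, y_te = train_test_split(inputs, targets)
print(len(x_tr), len(x_te))  # 8 2
```

Train only on `x_tr`/`y_tr`, and track the mean squared error on `x_te`/`y_te` after each training epoch; when the test error starts rising while the training error keeps falling, the network has begun to overfit.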
You can also use a third data set for validation; see: What is the difference between train, validation and test set in neural networks?