I have a neural network, shown below, which I've already trained. It works, or at least it appears to work, but the problem is with the training. I'm trying to train it to act as an OR gate, but it never seems to get there; the output typically looks like this:
prior to training: [[0.50181624] [0.50183743] [0.50180414] [0.50182533]]
post training:     [[0.69641759] [0.754652  ] [0.75447178] [0.79431198]]
expected output:   [[0] [1] [1] [1]]
and I have a loss graph that looks like this:
Oddly enough, it appears to be training, yet it never reaches the expected output. I know it will never literally hit 0 and 1, but I would still expect it to get considerably closer to the expected output than this.
I ran into some trouble working out how to backpropagate the error, because I want this network to support an arbitrary number of hidden layers, so I store a local gradient in each layer alongside the weights and push the error back from the end.
The functions I mainly suspect are NeuralNetwork.train and the two forward methods.
import sys
import math
import numpy as np
import matplotlib.pyplot as plt
from itertools import product


class NeuralNetwork:
    class __Layer:
        def __init__(self, args):
            self.__epsilon = 1e-6
            self.localGrad = 0
            self.__weights = np.random.randn(
                args["previousLayerHeight"],
                args["height"]
            ) * 0.01
            self.__biases = np.zeros(
                (args["biasHeight"], 1)
            )

        def __str__(self):
            return str(self.__weights)

        def forward(self, X):
            a = np.dot(X, self.__weights) + self.__biases
            self.localGrad = np.dot(X.T, self.__sigmoidPrime(a))
            return self.__sigmoid(a)

        def adjustWeights(self, err):
            self.__weights -= (err * self.__epsilon)

        def __sigmoid(self, z):
            return 1 / (1 + np.exp(-z))

        def __sigmoidPrime(self, a):
            return self.__sigmoid(a) * (1 - self.__sigmoid(a))

    def __init__(self, args):
        self.__inputDimensions = args["inputDimensions"]
        self.__outputDimensions = args["outputDimensions"]
        self.__hiddenDimensions = args["hiddenDimensions"]
        self.__layers = []
        self.__constructLayers()

    def __constructLayers(self):
        self.__layers.append(
            self.__Layer(
                {
                    "biasHeight": self.__inputDimensions[0],
                    "previousLayerHeight": self.__inputDimensions[1],
                    "height": self.__hiddenDimensions[0][0]
                        if len(self.__hiddenDimensions) > 0
                        else self.__outputDimensions[0]
                }
            )
        )

        for i in range(len(self.__hiddenDimensions)):
            self.__layers.append(
                self.__Layer(
                    {
                        "biasHeight": self.__hiddenDimensions[i + 1][0]
                            if i + 1 < len(self.__hiddenDimensions)
                            else self.__outputDimensions[0],
                        "previousLayerHeight": self.__hiddenDimensions[i][0],
                        "height": self.__hiddenDimensions[i + 1][0]
                            if i + 1 < len(self.__hiddenDimensions)
                            else self.__outputDimensions[0]
                    }
                )
            )

    def forward(self, X):
        out = self.__layers[0].forward(X)
        for i in range(len(self.__layers) - 1):
            out = self.__layers[i + 1].forward(out)
        return out

    def train(self, X, Y, loss, epoch=5000000):
        for i in range(epoch):
            YHat = self.forward(X)
            delta = -(Y - YHat)
            loss.append(sum(Y - YHat))
            err = np.sum(np.dot(self.__layers[-1].localGrad, delta.T), axis=1)
            err.shape = (self.__hiddenDimensions[-1][0], 1)
            self.__layers[-1].adjustWeights(err)
            i = 0
            for l in reversed(self.__layers[:-1]):
                err = np.dot(l.localGrad, err)
                l.adjustWeights(err)
                i += 1

    def printLayers(self):
        print("Layers:\n")
        for l in self.__layers:
            print(l)
            print("\n")


def main(args):
    X = np.array([[x, y] for x, y in product([0, 1], repeat=2)])
    Y = np.array([[0], [1], [1], [1]])
    nn = NeuralNetwork(
        {
            # (height, width)
            "inputDimensions": (4, 2),
            "outputDimensions": (1, 1),
            "hiddenDimensions": [
                (6, 1)
            ]
        }
    )

    print("input:\n\n", X, "\n")
    print("expected output:\n\n", Y, "\n")

    nn.printLayers()
    print("prior to training:\n\n", nn.forward(X), "\n")
    loss = []
    nn.train(X, Y, loss)
    print("post training:\n\n", nn.forward(X), "\n")
    nn.printLayers()

    fig, ax = plt.subplots()
    x = np.array([x for x in range(5000000)])
    loss = np.array(loss)
    ax.plot(x, loss)
    ax.set(xlabel="epoch", ylabel="loss", title="logic gate training")
    plt.show()


if (__name__ == "__main__"):
    main(sys.argv[1:])
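For reference, and to make sure I understand what the training loop is supposed to produce, here is a minimal standalone sketch of plain gradient descent on a single sigmoid layer learning OR. This is separate from my code above; the names, learning rate, and iteration count are just illustrative, and it assumes a squared-error loss:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # (4, 2) inputs
Y = np.array([[0], [1], [1], [1]])               # (4, 1) OR targets

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 1)) * 0.01           # weights, shape (2, 1)
b = np.zeros((1, 1))                             # bias, broadcast over rows
lr = 0.5

for _ in range(10000):
    A = sigmoid(X @ W + b)                       # forward pass, shape (4, 1)
    dA = A - Y                                   # dL/dA for L = 0.5*sum((A-Y)**2)
    dZ = dA * A * (1 - A)                        # element-wise chain through sigmoid'
    W -= lr * (X.T @ dZ)                         # dL/dW = X^T @ dZ
    b -= lr * dZ.sum(axis=0, keepdims=True)      # dL/db = column sum of dZ

print(sigmoid(X @ W + b))                        # moves toward [[0] [1] [1] [1]]

With that update rule the outputs move toward the OR targets within a few thousand iterations, which is the kind of behaviour I was hoping to see from my network.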
Could someone please point out what I'm doing wrong? I strongly suspect it has to do with how I'm handling the matrices, but at the same time I have no idea what's going on.
Thank you for taking the time to read my question, and for taking the time to respond (if applicable).
Edit: There are actually quite a few problems, but I'm still somewhat confused about how to fix them. Although the loss graph looks like it's training, and in some sense it is, the math I'm doing above is wrong.
Look at the training function.
def train(self,X,Y,loss,epoch=5000000):
    for i in range(epoch):
        YHat = self.forward(X)
        delta = -(Y-YHat)
        loss.append(sum(Y-YHat))
        err = np.sum(np.dot(self.__layers[-1].localGrad,delta.T), axis=1)
        err.shape = (self.__hiddenDimensions[-1][0],1)
        self.__layers[-1].adjustWeights(err)
        i=0
        for l in reversed(self.__layers[:-1]):
            err = np.dot(l.localGrad, err)
            l.adjustWeights(err)
            i += 1
Note how I take delta = -(Y-Yhat) and then compute a dot product with the last layer's "local gradient". The "local gradient" is the local W gradient.
def forward(self,X):
    a = np.dot(X, self.__weights) + self.__biases
    self.localGrad = np.dot(X.T,self.__sigmoidPrime(a))
    return self.__sigmoid(a)
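For comparison, the way I understand the textbook derivation, the gradient of a squared-error loss with respect to this layer's weights should factor as X.T dotted with the element-wise product of the output error and sigprime(a), rather than X.T dotted with sigprime(a) alone. A small sketch of that computation (the helper name is made up, and it assumes L = 0.5 * sum((YHat - Y)**2)):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_weight_grad(X, W, b, Y):
    # Hypothetical helper: dL/dW for a single sigmoid layer.
    a = X @ W + b                      # pre-activation
    YHat = sigmoid(a)                  # layer output
    delta = YHat - Y                   # dL/dYHat
    dZ = delta * YHat * (1 - YHat)     # element-wise: delta * sigmoidPrime(a)
    return X.T @ dZ                    # dL/dW, same shape as W

In other words, delta has to be combined element-wise with sigprime(a) before anything is contracted against X.T, so storing np.dot(X.T, sigprime(a)) on its own loses that pairing.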
I skipped a step in the chain rule. I should first multiply by W * sigprime(XW + b), since that is the local gradient of X, and only then multiply by the local W gradient. I tried that, but I'm still running into problems. Here is the new forward method (note that the layers' __init__ needs to be initialized with the new variables, and I also switched the activation function to tanh):
def forward(self, X):
    a = np.dot(X, self.__weights) + self.__biases
    self.localPartialGrad = self.__tanhPrime(a)
    self.localWGrad = np.dot(X.T, self.localPartialGrad)
    self.localXGrad = np.dot(self.localPartialGrad,self.__weights.T)
    return self.__tanh(a)
and I updated the training method to look roughly like this:
def train(self, X, Y, loss, epoch=5000):
    for e in range(epoch):
        Yhat = self.forward(X)

        err = -(Y-Yhat)
        loss.append(sum(err))
        print("loss:\n",sum(err))

        for l in self.__layers[::-1]:
            l.adjustWeights(err)
            if(l != self.__layers[0]):
                err = np.multiply(err,l.localPartialGrad)
                err = np.multiply(err,l.localXGrad)
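To check my understanding of the chain rule through a stack of layers, this is the pattern I believe the backward pass should follow (a standalone sketch, not my class above; the function and argument names are made up, and it assumes tanh activations with a squared-error loss):

import numpy as np

def backward_pass(layer_inputs, layer_weights, preacts, err):
    # layer_inputs[k]  : the input X_k that was fed into layer k
    # layer_weights[k] : the weight matrix W_k of layer k
    # preacts[k]       : a_k = X_k @ W_k + b_k from the forward pass
    # err              : dL/dYHat at the output, e.g. (YHat - Y)
    # Returns the weight gradients dL/dW_k, last layer first.
    grads = []
    for X_k, W_k, a_k in zip(reversed(layer_inputs),
                             reversed(layer_weights),
                             reversed(preacts)):
        dZ = err * (1 - np.tanh(a_k) ** 2)   # element-wise chain through tanh'
        grads.append(X_k.T @ dZ)             # this layer's weight gradient
        err = dZ @ W_k.T                     # error handed to the previous layer
    return grads

If that is right, the element-wise multiplication by tanhPrime(a) should happen once per layer, before the layer's own weight gradient is taken and before the error is passed down, rather than as separate multiplications after the weights have already been adjusted.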
The new graph I get is all over the place, and I have no idea what's going on. Here is the last piece of code I changed:
def adjustWeights(self, err):
    perr = np.multiply(err, self.localPartialGrad)
    werr = np.sum(np.dot(self.__weights,perr.T),axis=1)
    werr = werr * self.__epsilon
    werr.shape = (self.__weights.shape[0],1)
    self.__weights = self.__weights - werr
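For comparison, here is how I would expect adjustWeights to look if the layer's own weight gradient and the error handed to the previous layer are kept separate (just a sketch; self.__inputs is a hypothetical attribute that would cache the layer's forward input, which my current Layer does not store):

def adjustWeights(self, err):
    # err: dL/d(layer output), same shape as this layer's output
    dZ = err * self.localPartialGrad            # element-wise chain through tanh'
    dX = np.dot(dZ, self.__weights.T)           # dL/d(layer input), using the pre-update weights
    dW = np.dot(self.__inputs.T, dZ)            # dL/dW uses the cached forward input, not W itself
    self.__weights -= self.__epsilon * dW
    self.__biases -= self.__epsilon * dZ.sum(axis=0, keepdims=True)
    return dX                                   # what the previous layer should receive as err

In my current version, by contrast, werr mixes self.__weights into the update for self.__weights itself, which I suspect is part of why the graph is so erratic.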
Answer: