MLP classifier written in theano gets stuck in a local minimum

I have written an MLP classifier using theano. The training function, which uses the backpropagation algorithm, looks like this:

self.weights = [theano.shared(numpy.random.random((network.architecture[i+1], network.architecture[i]))) for i in range(len(network.architecture)-1)]
self.bias = [theano.shared(numpy.random.random(network.architecture[i+1])) for i in range(len(network.architecture)-1)]
self.layers = network.layers
self.prev_rate = [theano.shared(numpy.zeros((network.architecture[i+1], network.architecture[i]))) for i in range(len(network.architecture)-1)] + [theano.shared(numpy.zeros(network.architecture[i+1])) for i in range(len(network.architecture)-1)]
prediction = T.dmatrix()
output = T.dmatrix()
reg_lambda = T.dscalar()
alpha = T.dscalar()
momentum = T.dscalar()
cost = T.nnet.categorical_crossentropy(prediction, output).mean()
for i, j in zip(self.weights, self.bias):
    cost += T.sum(i**2)*reg_lambda
    cost += T.sum(j**2)*reg_lambda
parameters = self.weights + self.bias
rates = [(alpha*T.grad(cost, parameter) + momentum*prev_rate) for parameter, prev_rate in zip(parameters, self.prev_rate)]
updates = [(weight, weight-rate) for weight, rate in zip(parameters, rates)] + [(prev_rate, rate) for prev_rate, rate in zip(self.prev_rate, rates)]
self.backprop = theano.function([prediction, output, reg_lambda, alpha, momentum], cost, updates=updates)

I tried training it on the XOR problem; the implementation is:

network = FeedForwardNetwork([2, 2, 2])
network.initialize()
network.train(numpy.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.], [0., 0.], [0., 1.], [1., 0.], [1., 1.]]),
              numpy.array([[0., 1.], [1., 0.], [1., 0.], [0., 1.], [0., 1.], [1., 0.], [1., 0.], [0., 1.]]),
              alpha=0.01, epochs=1000000000000000, momentum=0.9)
print network.predict(numpy.array([[1., 0.]]))
print network.predict(numpy.array([[0., 0.]]))

The initialize() method simply compiles all the functions in the backend, i.e. the backprop function, the forward pass for computing predictions, and a few other theano functions. Now, when I run this code, training gets stuck at a local minimum:

0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056
0.69314718056

At the start of training the loss is around 0.92. It steadily decreases to the value above and stops there. I tried changing the values of alpha and momentum. What am I doing wrong?
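For what it's worth, the plateau value is exactly ln 2, which is the categorical cross-entropy of a constant [0.5, 0.5] softmax output against a one-hot target. A quick check in plain Python confirms the number:

import math

print math.log(2)     # 0.69314718056, the value the loss is stuck at
print -math.log(0.5)  # the same thing: the cost of always predicting [0.5, 0.5]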

P.S. The full code is here: networks.py

import theano
import theano.tensor as T
import numpy

from layers import *
from backend import NetworkBackend


class Network:
    def __init__(self, architecture):
        self.architecture = architecture
        self.layers = []
        self.weights = []
        self.bias = []

    def __str__(self):
        banner = ''
        for i in range(len(self.weights)):
            banner += str(self.weights[i]) + '\n'
            banner += str(self.bias[i]) + '\n'
        return banner


class FeedForwardNetwork(Network):
    def initialize(self):
        self.layers.append(InputLayer(units=self.architecture[0]))
        for i in range(1, len(self.architecture[:-1])):
            self.layers.append(SigmoidLayer(units=self.architecture[i]))
        self.layers.append(SoftmaxLayer(units=self.architecture[-1]))
        self.backend = NetworkBackend(self)

    def predict(self, inputs):
        return self.backend.activate(inputs)

    def train(self, X, y, alpha=100, reg_lambda=0.0001, epochs=10000, momentum=0.9):
        cost = 1
        while cost > 0.01 and epochs:
            prediction = self.predict(X)
            cost = self.backend.backprop(prediction, y, reg_lambda, alpha, momentum)
            print cost
            epochs -= 1


if __name__ == '__main__':
    network = FeedForwardNetwork([2, 2, 2])
    network.initialize()
    network.train(numpy.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.], [0., 0.], [0., 1.], [1., 0.], [1., 1.]]),
                  numpy.array([[0., 1.], [1., 0.], [1., 0.], [0., 1.], [0., 1.], [1., 0.], [1., 0.], [0., 1.]]),
                  alpha=0.01, epochs=1000000000000000, momentum=0.9)
    print network.predict(numpy.array([[1., 0.]]))
    print network.predict(numpy.array([[0., 0.]]))

layers.py

import theano
import theano.tensor as T
import scipy

from backend import ComputationBackend


class Layer:
    def __init__(self, units):
        self.units = units
        self.backend = ComputationBackend()

    def __str__(self):
        banner = self.__class__.__name__
        banner += " Units:%d" % self.units
        return banner


class SigmoidLayer(Layer):
    def forwardPass(self, inputs):
        return self.backend.sigmoid(inputs)


class InputLayer(Layer):
    def forwardPass(self, inputs):
        return inputs


class SoftmaxLayer(Layer):
    def forwardPass(self, inputs):
        return self.backend.softmax(inputs)

backend.py

import theano
import theano.tensor as T
import numpy


class NetworkBackend:
    def __init__(self, network):
        # Initialize the shared variables
        self.weights = [theano.shared(numpy.random.random((network.architecture[i+1], network.architecture[i]))) for i in range(len(network.architecture)-1)]
        self.bias = [theano.shared(numpy.random.random(network.architecture[i+1])) for i in range(len(network.architecture)-1)]
        self.layers = network.layers
        self.prev_rate = [theano.shared(numpy.zeros((network.architecture[i+1], network.architecture[i]))) for i in range(len(network.architecture)-1)] + [theano.shared(numpy.zeros(network.architecture[i+1])) for i in range(len(network.architecture)-1)]

        # Activation (forward pass) through the layers
        inputs = T.dmatrix()
        temp = self.layers[0].forwardPass(inputs)
        for i in range(1, len(self.layers[:-1])):
            temp = self.layers[i].forwardPass(T.dot(temp, self.weights[i-1].transpose()) + self.bias[i-1])
        output = self.layers[-1].forwardPass(T.dot(temp, self.weights[-1].transpose()) + self.bias[-1])
        self.activate = theano.function([inputs], output)

        # Backpropagation: cost with L2 regularization, momentum updates
        prediction = T.dmatrix()
        output = T.dmatrix()
        reg_lambda = T.dscalar()
        alpha = T.dscalar()
        momentum = T.dscalar()
        cost = T.nnet.categorical_crossentropy(prediction, output).mean()
        for i, j in zip(self.weights, self.bias):
            cost += T.sum(i**2)*reg_lambda
            cost += T.sum(j**2)*reg_lambda
        parameters = self.weights + self.bias
        rates = [(alpha*T.grad(cost, parameter) + momentum*prev_rate) for parameter, prev_rate in zip(parameters, self.prev_rate)]
        updates = [(weight, weight-rate) for weight, rate in zip(parameters, rates)] + [(prev_rate, rate) for prev_rate, rate in zip(self.prev_rate, rates)]
        self.backprop = theano.function([prediction, output, reg_lambda, alpha, momentum], cost, updates=updates)


class ComputationBackend:
    def __init__(self):
        # sigmoid activation
        self.sigmoid = T.nnet.sigmoid
        # softmax activation
        self.softmax = T.nnet.softmax

Answer:

Finally figured it out! In the NetworkBackend, when computing the cost, I was computing the cross-entropy between the expected output and a prediction passed in as an argument to the theano function, rather than the prediction computed by the activate function. The theano graph therefore did not contain the forward pass at all. As a result, theano.tensor.grad could only find the gradient of the regularization term, not of the actual cost function! The correct implementation is:

inputs = T.dmatrix()
temp = self.layers[0].forwardPass(inputs)
for i in range(1, len(self.layers[:-1])):
    temp = self.layers[i].forwardPass(T.dot(temp, self.weights[i-1].transpose()) + self.bias[i-1])
output = self.layers[-1].forwardPass(T.dot(temp, self.weights[-1].transpose()) + self.bias[-1])
self.activate = theano.function([inputs], output)

label = T.dmatrix()
reg_lambda = T.dscalar()
alpha = T.dscalar()
momentum = T.dscalar()
cost = T.nnet.categorical_crossentropy(output, label).mean()
for i, j in zip(self.weights, self.bias):
    cost += T.sum(i**2)*reg_lambda
    cost += T.sum(j**2)*reg_lambda
parameters = self.weights + self.bias
rates = [(alpha*T.grad(cost, parameter) + momentum*prev_rate) for parameter, prev_rate in zip(parameters, self.prev_rate)]
updates = [(weight, weight-rate) for weight, rate in zip(parameters, rates)] + [(prev_rate, rate) for prev_rate, rate in zip(self.prev_rate, rates)]
self.backprop = theano.function([inputs, label, reg_lambda, alpha, momentum], cost, updates=updates)

So instead of declaring a new prediction matrix, I take the inputs in the training function and compute the prediction with the same equations used in the activation function. This completes the theano graph, and theano.tensor.grad() can now compute the gradient of the cost function together with the regularization.
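One consequence of this fix, not shown above: the compiled backprop now expects the raw inputs rather than a precomputed prediction, so the training loop in networks.py needs a matching change. A sketch of the adjusted train() under the new signature:

def train(self, X, y, alpha=100, reg_lambda=0.0001, epochs=10000, momentum=0.9):
    cost = 1
    while cost > 0.01 and epochs:
        # The forward pass now runs inside the compiled graph, so the
        # inputs are handed to backprop directly instead of a prediction.
        cost = self.backend.backprop(X, y, reg_lambda, alpha, momentum)
        print cost
        epochs -= 1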
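To see the pitfall in isolation, here is a minimal self-contained sketch (the names w, x, pred_in are illustrative, not from the code above) showing that a value fed in as a function input is a constant to theano.tensor.grad, while a symbolically built prediction is not:

import theano
import theano.tensor as T
import numpy

w = theano.shared(numpy.array(2.0))  # stands in for the network weights
x = T.dscalar()

# Broken: the "prediction" arrives as a plain input, so the graph between
# w and the data term of the cost is severed.
pred_in = T.dscalar()
cost_broken = (pred_in - 1.0)**2 + 0.1*w**2
grad_broken = theano.function([pred_in], T.grad(cost_broken, w),
                              on_unused_input='ignore')

# Fixed: the prediction is built symbolically from w inside the graph.
pred = w*x
cost_fixed = (pred - 1.0)**2 + 0.1*w**2
grad_fixed = theano.function([x], T.grad(cost_fixed, w))

print grad_broken(1.0)  # 0.4: only the regularization gradient 0.2*w survives
print grad_fixed(1.0)   # 2.4: data term 2*(w*x - 1)*x plus 0.2*w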
