I have recently been trying to implement the backpropagation algorithm in Python. I tried fmin_tnc and bfgs, but neither of them worked; please help me figure out what is going wrong.
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def costFunction(nnparams, X, y, input_layer_size=400, hidden_layer_size=25,
                 num_labels=10, lamda=1):
    # Unroll the parameter vector into the two weight matrices.
    Theta1 = np.reshape(nnparams[0:hidden_layer_size * (input_layer_size + 1)],
                        (hidden_layer_size, input_layer_size + 1))
    Theta2 = np.reshape(nnparams[hidden_layer_size * (input_layer_size + 1):],
                        (num_labels, hidden_layer_size + 1))
    m = X.shape[0]
    J = 0
    y = y.reshape(m, 1)

    # Forward pass over the whole training set (for the cost term).
    X = np.concatenate([np.ones([m, 1]), X], 1)            # add bias column
    a2 = sigmoid(Theta1.dot(X.T))
    a2 = np.concatenate([np.ones([1, a2.shape[1]]), a2])   # add bias row
    h = sigmoid(Theta2.dot(a2))                            # (num_labels, m)

    c = np.arange(1, num_labels + 1)
    y = (y == c)                                           # one-hot encode labels 1..num_labels
    for i in range(y.shape[0]):
        J = J + (-1 / m) * np.sum(y[i, :] * np.log(h[:, i])
                                  + (1 - y[i, :]) * np.log(1 - h[:, i]))

    # Backpropagation, one training example at a time.
    DEL2 = np.zeros(Theta2.shape)
    DEL1 = np.zeros(Theta1.shape)
    for i in range(m):
        z2 = Theta1.dot(X[i, :].T)
        a2 = sigmoid(z2).reshape(-1, 1)
        a2 = np.concatenate([np.ones([1, a2.shape[1]]), a2])     # (hidden+1, 1)
        z3 = Theta2.dot(a2)
        a3 = sigmoid(z3).reshape(-1, 1)                          # (num_labels, 1)
        delta3 = a3 - y[i, :].T.reshape(-1, 1)
        delta2 = Theta2.T.dot(delta3) * (a2 * (1 - a2))
        DEL2 = DEL2 + delta3.dot(a2.T)
        DEL1 = DEL1 + delta2[1:, :].dot(X[i, :].reshape(1, -1))  # drop the bias row of delta2

    # Regularized gradients (the bias column is not regularized).
    Theta1_grad = np.zeros(np.shape(Theta1))
    Theta2_grad = np.zeros(np.shape(Theta2))
    Theta1_grad[:, 0] = DEL1[:, 0] * (1 / m)
    Theta1_grad[:, 1:] = DEL1[:, 1:] * (1 / m) + (lamda / m) * Theta1[:, 1:]
    Theta2_grad[:, 0] = DEL2[:, 0] * (1 / m)
    Theta2_grad[:, 1:] = DEL2[:, 1:] * (1 / m) + (lamda / m) * Theta2[:, 1:]

    grad = np.concatenate([Theta1_grad.reshape(-1, 1), Theta2_grad.reshape(-1, 1)])
    return J, grad
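Before looking at the optimizer, the analytic gradient returned by costFunction can be spot-checked against finite differences on a small random batch. The sketch below is only illustrative: the random X/y, the 20 sampled indices, and eps are my own choices, and the layer sizes are the 400/25/10 defaults from the code above.

# Finite-difference spot check of the gradient returned by costFunction.
import numpy as np

rng = np.random.default_rng(0)
m = 5
X_check = rng.random((m, 400))
y_check = rng.integers(1, 11, size=m)            # labels 1..10, as in the question
theta = rng.uniform(-0.12, 0.12, 25 * 401 + 10 * 26)

J, grad = costFunction(theta, X_check, y_check)
grad = np.asarray(grad).ravel()

eps = 1e-4
for i in rng.choice(theta.size, size=20, replace=False):
    step = np.zeros_like(theta)
    step[i] = eps
    J_plus, _ = costFunction(theta + step, X_check, y_check)
    J_minus, _ = costFunction(theta - step, X_check, y_check)
    numeric = (J_plus - J_minus) / (2 * eps)
    print(i, numeric, grad[i])                   # the two values should agree to several decimal places

If the numerical and analytic values agree, the backpropagation code itself is fine and the problem lies elsewhere (initialization, label encoding, or the optimizer call).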
This is how I call the function (op is scipy.optimize):
r2=op.minimize(fun=costFunction, x0=nnparams, args=(X, dataY.flatten()), method='TNC', jac=True, options={'maxiter': 400})
The result in r2 is as follows:
     fun: 3.1045444063663266
     jac: array([[-6.73218494e-04],
                 [-8.93179045e-05],
                 [-1.13786179e-04],
                 ...,
                 [ 1.19577741e-03],
                 [ 5.79555099e-05],
                 [ 3.85717533e-03]])
 message: 'Linear search failed'
    nfev: 140
     nit: 5
  status: 4
 success: False
       x: array([-0.97996948, -0.44658952, -0.5689309 , ...,  0.03420931,
                 -0.58005183, -0.74322735])
Please help me find the right way to minimize this function. Thanks in advance.
Answer:
Finally solved it. The problem was that I was generating the random initial Theta values with np.random.randn(), which draws from a standard normal distribution; too many values fell in the same range, so the Theta values were effectively symmetric, and because of that symmetry the optimization terminated partway through. The simple fix was to use np.random.rand() (which gives a uniform random distribution) instead of np.random.randn().
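For reference, a minimal sketch of that kind of initialization, assuming the 400/25/10 architecture from the question; the helper name and the epsilon_init value are my own choices, not from the original code.

import numpy as np

def rand_initialize_weights(L_in, L_out, epsilon_init=0.12):
    # Uniform random weights in [-epsilon_init, epsilon_init] for a layer
    # with L_in inputs (plus a bias unit) and L_out outputs.
    return np.random.rand(L_out, L_in + 1) * 2 * epsilon_init - epsilon_init

Theta1_init = rand_initialize_weights(400, 25)
Theta2_init = rand_initialize_weights(25, 10)
nnparams = np.concatenate([Theta1_init.ravel(), Theta2_init.ravel()])

Keeping the initial weights small also keeps the sigmoid activations away from saturation, which makes the optimizer's line search better behaved.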