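This is my project. It consists of: m = 24, where m is the number of training examples; 3 hidden layers and an input layer; 3 sets of weights connecting each layer; input data of shape 1×38 and a response y of shape 1×1.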
import numpy as np

x = np.array([
    [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0]])

y = np.array([[1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0]]).T

w = np.random.random((38, 39))
w2 = np.random.random((39, 39))
w3 = np.random.random((39, 1))

for j in xrange(100000):
    a2 = 1/(1 + np.exp(-(np.dot(x, w) + 1)))
    a3 = 1/(1 + np.exp(-(np.dot(a2, w2) + 1)))
    a4 = 1/(1 + np.exp(-(np.dot(a3, w3) + 1)))
    a4delta = (y - a4) * (1 * (1 - a4))
    a3delta = a4delta.dot(w3.T) * (1 * (1 - a3))
    a2delta = a3delta.dot(w2.T) * (1 * (1 - a2))
    w3 += a3.T.dot(a4delta)
    w2 += a2.T.dot(a3delta)
    w += x.T.dot(a2delta)

print(a4)
The result is as follows:
[[ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.][ 1.]]
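Can anyone see what I am doing wrong? Does my network need to change? I have already tried tuning the hyperparameters by adding more hidden layers and more memory.
Answer: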
You have a few errors, plus a few things that I think are errors but that may just be a different way of implementing it.
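You are adding the gradient to the weights, when you should be subtracting the gradient multiplied by a step size. With no step size the updates are far too large, which is why the outputs shoot up to 1.0 after only one iteration.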
These:
w3 += a3.T.dot(a4delta)
should be changed to something like this:
w3 -= addBias(a3).T.dot(a4delta) * step
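To make the sign convention concrete, here is a minimal sketch of the update rule on a single logistic unit. It assumes a cross-entropy loss, for which the gradient with respect to the weights is X.T.dot(a - y); the names `X_demo`, `y_demo`, `w_demo` and `lr` are hypothetical and not part of the original code. Note that a delta written as (y - a) is the negative of (a - y), so the sign of the step has to be chosen to match whichever form is used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    # Mean cross-entropy of a single logistic unit (assumed loss for this sketch)
    a = sigmoid(X.dot(w))
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(24, 5)).astype(float)  # toy binary inputs
y_demo = X_demo[:, [0]]                                  # toy target: the first feature
w_demo = np.zeros((5, 1))
lr = 0.1                                                 # positive step size

loss_before = loss(w_demo, X_demo, y_demo)
for _ in range(200):
    a = sigmoid(X_demo.dot(w_demo))
    grad = X_demo.T.dot(a - y_demo) / len(X_demo)  # gradient of the loss
    w_demo -= lr * grad                            # subtract gradient times step
loss_after = loss(w_demo, X_demo, y_demo)
```

With a small positive `lr` the loss should shrink from one pass to the next; dropping the step size entirely is what lets the updates overshoot.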
Also, I think your formula for the partial derivative of the sigmoid function may be wrong. I think these:
a3delta = a4delta.dot(w3.T) * (1 * (1 - a3))
should be changed to:
a3delta = a4delta.dot(w3.T) * (a3 * (1 - a3))
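The a * (1 - a) form can be checked numerically. The sketch below compares it against a central finite difference of the sigmoid; the names `z`, `h`, `analytic`, `numeric` and `wrong` are introduced here purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4.0, 4.0, 9)
a = sigmoid(z)

analytic = a * (1 - a)   # derivative expressed through the activation
h = 1e-5
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central difference

max_err = np.max(np.abs(analytic - numeric))
wrong = 1 * (1 - a)      # the expression used in the question
```

The `analytic` and `numeric` arrays should agree closely, while `wrong` does not match the finite difference.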
You should also initialize the weights around zero, something like this:
ep = 0.12
w = np.random.random((39, 39)) * 2 * ep - ep
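As a rough illustration of what this initialization produces, the sketch below wraps it in a hypothetical helper, `init_weights` (not part of the original code), and uses a seeded `np.random.default_rng` so the result is reproducible. The values are drawn uniformly from [-ep, ep).

```python
import numpy as np

def init_weights(n_in, n_out, ep=0.12, rng=None):
    # Hypothetical helper: uniform values in [-ep, ep), centered on zero
    rng = np.random.default_rng() if rng is None else rng
    return rng.random((n_in, n_out)) * 2 * ep - ep

w_demo = init_weights(39, 39, rng=np.random.default_rng(0))
w_min, w_max, w_mean = w_demo.min(), w_demo.max(), w_demo.mean()
```

Small, zero-centered weights keep the initial pre-activations near zero, where the sigmoid is not saturated and its gradient is largest.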
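Most implementations add a bias node to every layer, which you are not doing. It makes things slightly more complicated, but I think it will make the network converge faster.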
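A bias node can be added by prepending a column of ones to each layer's input, which is why every weight matrix needs one extra row. The sketch below only traces the shapes, using zero-filled placeholder arrays (`x_demo`, `w_demo`) as stand-ins for the real data and weights.

```python
import numpy as np

def addBias(mat):
    # Prepend a column of ones so the first weight row acts as the bias
    return np.hstack((np.ones((mat.shape[0], 1)), mat))

x_demo = np.zeros((24, 38))       # placeholder for 24 samples x 38 features
xb = addBias(x_demo)              # shape (24, 39)

w_demo = np.zeros((39, 39))       # 38 inputs + 1 bias row, 39 hidden units
a2 = 1 / (1 + np.exp(-xb.dot(w_demo)))   # shape (24, 39)
a2b = addBias(a2)                 # shape (24, 40), so the next weights need 40 rows
```

During backpropagation the bias row carries no error back to the previous layer, so it is sliced off with something like `w2[1:, :]` before the delta is propagated.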
For me, this converged to a plausible result after 200 iterations:
import numpy as np

# x and y are the arrays defined in the question.
# The weights have different shapes to accommodate the bias node.
ep = 0.12
w = np.random.random((39, 39)) * 2 * ep - ep
w2 = np.random.random((40, 39)) * 2 * ep - ep
w3 = np.random.random((40, 1)) * 2 * ep - ep

def addBias(mat):
    return np.hstack((np.ones((mat.shape[0], 1)), mat))

# step is negative because the deltas below are built from (y - a4),
# so "-=" with a negative step moves the weights toward lower error.
step = -.1
for j in range(200):
    # Forward propagation
    a2 = 1/(1 + np.exp(- addBias(x).dot(w)))
    a3 = 1/(1 + np.exp(- addBias(a2).dot(w2)))
    a4 = 1/(1 + np.exp(- addBias(a3).dot(w3)))

    # Backpropagation
    a4delta = (y - a4)
    # The bias node has to be removed here
    a3delta = a4delta.dot(w3[1:,:].T) * (a3 * (1 - a3))
    a2delta = a3delta.dot(w2[1:,:].T) * (a2 * (1 - a2))

    # Gradient descent
    # Multiply the gradient by the step size first, then subtract
    w3 -= addBias(a3).T.dot(a4delta) * step
    w2 -= addBias(a2).T.dot(a3delta) * step
    w -= addBias(x).T.dot(a2delta) * step

print(np.rint(a4))
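One way to judge whether the output is plausible is to compare the rounded predictions with the targets. The sketch below uses a hypothetical `accuracy` helper and small made-up arrays (`a4_demo`, `y_demo`) rather than the real network output.

```python
import numpy as np

def accuracy(pred, target):
    # Fraction of samples whose rounded prediction equals the target
    return float(np.mean(np.rint(pred) == target))

a4_demo = np.array([[0.9], [0.2], [0.6], [0.4]])  # made-up sigmoid outputs
y_demo = np.array([[1], [0], [0], [0]])           # made-up targets
acc = accuracy(a4_demo, y_demo)                   # 3 of 4 match -> 0.75
```

Calling `accuracy(a4, y)` after the training loop would give the training accuracy; with only 24 samples and no held-out data, it says nothing about generalization.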