I can't figure out what's wrong with my neural network, despite having verified it against this example, which suggests my backpropagation and forward propagation are working correctly. However, after training on XOR, my network outputs roughly 0.5 no matter what the input is. In other words, the network seems to be doing its best to minimize the error without picking up on any correlation between input and output. Since a single pass of backpropagation appears to work, my intuition says the problem lies somewhere in the subsequent iterations. Yet nothing obvious would cause that, which has me thoroughly stumped.
I've looked through similar problems in other threads, but in most of those the mistake was either a highly unusual way of setting up the network or wildly unreasonable parameters such as the learning rate or the number of epochs. Is anyone familiar with this situation?
import java.util.Random;

public class Net {
    int[] sizes;
    double LEARNING_RATE;
    double[][][] weights;   // weights[layer][to][from]
    double[][] bias;        // bias[layer][neuron]
    Random rand = new Random(); // 53489085

    public Net(int[] sizes_, double LEARNING_RATE_) {
        LEARNING_RATE = LEARNING_RATE_;
        sizes = sizes_;
        int numInputs = sizes[0];
        double range = 1.0 / Math.sqrt(numInputs);

        // initialize weights and biases uniformly in [-range, range]
        bias = new double[sizes.length - 1][];
        weights = new double[sizes.length - 1][][];
        for (int w_layer = 0; w_layer < weights.length; w_layer++) {
            bias[w_layer] = new double[sizes[w_layer + 1]];
            weights[w_layer] = new double[sizes[w_layer + 1]][sizes[w_layer]];
            for (int j = 0; j < weights[w_layer].length; j++) {
                bias[w_layer][j] = 2 * range * rand.nextDouble() - range;
                for (int i = 0; i < weights[w_layer][0].length; i++) {
                    weights[w_layer][j][i] = 2 * range * rand.nextDouble() - range;
                }
            }
        }
    }

    public double[] evaluate(double[] image_vector) {
        return forwardPass(image_vector)[sizes.length - 1];
    }

    public double totalError(double[][] expec, double[][] actual) {
        double sum = 0;
        for (int i = 0; i < expec.length; i++) {
            sum += error(expec[i], evaluate(actual[i]));
        }
        return sum / expec.length;
    }

    private double error(double[] expec, double[] actual) {
        double sum = 0;
        for (int i = 0; i < expec.length; i++) {
            double del = expec[i] - actual[i];
            sum += 0.5 * del * del;
        }
        return sum;
    }

    public void backpropagate(double[][] image_vector, double[][] outputs) {
        // gradient accumulators, averaged over the batch at the end
        double[][][] deltaWeights = new double[weights.length][][];
        double[][] deltaBias = new double[weights.length][];
        for (int w = 0; w < weights.length; w++) {
            deltaBias[w] = new double[bias[w].length];
            deltaWeights[w] = new double[weights[w].length][];
            for (int j = 0; j < weights[w].length; j++) {
                deltaWeights[w][j] = new double[weights[w][j].length];
            }
        }

        for (int batch = 0; batch < image_vector.length; batch++) {
            double[][] neuronVals = forwardPass(image_vector[batch]);

            /* OUTPUT DELTAS */
            int w_layer = weights.length - 1;
            double[] deltas = new double[weights[w_layer].length];
            for (int j = 0; j < weights[w_layer].length; j++) {
                double actual = neuronVals[w_layer + 1][j];
                double expec = outputs[batch][j];
                double deltaErr = actual - expec;
                double deltaSig = actual * (1 - actual); // sigmoid derivative
                double delta = deltaErr * deltaSig;
                deltas[j] = delta;
                deltaBias[w_layer][j] += delta;
                for (int i = 0; i < weights[w_layer][0].length; i++) {
                    deltaWeights[w_layer][j][i] += delta * neuronVals[w_layer][i];
                }
            }
            w_layer--;

            /* REST OF THE DELTAS */
            while (w_layer >= 0) {
                double[] nextDeltas = new double[weights[w_layer].length];
                for (int j = 0; j < weights[w_layer].length; j++) {
                    double outNeur = neuronVals[w_layer + 1][j];
                    double deltaSig = outNeur * (1 - outNeur);
                    // back-propagate the deltas of the layer above
                    double sum = 0;
                    for (int i = 0; i < weights[w_layer + 1].length; i++) {
                        sum += weights[w_layer + 1][i][j] * deltas[i];
                    }
                    double delta = sum * deltaSig;
                    nextDeltas[j] = delta;
                    deltaBias[w_layer][j] += delta;
                    for (int i = 0; i < weights[w_layer][0].length; i++) {
                        deltaWeights[w_layer][j][i] += delta * neuronVals[w_layer][i];
                    }
                }
                deltas = nextDeltas;
                w_layer--;
            }
        }

        // apply the averaged gradients
        for (int w_layer = 0; w_layer < weights.length; w_layer++) {
            for (int j = 0; j < weights[w_layer].length; j++) {
                deltaBias[w_layer][j] /= (double) image_vector.length;
                bias[w_layer][j] -= LEARNING_RATE * deltaBias[w_layer][j];
                for (int i = 0; i < weights[w_layer][j].length; i++) {
                    deltaWeights[w_layer][j][i] /= (double) image_vector.length; // average of batches
                    weights[w_layer][j][i] -= LEARNING_RATE * deltaWeights[w_layer][j][i];
                }
            }
        }
    }

    // returns the activations of every layer: index 0 is the input vector,
    // index sizes.length - 1 is the network output
    public double[][] forwardPass(double[] image_vector) {
        double[][] outputs = new double[sizes.length][];
        double[] inputs = image_vector;
        for (int w = 0; w < weights.length; w++) {
            outputs[w] = inputs;
            double[] output = new double[weights[w].length];
            for (int j = 0; j < weights[w].length; j++) {
                output[j] = bias[w][j];
                for (int i = 0; i < weights[w][j].length; i++) {
                    output[j] += weights[w][j][i] * inputs[i];
                }
                output[j] = sigmoid(output[j]);
            }
            inputs = output;
        }
        outputs[outputs.length - 1] = inputs.clone();
        return outputs;
    }

    public static double sigmoid(double val) {
        return 1.0 / (1.0 + Math.exp(-val));
    }
}
My XOR class looks like this. Given how simple it is, the bug is unlikely to be in here, but I figure it can't hurt to post it in case I have some fundamental misunderstanding of how XOR works. My network is set up to process examples in batches, but as you can see below, for this particular example I send it a single batch, i.e. effectively no batching at all.
public class SingleLayer {
    static int numEpochs = 10000;
    static double LEARNING_RATE = 0.001;
    static int[] sizes = new int[] {2, 2, 1};

    public static void main(String[] args) {
        System.out.println("Initializing randomly generated neural net...");
        Net n = new Net(sizes, LEARNING_RATE);
        System.out.println("Complete!");

        System.out.println("Loading dataset...");
        double[][] inputs = new double[4][2];
        double[][] outputs = new double[4][1];
        inputs[0] = new double[] {1, 1}; outputs[0] = new double[] {0};
        inputs[1] = new double[] {1, 0}; outputs[1] = new double[] {1};
        inputs[2] = new double[] {0, 1}; outputs[2] = new double[] {1};
        inputs[3] = new double[] {0, 0}; outputs[3] = new double[] {0};
        System.out.println("Complete!");

        System.out.println("STARTING ERROR: " + n.totalError(outputs, inputs));
        for (int epoch = 0; epoch < numEpochs; epoch++) {
            // pick one random example for error reporting...
            double[][] in = new double[1][2];
            double[][] out = new double[1][1];
            int num = (int) (Math.random() * inputs.length);
            in[0] = inputs[num];
            out[0] = outputs[num];
            // ...but train on the full four-example set as one batch
            n.backpropagate(inputs, outputs);
            System.out.println("ERROR: " + n.totalError(out, in));
        }

        System.out.println("Prediction After Training: " + n.evaluate(inputs[0])[0] + " Expected: " + outputs[0][0]);
        System.out.println("Prediction After Training: " + n.evaluate(inputs[1])[0] + " Expected: " + outputs[1][0]);
        System.out.println("Prediction After Training: " + n.evaluate(inputs[2])[0] + " Expected: " + outputs[2][0]);
        System.out.println("Prediction After Training: " + n.evaluate(inputs[3])[0] + " Expected: " + outputs[3][0]);
    }
}
Can anyone offer some insight into what might be going wrong? My parameters are set quite reasonably, and I've followed all the usual advice for initializing the weights, choosing the learning rate, and so on. Thanks!
Answer:
I figured it out. I wasn't running enough epochs. That seems a little silly to me, but this visualization made me realize that the network lingers on answers around 0.5 for a long time before driving the error down below 0.00001.
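In case it helps anyone hitting the same wall, here is a minimal sketch of the change that fixed it for me: run far more epochs and log the error only every so often, so the long plateau near 0.5 is visible before the drop. It reuses the Net class above; the class name XorLongRun, the epoch count, and the print interval are illustrative guesses, not tuned values.

// Minimal sketch (hypothetical driver class): same XOR setup as above,
// but with far more epochs and sparse error logging so the plateau shows up.
public class XorLongRun {
    public static void main(String[] args) {
        int[] sizes = new int[] {2, 2, 1};
        Net n = new Net(sizes, 0.001);

        double[][] inputs  = { {1, 1}, {1, 0}, {0, 1}, {0, 0} };
        double[][] outputs = { {0},    {1},    {1},    {0}    };

        int numEpochs = 1_000_000;       // illustrative: far more than the original 10000
        for (int epoch = 0; epoch < numEpochs; epoch++) {
            n.backpropagate(inputs, outputs);   // full four-example batch, as above
            if (epoch % 10_000 == 0) {          // log sparsely; the plateau lasts a long time
                System.out.println("epoch " + epoch + "  error: " + n.totalError(outputs, inputs));
            }
        }

        for (int k = 0; k < inputs.length; k++) {
            System.out.println(inputs[k][0] + " XOR " + inputs[k][1]
                    + " -> " + n.evaluate(inputs[k])[0]
                    + " (expected " + outputs[k][0] + ")");
        }
    }
}

One way to read the log: with this error function, outputs stuck at 0.5 give an average error of 0.5 * 0.5^2 = 0.125, so a long flat stretch around 0.125 is the plateau, not a broken network.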