I'm trying to learn the Deeplearning4j library. I'm attempting to implement a simple 3-layer neural network using the Sigmoid activation function to solve the XOR problem. What configuration or hyperparameters am I missing? I got accurate output using the ReLU activation and Softmax output from some MLP examples I found online, but with Sigmoid activation the network doesn't seem to fit accurately. Can someone explain why my network isn't producing the correct output?
DenseLayer inputLayer = new DenseLayer.Builder()
        .nIn(2)
        .nOut(3)
        .name("Input")
        .weightInit(WeightInit.ZERO)
        .build();

DenseLayer hiddenLayer = new DenseLayer.Builder()
        .nIn(3)
        .nOut(3)
        .name("Hidden")
        .activation(Activation.SIGMOID)
        .weightInit(WeightInit.ZERO)
        .build();

OutputLayer outputLayer = new OutputLayer.Builder()
        .nIn(3)
        .nOut(1)
        .name("Output")
        .activation(Activation.SIGMOID)
        .weightInit(WeightInit.ZERO)
        .lossFunction(LossFunction.MEAN_SQUARED_LOGARITHMIC_ERROR)
        .build();

NeuralNetConfiguration.Builder nncBuilder = new NeuralNetConfiguration.Builder();
nncBuilder.iterations(10000);
nncBuilder.learningRate(0.01);
nncBuilder.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT);

NeuralNetConfiguration.ListBuilder listBuilder = nncBuilder.list();
listBuilder.layer(0, inputLayer);
listBuilder.layer(1, hiddenLayer);
listBuilder.layer(2, outputLayer);
listBuilder.backprop(true);

MultiLayerNetwork myNetwork = new MultiLayerNetwork(listBuilder.build());
myNetwork.init();

INDArray trainingInputs = Nd4j.zeros(4, inputLayer.getNIn());
INDArray trainingOutputs = Nd4j.zeros(4, outputLayer.getNOut());

// If 0,0 show 0
trainingInputs.putScalar(new int[]{0,0}, 0);
trainingInputs.putScalar(new int[]{0,1}, 0);
trainingOutputs.putScalar(new int[]{0,0}, 0);

// If 0,1 show 1
trainingInputs.putScalar(new int[]{1,0}, 0);
trainingInputs.putScalar(new int[]{1,1}, 1);
trainingOutputs.putScalar(new int[]{1,0}, 1);

// If 1,0 show 1
trainingInputs.putScalar(new int[]{2,0}, 1);
trainingInputs.putScalar(new int[]{2,1}, 0);
trainingOutputs.putScalar(new int[]{2,0}, 1);

// If 1,1 show 0
trainingInputs.putScalar(new int[]{3,0}, 1);
trainingInputs.putScalar(new int[]{3,1}, 1);
trainingOutputs.putScalar(new int[]{3,0}, 0);

DataSet myData = new DataSet(trainingInputs, trainingOutputs);
myNetwork.fit(myData);

INDArray actualInput = Nd4j.zeros(1, 2);
actualInput.putScalar(new int[]{0,0}, 0);
actualInput.putScalar(new int[]{0,1}, 0);

INDArray actualOutput = myNetwork.output(actualInput);
System.out.println("myNetwork Output " + actualOutput);
// Output is producing 1.00. Should be 0.0
Answer:
In general, I'll point you to: https://deeplearning4j.org/troubleshootingneuralnets
A few specific tips: never use zero weight initialization. There's a reason we don't use it in the examples (and I'd strongly encourage you to start from those examples rather than from scratch): https://github.com/deeplearning4j/dl4j-examples
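For instance, here is your hidden layer with a random initialization scheme instead of zeros (a minimal sketch against the same 0.x builder API your code already uses):

DenseLayer hiddenLayer = new DenseLayer.Builder()
        .nIn(3)
        .nOut(3)
        .name("Hidden")
        .activation(Activation.SIGMOID)
        // Random initialization (e.g. Xavier) instead of WeightInit.ZERO, so the
        // units don't all compute identical gradients and symmetry can be broken.
        .weightInit(WeightInit.XAVIER)
        .build();

The same change applies to the input and output layers.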
For the output layer, since you're trying to learn XOR, why not just use binary cross-entropy: https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/feedforward/xor/XorExample.java
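Concretely, the binary cross-entropy loss in DL4J is LossFunction.XENT, which pairs naturally with a sigmoid output unit. A sketch of your output layer with that change (same API assumptions as above):

OutputLayer outputLayer = new OutputLayer.Builder()
        .nIn(3)
        .nOut(1)
        .name("Output")
        .activation(Activation.SIGMOID)
        .weightInit(WeightInit.XAVIER)
        // Binary cross-entropy instead of MEAN_SQUARED_LOGARITHMIC_ERROR
        .lossFunction(LossFunction.XENT)
        .build();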
Beyond that, also turn off minibatch mode (see the example above); this is covered at: https://deeplearning4j.org/toyproblems
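In your configuration that is a one-line change on the builder (sketch, assuming the same API version as your snippet):

// The whole 4-row XOR truth table is trained as a single batch, so disable
// per-minibatch gradient normalization, as the XorExample linked above does.
nncBuilder.miniBatch(false);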