XOR neural network converges to 0.5

I can't find what's wrong with my neural network, despite having verified it against this example, which suggests my backpropagation and forward propagation work correctly. (Another way to confirm this independently is a numerical gradient check, sketched after the Net class below.) However, after training on XOR, my network outputs roughly 0.5 regardless of the input. In other words, the network seems to be doing its best to minimize the error without seeing any correlation between input and output. Since a single pass of backpropagation appears to work, my intuition says the problem lies somewhere in the subsequent iterations. Yet there is no obvious issue that would cause this, which leaves me very confused.

I've looked at similar problems in other threads, but most of the time the mistake was either something highly idiosyncratic in how the network was set up, or parameters like the learning rate or number of epochs set to something wildly unreasonable. Is anyone familiar with a case like this?

import java.util.Random;

public class Net
{
    int[] sizes;
    double LEARNING_RATE;
    double[][][] weights;
    double[][] bias;
    Random rand = new Random();  //53489085

    public Net(int[] sizes_, double LEARNING_RATE_)
    {
        LEARNING_RATE = LEARNING_RATE_;
        sizes = sizes_;
        int numInputs = sizes[0];
        double range = 1.0 / Math.sqrt(numInputs);
        bias = new double[sizes.length - 1][];
        weights = new double[sizes.length - 1][][];
        for(int w_layer = 0; w_layer < weights.length; w_layer++)
        {
            bias[w_layer] = new double[sizes[w_layer+1]];
            weights[w_layer] = new double[sizes[w_layer+1]][sizes[w_layer]];
            for(int j = 0; j < weights[w_layer].length; j++)
            {
                bias[w_layer][j] = 2*range*rand.nextDouble() - range;
                for(int i = 0; i < weights[w_layer][0].length; i++)
                {
                    weights[w_layer][j][i] = 2*range*rand.nextDouble() - range;
                }
            }
        }
    }

    public double[] evaluate(double[] image_vector)
    {
        return forwardPass(image_vector)[sizes.length-1];
    }

    public double totalError(double[][] expec, double[][] actual)
    {
        double sum = 0;
        for(int i = 0; i < expec.length; i++)
        {
            sum += error(expec[i], evaluate(actual[i]));
        }
        return sum / expec.length;
    }

    private double error(double[] expec, double[] actual)
    {
        double sum = 0;
        for(int i = 0; i < expec.length; i++)
        {
            double del = expec[i] - actual[i];
            sum += 0.5 * del * del;
        }
        return sum;
    }

    public void backpropagate(double[][] image_vector, double[][] outputs)
    {
        double[][][] deltaWeights = new double[weights.length][][];
        double[][] deltaBias = new double[weights.length][];
        for(int w = 0; w < weights.length; w++)
        {
            deltaBias[w] = new double[bias[w].length];
            deltaWeights[w] = new double[weights[w].length][];
            for(int j = 0; j < weights[w].length; j++)
            {
                deltaWeights[w][j] = new double[weights[w][j].length];
            }
        }
        for(int batch = 0; batch < image_vector.length; batch++)
        {
            double[][] neuronVals = forwardPass(image_vector[batch]);

            /* OUTPUT DELTAS */
            int w_layer = weights.length-1;
            double[] deltas = new double[weights[w_layer].length];
            for(int j = 0; j < weights[w_layer].length; j++)
            {
                double actual = neuronVals[w_layer + 1][j];
                double expec = outputs[batch][j];
                double deltaErr = actual - expec;
                double deltaSig = actual * (1 - actual);
                double delta = deltaErr * deltaSig;
                deltas[j] = delta;
                deltaBias[w_layer][j] += delta;
                for(int i = 0; i < weights[w_layer][0].length; i++)
                {
                    deltaWeights[w_layer][j][i] += delta * neuronVals[w_layer][i];
                }
            }
            w_layer--;

            /* REST OF THE DELTAS */
            while(w_layer >= 0)
            {
                double[] nextDeltas = new double[weights[w_layer].length];
                for(int j = 0; j < weights[w_layer].length; j++)
                {
                    double outNeur = neuronVals[w_layer+1][j];
                    double deltaSig = outNeur * (1 - outNeur);
                    double sum = 0;
                    for(int i = 0; i < weights[w_layer+1].length; i++)
                    {
                        sum += weights[w_layer+1][i][j] * deltas[i];
                    }
                    double delta = sum * deltaSig;
                    nextDeltas[j] = delta;
                    deltaBias[w_layer][j] += delta;
                    for(int i = 0; i < weights[w_layer][0].length; i++)
                    {
                        deltaWeights[w_layer][j][i] += delta * neuronVals[w_layer][i];
                    }
                }
                deltas = nextDeltas;
                w_layer--;
            }
        }
        for(int w_layer = 0; w_layer < weights.length; w_layer++)
        {
            for(int j = 0; j < weights[w_layer].length; j++)
            {
                deltaBias[w_layer][j] /= (double) image_vector.length;
                bias[w_layer][j] -= LEARNING_RATE * deltaBias[w_layer][j];
                for(int i = 0; i < weights[w_layer][j].length; i++)
                {
                    deltaWeights[w_layer][j][i] /= (double) image_vector.length; // average of batches
                    weights[w_layer][j][i] -= LEARNING_RATE * deltaWeights[w_layer][j][i];
                }
            }
        }
    }

    public double[][] forwardPass(double[] image_vector)
    {
        double[][] outputs = new double[sizes.length][];
        double[] inputs = image_vector;
        for(int w = 0; w < weights.length; w++)
        {
            outputs[w] = inputs;
            double[] output = new double[weights[w].length];
            for(int j = 0; j < weights[w].length; j++)
            {
                output[j] = bias[w][j];
                for(int i = 0; i < weights[w][j].length; i++)
                {
                    output[j] += weights[w][j][i] * inputs[i];
                }
                output[j] = sigmoid(output[j]);
            }
            inputs = output;
        }
        outputs[outputs.length-1] = inputs.clone();
        return outputs;
    }

    static public double sigmoid(double val)
    {
        return 1.0 / (1.0 + Math.exp(-val));
    }
}
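Here is the gradient check mentioned above, for anyone who wants to verify backprop beyond a single worked example. This is a minimal sketch, not part of the original code: GradientCheck is a hypothetical class name, it assumes it lives in the same package as Net (so the package-private weights field is visible), and it exploits the fact that the update rule is plain gradient descent, so the analytic gradient of one weight can be recovered from a single backpropagate() call as (w_before - w_after) / lr.

public class GradientCheck
{
    public static void main(String[] args)
    {
        double lr = 0.001;
        Net n = new Net(new int[] {2, 2, 1}, lr);
        double[][] inputs  = { {1, 1}, {1, 0}, {0, 1}, {0, 0} };
        double[][] outputs = { {0},    {1},    {1},    {0}    };

        // Weight to check: layer 0, neuron j = 0, input i = 0.
        int L = 0, j = 0, i = 0;
        double eps = 1e-5;
        double w0 = n.weights[L][j][i];

        // Numerical gradient of the batch loss by central differences.
        n.weights[L][j][i] = w0 + eps;
        double lossPlus = n.totalError(outputs, inputs);
        n.weights[L][j][i] = w0 - eps;
        double lossMinus = n.totalError(outputs, inputs);
        n.weights[L][j][i] = w0;
        double numGrad = (lossPlus - lossMinus) / (2 * eps);

        // Analytic gradient recovered from one gradient-descent step.
        // (backpropagate averages per-example gradients before the update,
        // so this matches the gradient of totalError over the same batch.)
        n.backpropagate(inputs, outputs);
        double anaGrad = (w0 - n.weights[L][j][i]) / lr;

        // The two should agree to several decimal places if backprop is right.
        System.out.println("numerical: " + numGrad + "   analytic: " + anaGrad);
    }
}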

My XOR class looks like this. Given how simple it is, the mistake is unlikely to be in this part, but I figured it couldn't hurt to post it in case I have some fundamental misunderstanding of how XOR works. My network is set up to process examples in batches, but as you can see below, for this particular example I send a single batch, or effectively don't use batching at all.
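For reference, the two calling conventions differ only in what backpropagate() averages over. A minimal sketch, assuming a trained-in-progress Net n and the inputs/outputs arrays from the class below:

// Full-batch update: one step on the gradient averaged over all four XOR cases.
n.backpropagate(inputs, outputs);

// "Online" update with a batch of one: a step on a single example's gradient.
int num = (int)(Math.random() * inputs.length);
double[][] in  = new double[][] { inputs[num] };
double[][] out = new double[][] { outputs[num] };
n.backpropagate(in, out);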

public class SingleLayer
{
    static int numEpochs = 10000;
    static double LEARNING_RATE = 0.001;
    static int[] sizes = new int[] {2, 2, 1};

    public static void main(String[] args)
    {
        System.out.println("Initializing randomly generated neural net...");
        Net n = new Net(sizes, LEARNING_RATE);
        System.out.println("Complete!");

        System.out.println("Loading dataset...");
        double[][] inputs = new double[4][2];
        double[][] outputs = new double[4][1];
        inputs[0] = new double[] {1, 1};
        outputs[0] = new double[] {0};
        inputs[1] = new double[] {1, 0};
        outputs[1] = new double[] {1};
        inputs[2] = new double[] {0, 1};
        outputs[2] = new double[] {1};
        inputs[3] = new double[] {0, 0};
        outputs[3] = new double[] {0};
        System.out.println("Complete!");

        System.out.println("STARTING ERROR: " + n.totalError(outputs, inputs));
        for(int epoch = 0; epoch < numEpochs; epoch++)
        {
            // in/out hold one randomly sampled example, but backpropagate()
            // below is called on the full four-example set; the sampled pair
            // is only used for the error printout.
            double[][] in = new double[1][2];
            double[][] out = new double[1][1];
            int num = (int)(Math.random()*inputs.length);
            in[0] = inputs[num];
            out[0] = outputs[num];
            n.backpropagate(inputs, outputs);
            System.out.println("ERROR: " + n.totalError(out, in));
        }
        System.out.println("Prediction After Training: " + n.evaluate(inputs[0])[0] + "  Expected: " + outputs[0][0]);
        System.out.println("Prediction After Training: " + n.evaluate(inputs[1])[0] + "  Expected: " + outputs[1][0]);
        System.out.println("Prediction After Training: " + n.evaluate(inputs[2])[0] + "  Expected: " + outputs[2][0]);
        System.out.println("Prediction After Training: " + n.evaluate(inputs[3])[0] + "  Expected: " + outputs[3][0]);
    }
}

Can anyone offer some insight into what might be going wrong? My parameters are set quite reasonably, and I've followed all the usual advice for initializing the weights, choosing the learning rate, and so on. Thanks!


Answer:

I figured it out. I wasn't running enough epochs. That seems a little silly to me, but this visualization made me realize that the network lingers at answers around 0.5 for a long time before driving the error down below 0.00001.
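For anyone hitting the same plateau, here is a sketch of the same setup with a much larger epoch count and periodic error logging, which makes the flat stretch near 0.5 and the eventual drop visible. XorLongRun is a hypothetical class name, and the epoch count and logging interval are illustrative, not tuned:

public class XorLongRun
{
    public static void main(String[] args)
    {
        Net n = new Net(new int[] {2, 2, 1}, 0.001);
        double[][] inputs  = { {1, 1}, {1, 0}, {0, 1}, {0, 0} };
        double[][] outputs = { {0},    {1},    {1},    {0}    };

        // Train far longer than 10000 epochs and log the full-dataset error
        // periodically; the error sits near its starting value for a long
        // stretch before XOR finally separates.
        for(int epoch = 0; epoch < 500000; epoch++)
        {
            n.backpropagate(inputs, outputs);
            if(epoch % 10000 == 0)
            {
                System.out.println("epoch " + epoch + "  error: " + n.totalError(outputs, inputs));
            }
        }
    }
}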
