Neural network seems to get stuck on a single output every time it runs

我创建了一个神经网络来估算输入xsin(x)函数。该网络有21个输出神经元(代表数字-1.0, -0.9, …, 0.9, 1.0),使用numpy实现,但它并未学习,我认为我在定义前馈机制时错误地实现了神经元架构。

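When I run the code, it correctly estimates about 48/1000 of the test data. That is roughly the average number of data points per category when the 1000 test points are split into 21 categories (1000/21 ≈ 47.6). Looking at the network's outputs, it seems to simply start picking a single output value for every input: for example, whatever x you give it, it may choose -0.5 as its estimate of y. Where am I going wrong? This is my first network. Thanks!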
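The 48/1000 figure is what a network that always returns the same value scores on average: each constant answer is "correct" exactly on the test points whose rounded sin(x) equals it, those counts add up to 1000 over the 21 possible answers, so their mean is 1000/21. A minimal sketch to check this, assuming a seeded re-implementation of the question's `build_sinx_data` (the `seed` parameter and `constant_predictor_scores` are hypothetical additions).

```python
import math
import random
from collections import Counter


def build_sinx_data(data_points, seed=0):
    # Same sampling as the question's build_sinx_data; the seed argument is a hypothetical addition
    rng = random.Random(seed)
    pairs = []
    for _ in range(data_points):
        x = rng.randint(-2000, 2000) / 10
        pairs.append((x, round(math.sin(x), 1)))
    return pairs


def constant_predictor_scores(test_data):
    # For each of the 21 possible answers -1.0 ... 1.0, count the test points that a
    # network always returning that answer would count as correct
    counts = Counter(int(round(y * 10)) for _, y in test_data)
    return {k / 10: counts.get(k, 0) for k in range(-10, 11)}


test_data = build_sinx_data(1000)
scores = constant_predictor_scores(test_data)
average = sum(scores.values()) / len(scores)
print(f"average score of a constant answer: {average:.1f}/{len(test_data)}")  # 47.6/1000
print("best constant answer:", max(scores, key=scores.get), "with", max(scores.values()))
```

The individual counts are not uniform: sin(x) spends more time near ±1, so a network stuck on 1.0 or -1.0 scores noticeably more than one stuck on 0.0, and the exact score of a stuck network depends on which value it gets stuck on.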

    import random
    import numpy as np
    import math

    class Network(object):
        def __init__(self, inputLayerSize, hiddenLayerSize, outputLayerSize):
            # Create weight vector arrays to represent each layer size and initialize indices randomly on a Gaussian distribution.
            self.layer1 = np.random.randn(hiddenLayerSize, inputLayerSize)
            self.layer1_activations = np.zeros((hiddenLayerSize, 1))
            self.layer2 = np.random.randn(outputLayerSize, hiddenLayerSize)
            self.layer2_activations = np.zeros((outputLayerSize, 1))
            self.outputLayerSize = outputLayerSize
            self.inputLayerSize = inputLayerSize
            self.hiddenLayerSize = hiddenLayerSize
            # print(self.layer1)
            # print()
            # print(self.layer2)
            # self.weights = [np.random.randn(y,x)
            #                 for x, y in zip(sizes[:-1], sizes[1:])]

        def feedforward(self, network_input):
            # Propagate forward through network as if doing this by hand.
            # first layer's output activations:
            for neuron in range(self.hiddenLayerSize):
                self.layer1_activations[neuron] = 1/(1+np.exp(network_input * self.layer1[neuron]))
            # second layer's output activations use layer1's activations as input:
            for neuron in range(self.outputLayerSize):
                for weight in range(self.hiddenLayerSize):
                    self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
                self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
            # convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
            outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)]  # range(-10, 11, 1)
            return(outputs[np.argmax(self.layer2_activations)])

        def train(self, training_pairs, epochs, minibatchsize, learn_rate):
            # apply gradient descent
            test_data = build_sinx_data(1000)
            for epoch in range(epochs):
                random.shuffle(training_pairs)
                minibatches = [training_pairs[k:k + minibatchsize] for k in range(0, len(training_pairs), minibatchsize)]
                for minibatch in minibatches:
                    loss = 0  # calculate loss for each minibatch
                    # Begin training
                    for x, y in minibatch:
                        network_output = self.feedforward(x)
                        loss += (network_output - y) ** 2
                        # adjust weights by abs(loss)*sigmoid(network_output)*(1-sigmoid(network_output))*learn_rate
                    loss /= (2*len(minibatch))
                    adjustWeights = loss*(1/(1+np.exp(-network_output)))*(1-(1/(1+np.exp(-network_output))))*learn_rate
                    self.layer1 += adjustWeights
                    # print(adjustWeights)
                    self.layer2 += adjustWeights
                    # when line 63 placed here, results did not improve during minibatch.
                print("Epoch {0}: {1}/{2} correct".format(epoch, self.evaluate(test_data), len(test_data)))
            print("Training Complete")

        def evaluate(self, test_data):
            """
            Returns number of test inputs which network evaluates correctly.
            The output is assumed to be the neuron in the output layer with the highest activation.
            :param test_data: test data set identical in form to train data set.
            :return: integer sum
            """
            correct = 0
            for x, y in test_data:
                output = self.feedforward(x)
                if output == y:
                    correct += 1
            return(correct)

    def build_sinx_data(data_points):
        """
        Creates a list of tuples (x value, expected y value) for Sin(x) function.
        :param data_points: number of desired data points
        :return: list of tuples (x value, expected y value)
        """
        x_vals = []
        y_vals = []
        for i in range(data_points):
            # parameter of randint signifies range of x values to be used*10
            x_vals.append(random.randint(-2000,2000)/10)
            y_vals.append(round(math.sin(x_vals[i]),1))
        return (list(zip(x_vals,y_vals)))

    # training_pairs, epochs, minibatchsize, learn_rate
    sinx_test = Network(1,21,21)
    print(sinx_test.feedforward(10))
    sinx_test.train(build_sinx_data(600),20,10,2)
    print(sinx_test.feedforward(10))

Answer:

I haven't gone through all of your code thoroughly, but a few problems stand out:

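  • In numpy, the * operator performs element-wise multiplication, not matrix multiplication; for a matrix product you have to use numpy.dot. This affects, for example, the lines network_input * self.layer1[neuron], self.layer1_activations[weight]*self.layer2[neuron][weight], and so on.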

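  • It looks like you are treating the problem as classification (picking one of 21 classes) but using an L2 loss, which mixes the two approaches. You have two options: either stick with classification and use a cross-entropy loss, or use the L2 loss for regression (i.e. predict the numeric value directly).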

  • You should factor the sigmoid out into a function instead of writing the same expression over and over:

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def sigmoid_derivative(x):
        return sigmoid(x) * (1 - sigmoid(x))
  • 你对self.layer1self.layer2执行了相同的更新,这显然是错误的。花些时间分析反向传播是如何工作的

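For the regression option from the second point (L2 loss, predicting the numeric value directly), the network needs only one output neuron, and the 21 classes disappear. A minimal sketch, assuming a 21-unit sigmoid hidden layer, a linear output neuron, biases, x drawn from [-π, π] instead of [-200, 200], minibatches of 50 and a learning rate of 0.1; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is reproducible


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


# Hypothetical regression variant: one linear output neuron, L2 (squared error) loss
n_hidden = 21
W1 = rng.normal(0.0, 1.0, (1, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
b2 = np.zeros(1)

# Assumption: x restricted to [-pi, pi]
x = rng.uniform(-np.pi, np.pi, size=(2000, 1))
y = np.sin(x)  # raw targets, no rounding into classes


def predict(X):
    H = sigmoid(np.dot(X, W1) + b1)
    return H, np.dot(H, W2) + b2


def half_mse(X, Y):
    # Mean squared error divided by 2, like the question's loss /= (2*len(minibatch))
    _, out = predict(X)
    return np.mean((out - Y) ** 2) / 2


initial_mse = half_mse(x, y)
lr = 0.1
for epoch in range(200):
    order = rng.permutation(len(x))
    for start in range(0, len(x), 50):
        idx = order[start:start + 50]
        Xb, Yb = x[idx], y[idx]
        H, out = predict(Xb)
        d_out = (out - Yb) / len(idx)  # gradient of the half-MSE w.r.t. the output
        dW2 = np.dot(H.T, d_out)
        db2 = d_out.sum(axis=0)
        dZ1 = np.dot(d_out, W2.T) * H * (1 - H)  # chain rule through the sigmoid
        dW1 = np.dot(Xb.T, dZ1)
        db1 = dZ1.sum(axis=0)
        # Separate updates for each layer's weights and biases
        W2 -= lr * dW2
        b2 -= lr * db2
        W1 -= lr * dW1
        b1 -= lr * db1

final_mse = half_mse(x, y)
print(f"half-MSE {initial_mse:.4f} -> {final_mse:.4f}")
```

With a regression output, the question's `evaluate`, which counts exact matches, no longer applies as written; comparing a rounded prediction with y, or accepting a tolerance, would be needed instead.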