I created a neural network to estimate sin(x) for an input x. The network has 21 output neurons (representing the values -1.0, -0.9, …, 0.9, 1.0) and is implemented with numpy, but it isn't learning; I think I implemented the neuron architecture incorrectly when defining the feedforward mechanism.

When I run the code, it correctly estimates roughly 48 out of 1000 test points. That is just about the average number of points per class when 1000 test points are split across 21 classes (1000 / 21 ≈ 48), and looking at the output, the network appears to simply settle on a single output value for every input. For example, no matter what x you give it, it might pick -0.5 as its estimate of y. Where am I going wrong here? This is my first network. Thanks!
```python
import random
import numpy as np
import math


class Network(object):
    def __init__(self, inputLayerSize, hiddenLayerSize, outputLayerSize):
        # Create weight vector arrays to represent each layer size and initialize indices randomly on a Gaussian distribution.
        self.layer1 = np.random.randn(hiddenLayerSize, inputLayerSize)
        self.layer1_activations = np.zeros((hiddenLayerSize, 1))
        self.layer2 = np.random.randn(outputLayerSize, hiddenLayerSize)
        self.layer2_activations = np.zeros((outputLayerSize, 1))
        self.outputLayerSize = outputLayerSize
        self.inputLayerSize = inputLayerSize
        self.hiddenLayerSize = hiddenLayerSize
        # print(self.layer1)
        # print()
        # print(self.layer2)
        # self.weights = [np.random.randn(y,x)
        #                 for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, network_input):
        # Propogate forward through network as if doing this by hand.
        # first layer's output activations:
        for neuron in range(self.hiddenLayerSize):
            self.layer1_activations[neuron] = 1/(1+np.exp(network_input * self.layer1[neuron]))
        # second layer's output activations use layer1's activations as input:
        for neuron in range(self.outputLayerSize):
            for weight in range(self.hiddenLayerSize):
                self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
            self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
        # convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
        outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)]  # range(-10, 11, 1)
        return(outputs[np.argmax(self.layer2_activations)])

    def train(self, training_pairs, epochs, minibatchsize, learn_rate):
        # apply gradient descent
        test_data = build_sinx_data(1000)
        for epoch in range(epochs):
            random.shuffle(training_pairs)
            minibatches = [training_pairs[k:k + minibatchsize]
                           for k in range(0, len(training_pairs), minibatchsize)]
            for minibatch in minibatches:
                loss = 0  # calculate loss for each minibatch
                # Begin training
                for x, y in minibatch:
                    network_output = self.feedforward(x)
                    loss += (network_output - y) ** 2
                # adjust weights by abs(loss)*sigmoid(network_output)*(1-sigmoid(network_output)*learn_rate
                loss /= (2*len(minibatch))
                adjustWeights = loss*(1/(1+np.exp(-network_output)))*(1-(1/(1+np.exp(-network_output))))*learn_rate
                self.layer1 += adjustWeights
                # print(adjustWeights)
                self.layer2 += adjustWeights
                # when line 63 placed here, results did not improve during minibatch.
            print("Epoch {0}: {1}/{2} correct".format(epoch, self.evaluate(test_data), len(test_data)))
        print("Training Complete")

    def evaluate(self, test_data):
        """
        Returns number of test inputs which network evaluates correctly. The ouput assumed to be neuron in output layer with highest activation

        :param test_data: test data set identical in form to train data set.
        :return: integer sum
        """
        correct = 0
        for x, y in test_data:
            output = self.feedforward(x)
            if output == y:
                correct += 1
        return(correct)


def build_sinx_data(data_points):
    """
    Creates a list of tuples (x value, expected y value) for Sin(x) function.
    :param data_points: number of desired data points
    :return: list of tuples (x value, expected y value)
    """
    x_vals = []
    y_vals = []
    for i in range(data_points):
        # parameter of randint signifies range of x values to be used*10
        x_vals.append(random.randint(-2000, 2000)/10)
        y_vals.append(round(math.sin(x_vals[i]), 1))
    return (list(zip(x_vals, y_vals)))


# training_pairs, epochs, minibatchsize, learn_rate
sinx_test = Network(1, 21, 21)
print(sinx_test.feedforward(10))
sinx_test.train(build_sinx_data(600), 20, 10, 2)
print(sinx_test.feedforward(10))
```
Answer:
I haven't gone through all of your code thoroughly, but some issues are immediately apparent:
- The `*` operator does not perform matrix multiplication in numpy; you have to use `numpy.dot`. This affects lines such as `network_input * self.layer1[neuron]`, `self.layer1_activations[weight]*self.layer2[neuron][weight]`, etc. (A vectorized sketch follows after this list.)
- It looks like you are treating the problem as classification (picking one of 21 classes) but training with an L2 loss, which mixes the two approaches. You have two options: either stick with classification and use a cross-entropy loss function, or do regression (i.e., predict the numeric value directly) with the L2 loss. (A cross-entropy sketch follows after this list.)
- You should extract the `sigmoid` function to avoid writing the same expression over and over:

```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))
```
- You apply the same update to `self.layer1` and `self.layer2`, which is clearly wrong: each layer needs its own gradient. Take some time to analyze exactly how backpropagation works. (A backprop sketch follows after this list.)
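On the first point, here is a minimal sketch of what the feedforward pass could look like once the loops are replaced with `np.dot`. The function and variable names are my own rather than the question's, and it uses the conventional sigmoid `1/(1+np.exp(-z))` (note the minus sign, which the question's inline expressions omit):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def feedforward(layer1, layer2, x):
    # layer1: (hidden, input) weights; layer2: (output, hidden) weights;
    # x: a scalar input, matching the question's Network(1, 21, 21) setup.
    a1 = sigmoid(np.dot(layer1, np.atleast_2d(x)))  # (hidden, 1)
    a2 = sigmoid(np.dot(layer2, a1))                # (output, 1)
    # map the winning output neuron back to a value in -1.0 ... 1.0
    n_out = layer2.shape[0]
    outputs = [k / 10 for k in range(-(n_out // 2), n_out // 2 + 1)]
    return outputs[np.argmax(a2)]
```

As a side effect, computing the activations as fresh arrays avoids accumulating into `self.layer2_activations` across calls, which the question's `+=` loop does.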
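On the second point, if you keep the classification setup, the loss could look something like the sketch below. The helper names (`y_to_class`, `softmax`, `cross_entropy_loss`) are hypothetical, assuming class k stands for the value (k - 10) / 10 as in the question:

```python
import numpy as np

def y_to_class(y):
    # e.g. y = -0.5 -> class index 5, y = 1.0 -> class index 20
    return int(round(y * 10)) + 10

def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

def cross_entropy_loss(output_activations, y):
    # negative log-probability of the true class
    probs = softmax(np.ravel(output_activations))
    return -np.log(probs[y_to_class(y)])
```

Unlike an L2 loss applied to the argmax'ed value, this gives every output neuron a usable gradient signal on every example.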
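And on the last point, here is a minimal sketch of distinct per-layer updates for a two-layer network with an L2 loss. The names (`backprop`, `delta1`, `delta2`) are my own, and `y` is assumed to be a target activation vector of shape (output, 1) rather than the scalar from the question:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))

def backprop(layer1, layer2, x, y, learn_rate):
    # forward pass, keeping the pre-activation sums for the backward pass
    z1 = np.dot(layer1, np.atleast_2d(x))  # (hidden, 1)
    a1 = sigmoid(z1)
    z2 = np.dot(layer2, a1)                # (output, 1)
    a2 = sigmoid(z2)

    # backward pass: the two layers get different gradients
    delta2 = (a2 - y) * sigmoid_derivative(z2)                  # output-layer error
    delta1 = np.dot(layer2.T, delta2) * sigmoid_derivative(z1)  # pushed back one layer

    layer2 -= learn_rate * np.dot(delta2, a1.T)                 # (output, hidden)
    layer1 -= learn_rate * np.dot(delta1, np.atleast_2d(x).T)   # (hidden, input)
```

The key property is that `delta1` and `delta2` differ, so the single shared `adjustWeights` scalar in the question's `train` cannot be correct.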