I created a neural network to estimate sin(x) for an input x. The network has 21 output neurons (representing the values -1.0, -0.9, …, 0.9, 1.0) and is implemented with numpy, but it isn't learning; I think I implemented the neuron architecture incorrectly when defining the feedforward mechanism.

When I run the code, it correctly estimates roughly 48 out of 1000 test points. That is just about the average number of points per class when 1000 test points are split across 21 classes (1000 / 21 ≈ 48), and looking at the output, the network appears to simply settle on a single output value for every input. For example, no matter what x you give it, it might pick -0.5 as its estimate of y. Where am I going wrong here? This is my first network. Thanks!
```python
import random
import numpy as np
import math


class Network(object):
    def __init__(self, inputLayerSize, hiddenLayerSize, outputLayerSize):
        # Create weight vector arrays to represent each layer size and initialize indices randomly on a Gaussian distribution.
        self.layer1 = np.random.randn(hiddenLayerSize, inputLayerSize)
        self.layer1_activations = np.zeros((hiddenLayerSize, 1))
        self.layer2 = np.random.randn(outputLayerSize, hiddenLayerSize)
        self.layer2_activations = np.zeros((outputLayerSize, 1))
        self.outputLayerSize = outputLayerSize
        self.inputLayerSize = inputLayerSize
        self.hiddenLayerSize = hiddenLayerSize
        # print(self.layer1)
        # print()
        # print(self.layer2)
        # self.weights = [np.random.randn(y,x)
        #                 for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, network_input):
        # Propogate forward through network as if doing this by hand.
        # first layer's output activations:
        for neuron in range(self.hiddenLayerSize):
            self.layer1_activations[neuron] = 1/(1+np.exp(network_input * self.layer1[neuron]))
        # second layer's output activations use layer1's activations as input:
        for neuron in range(self.outputLayerSize):
            for weight in range(self.hiddenLayerSize):
                self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
            self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
        # convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
        outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)]  # range(-10, 11, 1)
        return(outputs[np.argmax(self.layer2_activations)])

    def train(self, training_pairs, epochs, minibatchsize, learn_rate):
        # apply gradient descent
        test_data = build_sinx_data(1000)
        for epoch in range(epochs):
            random.shuffle(training_pairs)
            minibatches = [training_pairs[k:k + minibatchsize]
                           for k in range(0, len(training_pairs), minibatchsize)]
            for minibatch in minibatches:
                loss = 0  # calculate loss for each minibatch
                # Begin training
                for x, y in minibatch:
                    network_output = self.feedforward(x)
                    loss += (network_output - y) ** 2
                # adjust weights by abs(loss)*sigmoid(network_output)*(1-sigmoid(network_output)*learn_rate
                loss /= (2*len(minibatch))
                adjustWeights = loss*(1/(1+np.exp(-network_output)))*(1-(1/(1+np.exp(-network_output))))*learn_rate
                self.layer1 += adjustWeights
                # print(adjustWeights)
                self.layer2 += adjustWeights
                # when line 63 placed here, results did not improve during minibatch.
            print("Epoch {0}: {1}/{2} correct".format(epoch, self.evaluate(test_data), len(test_data)))
        print("Training Complete")

    def evaluate(self, test_data):
        """
        Returns number of test inputs which network evaluates correctly. The ouput assumed to be neuron in output layer with highest activation

        :param test_data: test data set identical in form to train data set.
        :return: integer sum
        """
        correct = 0
        for x, y in test_data:
            output = self.feedforward(x)
            if output == y:
                correct += 1
        return(correct)


def build_sinx_data(data_points):
    """
    Creates a list of tuples (x value, expected y value) for Sin(x) function.
    :param data_points: number of desired data points
    :return: list of tuples (x value, expected y value)
    """
    x_vals = []
    y_vals = []
    for i in range(data_points):
        # parameter of randint signifies range of x values to be used*10
        x_vals.append(random.randint(-2000, 2000)/10)
        y_vals.append(round(math.sin(x_vals[i]), 1))
    return (list(zip(x_vals, y_vals)))


# training_pairs, epochs, minibatchsize, learn_rate
sinx_test = Network(1, 21, 21)
print(sinx_test.feedforward(10))
sinx_test.train(build_sinx_data(600), 20, 10, 2)
print(sinx_test.feedforward(10))
```
Answer:
I haven't gone through all of your code thoroughly, but some issues are immediately apparent:
- The `*` operator does not perform matrix multiplication in numpy; you have to use `numpy.dot`. This affects lines such as `network_input * self.layer1[neuron]`, `self.layer1_activations[weight]*self.layer2[neuron][weight]`, etc. (A vectorized sketch follows after this list.)
- It looks like you are treating the problem as classification (picking one of 21 classes) but training with an L2 loss, which mixes the two approaches. You have two options: either stick with classification and use a cross-entropy loss function, or do regression (i.e., predict the numeric value directly) with the L2 loss. (A cross-entropy sketch follows after this list.)
- You should extract the `sigmoid` function to avoid writing the same expression over and over:

```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))
```
- You apply the same update to `self.layer1` and `self.layer2`, which is clearly wrong: each layer needs its own gradient. Take some time to analyze exactly how backpropagation works. (A backprop sketch follows after this list.)
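On the first point, here is a minimal sketch of what the feedforward pass could look like once the loops are replaced with `np.dot`. The function and variable names are my own rather than the question's, and it uses the conventional sigmoid `1/(1+np.exp(-z))` (note the minus sign, which the question's inline expressions omit):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def feedforward(layer1, layer2, x):
    # layer1: (hidden, input) weights; layer2: (output, hidden) weights;
    # x: a scalar input, matching the question's Network(1, 21, 21) setup.
    a1 = sigmoid(np.dot(layer1, np.atleast_2d(x)))  # (hidden, 1)
    a2 = sigmoid(np.dot(layer2, a1))                # (output, 1)
    # map the winning output neuron back to a value in -1.0 ... 1.0
    n_out = layer2.shape[0]
    outputs = [k / 10 for k in range(-(n_out // 2), n_out // 2 + 1)]
    return outputs[np.argmax(a2)]
```

As a side effect, computing the activations as fresh arrays avoids accumulating into `self.layer2_activations` across calls, which the question's `+=` loop does.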
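On the second point, if you keep the classification setup, the loss could look something like the sketch below. The helper names (`y_to_class`, `softmax`, `cross_entropy_loss`) are hypothetical, assuming class k stands for the value (k - 10) / 10 as in the question:

```python
import numpy as np

def y_to_class(y):
    # e.g. y = -0.5 -> class index 5, y = 1.0 -> class index 20
    return int(round(y * 10)) + 10

def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

def cross_entropy_loss(output_activations, y):
    # negative log-probability of the true class
    probs = softmax(np.ravel(output_activations))
    return -np.log(probs[y_to_class(y)])
```

Unlike an L2 loss applied to the argmax'ed value, this gives every output neuron a usable gradient signal on every example.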
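And on the last point, here is a minimal sketch of distinct per-layer updates for a two-layer network with an L2 loss. The names (`backprop`, `delta1`, `delta2`) are my own, and `y` is assumed to be a target activation vector of shape (output, 1) rather than the scalar from the question:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    return sigmoid(z) * (1 - sigmoid(z))

def backprop(layer1, layer2, x, y, learn_rate):
    # forward pass, keeping the pre-activation sums for the backward pass
    z1 = np.dot(layer1, np.atleast_2d(x))  # (hidden, 1)
    a1 = sigmoid(z1)
    z2 = np.dot(layer2, a1)                # (output, 1)
    a2 = sigmoid(z2)

    # backward pass: the two layers get different gradients
    delta2 = (a2 - y) * sigmoid_derivative(z2)                  # output-layer error
    delta1 = np.dot(layer2.T, delta2) * sigmoid_derivative(z1)  # pushed back one layer

    layer2 -= learn_rate * np.dot(delta2, a1.T)                 # (output, hidden)
    layer1 -= learn_rate * np.dot(delta1, np.atleast_2d(x).T)   # (hidden, input)
```

The key property is that `delta1` and `delta2` differ, so the single shared `adjustWeights` scalar in the question's `train` cannot be correct.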