Gradient descent ANN - what is MATLAB doing that I'm not?

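I'm trying to recreate a simple multilayer perceptron (MLP) artificial neural network in Python using gradient descent backpropagation. My goal is to reproduce the accuracy that MATLAB's ANN achieves, but I'm not even getting close. I'm using the same parameters as MATLAB: the same number of hidden nodes (20), 1000 epochs, a learning rate (alpha) of 0.01, and the same data (obviously). Yet my code makes no progress on improving its results, while MATLAB reaches an accuracy of around 98%.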

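I've tried debugging through MATLAB to see what it is doing, with little success. I believe MATLAB scales the input data between 0 and 1 and adds a bias to the input, and I do both in my Python code.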

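What is MATLAB doing that produces such high results? Or, more likely, what have I done wrong in my Python code that produces such poor ones? All I can think of is poor initialisation of the weights, reading the data in incorrectly, manipulating the data incorrectly, or an incorrect/poor activation function (I've also tried tanh, with the same result).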

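My attempt is below, based on code I found online and tweaked slightly to read in my data. The MATLAB script (just 11 lines of code) follows it. A link to the datasets, which I also obtained from MATLAB, is at the bottom.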

Thanks for any help.

Main.py

import numpy as np
import Process
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
import warnings


def sigmoid(x):
    return 1.0/(1.0 + np.exp(-x))


def sigmoid_prime(x):
    return sigmoid(x)*(1.0-sigmoid(x))


class NeuralNetwork:

    def __init__(self, layers):
        self.activation = sigmoid
        self.activation_prime = sigmoid_prime

        # Set weights
        self.weights = []
        # layers = [2,2,1]
        # range of weight values (-1,1)
        # input and hidden layers - random((2+1, 2+1)) : 3 x 3
        for i in range(1, len(layers) - 1):
            r = 2*np.random.random((layers[i-1] + 1, layers[i] + 1)) - 1
            self.weights.append(r)
        # output layer - random((2+1, 1)) : 3 x 1
        r = 2*np.random.random((layers[i] + 1, layers[i+1])) - 1
        self.weights.append(r)

    def fit(self, X, y, learning_rate, epochs):
        # Add column of ones to X
        # This is to add the bias unit to the input layer
        ones = np.atleast_2d(np.ones(X.shape[0]))
        X = np.concatenate((ones.T, X), axis=1)

        for k in range(epochs):
            i = np.random.randint(X.shape[0])
            a = [X[i]]

            for l in range(len(self.weights)):
                dot_value = np.dot(a[l], self.weights[l])
                activation = self.activation(dot_value)
                a.append(activation)
            # output layer
            error = y[i] - a[-1]
            deltas = [error * self.activation_prime(a[-1])]

            # we need to begin at the second to last layer
            # (a layer before the output layer)
            for l in range(len(a) - 2, 0, -1):
                deltas.append(deltas[-1].dot(self.weights[l].T)*self.activation_prime(a[l]))

            # reverse
            # [level3(output)->level2(hidden)]  => [level2(hidden)->level3(output)]
            deltas.reverse()

            # backpropagation
            # 1. Multiply its output delta and input activation
            #    to get the gradient of the weight.
            # 2. Subtract a ratio (percentage) of the gradient from the weight.
            for i in range(len(self.weights)):
                layer = np.atleast_2d(a[i])
                delta = np.atleast_2d(deltas[i])
                self.weights[i] += learning_rate * layer.T.dot(delta)

    def predict(self, x):
        a = np.concatenate((np.ones(1).T, np.array(x)))
        for l in range(0, len(self.weights)):
            a = self.activation(np.dot(a, self.weights[l]))
        return a


# Create neural net, 13 inputs, 20 hidden nodes, 3 outputs
nn = NeuralNetwork([13, 20, 3])
data = Process.readdata('wine')

# Split data out into input and output
X = data[0]
y = data[1]

# Normalise input data between 0 and 1.
X -= X.min()
X /= X.max()

# Split data into training and test sets (15% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)

# Create binary output form
y_ = LabelBinarizer().fit_transform(y_train)

# Train data
lrate = 0.01
epoch = 1000
nn.fit(X_train, y_, lrate, epoch)

# Test data
err = []
for e in X_test:
    # Create array of output data (argmax to get classification)
    err.append(np.argmax(nn.predict(e)))

# Hide warnings. UndefinedMetricWarning thrown when confusion matrix returns 0 in any one of the classifiers.
warnings.filterwarnings('ignore')

# Produce confusion matrix and classification report
print(confusion_matrix(y_test, err))
print(classification_report(y_test, err))

# Plot actual and predicted data
plt.figure(figsize=(10, 8))
target, = plt.plot(y_test, color='b', linestyle='-', lw=1, label='Target')
estimated, = plt.plot(err, color='r', linestyle='--', lw=3, label='Estimated')
plt.legend(handles=[target, estimated])
plt.xlabel('# Samples')
plt.ylabel('Classification Value')
plt.grid()
plt.show()

Process.py

import csv
import numpy as np


# Add constant column of 1's
def addones(arrayvar):
    return np.hstack((np.ones((arrayvar.shape[0], 1)), arrayvar))


def readdata(loc):
    # Open file and calculate the number of columns and the number of rows. The number of rows has a +1 as the 'next'
    # operator in num_cols has already passed over the first row.
    with open(loc + '.input.csv') as f:
        file = csv.reader(f, delimiter=',', skipinitialspace=True)
        num_cols = len(next(file))
        num_rows = len(list(file))+1

    # Create a zero'd array based on the number of column and rows previously found.
    x = np.zeros((num_rows, num_cols))
    y = np.zeros(num_rows)

    # INPUT #
    # Loop through the input file and put each row into a new row of 'samples'
    with open(loc + '.input.csv', newline='') as csvfile:
        file = csv.reader(csvfile, delimiter=',')
        count = 0
        for row in file:
            x[count] = row
            count += 1

    # OUTPUT #
    # Do the same and loop through the output file.
    with open(loc + '.output.csv', newline='') as csvfile:
        file = csv.reader(csvfile, delimiter=',')
        count = 0
        for row in file:
            y[count] = row[0]
            count += 1

    # Set data type
    x = np.array(x).astype(float)
    y = np.array(y).astype(int)

    return x, y

MATLAB script

%% LOAD DATA
[x1,t1] = wine_dataset;

%% SET UP NN
net = patternnet(20);
net.trainFcn = 'traingd';
net.layers{2}.transferFcn = 'logsig';
net.derivFcn = 'logsig';

%% TRAIN AND TEST
[net,tr] = train(net,x1,t1);

The data files can be downloaded here: input, output

Answer:

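I think I've found the problem. It is a combination of the dataset itself (not every dataset triggers it) and the way I was scaling the data. My original scaling method, which mapped the values into the range 0 to 1, did not help and instead led to the poor results seen: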

# Normalise input data between 0 and 1.
X -= X.min()
X /= X.max()
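One likely reason this hurts: X.min() and X.max() called without an axis argument return a single minimum and maximum for the whole array, not one per feature. The sketch below uses made-up numbers (not the wine data) with one small-range column and one large-range column. After the global scaling, the small-range column is squeezed into a tiny interval, so the network can hardly tell its values apart.

```python
import numpy as np

# Hypothetical toy data: column 0 has a small range, column 1 a large one.
X = np.array([[0.5, 1000.0],
              [1.5,  500.0],
              [1.0, 1500.0],
              [2.0,  750.0]])

# Global min-max scaling, equivalent to `X -= X.min(); X /= X.max()`:
# one minimum and one maximum are shared by every column.
X_global = (X - X.min()) / (X.max() - X.min())

# Spread of each column after scaling (max minus min, per column).
col_range_global = X_global.max(axis=0) - X_global.min(axis=0)

print("scaled data:\n", X_global)
print("per-column spread:", col_range_global)
```

The whole array does end up between 0 and 1, but almost all of that interval is used by the large-range column.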

I found another scaling method, provided by the sklearn preprocessing package:

from sklearn import preprocessing
X = preprocessing.scale(X)

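This scaling does not map the data into the range 0 to 1, and I need to investigate further why it helps so much, but the accuracy is now between 96% and 100%. That is very close to MATLAB's results, so I suspect MATLAB uses a similar (or the same) preprocessing scaling.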
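A minimal sketch of two per-feature alternatives, on made-up numbers rather than the wine data. standardize reproduces what preprocessing.scale computes with its default arguments (zero mean and unit variance per column). minmax_per_column is a hypothetical helper that maps each column to [-1, 1]. As far as I can tell this resembles MATLAB's mapminmax, which I believe patternnet applies to its inputs by default, but I have not verified that this is what accounts for MATLAB's accuracy.

```python
import numpy as np

# Hypothetical toy data: column 0 has a small range, column 1 a large one.
X = np.array([[0.5, 1000.0],
              [1.5,  500.0],
              [1.0, 1500.0],
              [2.0,  750.0]])


def standardize(X):
    """Zero mean and unit variance per column (population std, ddof=0)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def minmax_per_column(X, lo=-1.0, hi=1.0):
    """Map each column's own minimum to `lo` and its maximum to `hi`."""
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return lo + (hi - lo) * (X - col_min) / (col_max - col_min)


X_std = standardize(X)
X_mm = minmax_per_column(X)

print("standardized column means:", X_std.mean(axis=0))
print("standardized column stds: ", X_std.std(axis=0))
print("min-max column mins:", X_mm.min(axis=0))
print("min-max column maxs:", X_mm.max(axis=0))
```

Either way, every feature ends up with a comparable spread, which is the property the global 0-to-1 scaling lacked. Both helpers assume no column is constant; a constant column would cause a division by zero.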

As I said above, this is not the case for every dataset. sklearn's built-in iris and digits datasets seem to give good results without any scaling.
