Gradient descent artificial neural network – what is MATLAB doing that I'm not?

I am trying to recreate a simple multilayer perceptron artificial neural network in Python, trained with gradient-descent backpropagation. My goal is to reproduce the accuracy that MATLAB's ANN achieves, but my results are not even close. I use the same parameters as MATLAB: the same number of hidden nodes (20), 1000 epochs, a learning rate (alpha) of 0.01, and (obviously) the same data, yet my code makes no progress in improving the results, while MATLAB reaches an accuracy of around 98%.

I have tried stepping through MATLAB to understand what it is doing, with little success. I believe MATLAB scales the input data to between 0 and 1 and adds a bias to the input, both of which I also do in my Python code.
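For reference, a minimal NumPy sketch of that kind of preprocessing (scaling each feature column to the 0-1 range and prepending a bias column); the helper name is mine, and whether this is exactly what MATLAB does internally is an assumption:

import numpy as np

def scale_and_add_bias(X):
    # Hypothetical helper: scale each feature column into [0, 1] ...
    X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    # ... then prepend a column of ones as the bias unit.
    return np.hstack((np.ones((X_scaled.shape[0], 1)), X_scaled))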

What is MATLAB doing that produces such high accuracy? Or, more likely, what have I done in my Python code that makes the results so poor? All I can think of is a poor initialisation of the weights, reading the data incorrectly, mishandling the data, or an incorrect/poor choice of activation function (I have also tried tanh, with the same result).

My attempt is below, based on code I found online and slightly modified to read my data, with the MATLAB script (just 11 lines of code) below that. At the bottom is a link to the datasets I use (which I also obtained through MATLAB):

Any help is appreciated.

Main.py

import numpy as np
import Process
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.cross_validation import train_test_split
from sklearn.preprocessing import LabelBinarizer
import warnings


def sigmoid(x):
    return 1.0/(1.0 + np.exp(-x))


def sigmoid_prime(x):
    return sigmoid(x)*(1.0-sigmoid(x))


class NeuralNetwork:
    def __init__(self, layers):
        self.activation = sigmoid
        self.activation_prime = sigmoid_prime

        # Set weights
        self.weights = []
        # layers = [2,2,1]
        # range of weight values (-1,1)
        # input and hidden layers - random((2+1, 2+1)) : 3 x 3
        for i in range(1, len(layers) - 1):
            r = 2*np.random.random((layers[i-1] + 1, layers[i] + 1)) - 1
            self.weights.append(r)
        # output layer - random((2+1, 1)) : 3 x 1
        r = 2*np.random.random((layers[i] + 1, layers[i+1])) - 1
        self.weights.append(r)

    def fit(self, X, y, learning_rate, epochs):
        # Add column of ones to X
        # This is to add the bias unit to the input layer
        ones = np.atleast_2d(np.ones(X.shape[0]))
        X = np.concatenate((ones.T, X), axis=1)

        for k in range(epochs):
            i = np.random.randint(X.shape[0])
            a = [X[i]]

            for l in range(len(self.weights)):
                dot_value = np.dot(a[l], self.weights[l])
                activation = self.activation(dot_value)
                a.append(activation)

            # output layer
            error = y[i] - a[-1]
            deltas = [error * self.activation_prime(a[-1])]

            # we need to begin at the second to last layer
            # (a layer before the output layer)
            for l in range(len(a) - 2, 0, -1):
                deltas.append(deltas[-1].dot(self.weights[l].T)*self.activation_prime(a[l]))

            # reverse
            # [level3(output)->level2(hidden)]  => [level2(hidden)->level3(output)]
            deltas.reverse()

            # backpropagation
            # 1. Multiply its output delta and input activation
            #    to get the gradient of the weight.
            # 2. Subtract a ratio (percentage) of the gradient from the weight.
            for i in range(len(self.weights)):
                layer = np.atleast_2d(a[i])
                delta = np.atleast_2d(deltas[i])
                self.weights[i] += learning_rate * layer.T.dot(delta)

    def predict(self, x):
        a = np.concatenate((np.ones(1).T, np.array(x)))
        for l in range(0, len(self.weights)):
            a = self.activation(np.dot(a, self.weights[l]))
        return a


# Create neural net, 13 inputs, 20 hidden nodes, 3 outputs
nn = NeuralNetwork([13, 20, 3])
data = Process.readdata('wine')

# Split data out into input and output
X = data[0]
y = data[1]

# Normalise input data between 0 and 1.
X -= X.min()
X /= X.max()

# Split data into training and test sets (15% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)

# Create binary output form
y_ = LabelBinarizer().fit_transform(y_train)

# Train data
lrate = 0.01
epoch = 1000
nn.fit(X_train, y_, lrate, epoch)

# Test data
err = []
for e in X_test:
    # Create array of output data (argmax to get classification)
    err.append(np.argmax(nn.predict(e)))

# Hide warnings. UndefinedMetricWarning thrown when confusion matrix returns 0 in any one of the classifiers.
warnings.filterwarnings('ignore')

# Produce confusion matrix and classification report
print(confusion_matrix(y_test, err))
print(classification_report(y_test, err))

# Plot actual and predicted data
plt.figure(figsize=(10, 8))
target, = plt.plot(y_test, color='b', linestyle='-', lw=1, label='Target')
estimated, = plt.plot(err, color='r', linestyle='--', lw=3, label='Estimated')
plt.legend(handles=[target, estimated])
plt.xlabel('# Samples')
plt.ylabel('Classification Value')
plt.grid()
plt.show()

Process.py

import csv
import numpy as np


# Add constant column of 1's
def addones(arrayvar):
    return np.hstack((np.ones((arrayvar.shape[0], 1)), arrayvar))


def readdata(loc):
    # Open file and calculate the number of columns and the number of rows. The number of rows has a +1 as the 'next'
    # operator in num_cols has already passed over the first row.
    with open(loc + '.input.csv') as f:
        file = csv.reader(f, delimiter=',', skipinitialspace=True)
        num_cols = len(next(file))
        num_rows = len(list(file)) + 1

    # Create a zero'd array based on the number of columns and rows previously found.
    x = np.zeros((num_rows, num_cols))
    y = np.zeros(num_rows)

    # INPUT #
    # Loop through the input file and put each row into a new row of 'samples'
    with open(loc + '.input.csv', newline='') as csvfile:
        file = csv.reader(csvfile, delimiter=',')
        count = 0
        for row in file:
            x[count] = row
            count += 1

    # OUTPUT #
    # Do the same and loop through the output file.
    with open(loc + '.output.csv', newline='') as csvfile:
        file = csv.reader(csvfile, delimiter=',')
        count = 0
        for row in file:
            y[count] = row[0]
            count += 1

    # Set data type
    x = np.array(x).astype(np.float)
    y = np.array(y).astype(np.int)

    return x, y

MATLAB script

%% LOAD DATA
[x1,t1] = wine_dataset;

%% SET UP NN
net = patternnet(20);
net.trainFcn = 'traingd';
net.layers{2}.transferFcn = 'logsig';
net.derivFcn = 'logsig';

%% TRAIN AND TEST
[net,tr] = train(net,x1,t1);

The data files can be downloaded here: input, output


Answer:

I believe I have found the problem. It is a combination of this particular dataset (the issue does not occur with every dataset) and the way I was scaling the data. My original scaling method, which kept the values between 0 and 1, did not help and instead caused the poor results seen:

# Normalise input data between 0 and 1.
X -= X.min()
X /= X.max()

I found another scaling method, provided by the sklearn preprocessing package:

from sklearn import preprocessing
X = preprocessing.scale(X)

This scaling method does not keep the data between 0 and 1, and I still need to investigate why it helps so much, but the results now come back with an accuracy between 96% and 100%. That is very close to MATLAB's results, so I assume MATLAB uses a similar (or the same) preprocessing scaling method.
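For reference, preprocessing.scale standardises each feature column to zero mean and unit variance rather than squashing it into the 0-1 range; a small sketch (with made-up toy data) showing the equivalent NumPy computation:

import numpy as np
from sklearn import preprocessing

# Toy data, for illustration only: any (n_samples, n_features) array works.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# What preprocessing.scale(X) returns: each column standardised to
# zero mean and unit variance.
X_scaled = preprocessing.scale(X)

# Equivalent manual computation with NumPy.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))  # True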

As I said above, this is not the case for every dataset. Using the built-in sklearn iris or digits datasets seems to produce good results even without scaling.
