I am not getting the expected output on a binary classification problem.
The task is to use binary classification to label breast cancer as: - benign, or - malignant.
It does not produce the expected output.
First there is a function that loads the dataset; it returns train and test data with the following shapes:
x_train has shape (30, 381), y_train has shape (1, 381), x_test has shape (30, 188), y_test has shape (1, 188).
Then there is a class for a logistic regression classifier that predicts the output.
```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np


def load_dataset():
    cancer_data = load_breast_cancer()
    x_train, x_test, y_train, y_test = train_test_split(
        cancer_data.data, cancer_data.target, test_size=0.33)
    x_train = x_train.T
    x_test = x_test.T
    y_train = y_train.reshape(1, len(y_train))
    y_test = y_test.reshape(1, len(y_test))
    m = x_train.shape[1]
    return x_train, x_test, y_train, y_test, m


class Neural_Network():
    def __init__(self):
        np.random.seed(1)
        self.weights = np.random.rand(30, 1) * 0.01
        self.bias = np.zeros(shape=(1, 1))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def train(self, x_train, y_train, iterations, m, learning_rate=0.5):
        for i in range(iterations):
            z = np.dot(self.weights.T, x_train) + self.bias
            a = self.sigmoid(z)
            cost = (-1 / m) * np.sum(y_train * np.log(a) + (1 - y_train) * np.log(1 - a))
            if i % 500 == 0:
                print("Cost after iteration %i: %f" % (i, cost))
            dw = (1 / m) * np.dot(x_train, (a - y_train).T)
            db = (1 / m) * np.sum(a - y_train)
            self.weights = self.weights - learning_rate * dw
            self.bias = self.bias - learning_rate * db

    def predict(self, inputs):
        m = inputs.shape[1]
        y_predicted = np.zeros((1, m))
        z = np.dot(self.weights.T, inputs) + self.bias
        a = self.sigmoid(z)
        for i in range(a.shape[1]):
            y_predicted[0, i] = 1 if a[0, i] > 0.5 else 0
        return y_predicted


if __name__ == "__main__":
    '''
    step-1 : load the dataset
    x_train has shape (30, 381)
    y_train has shape (1, 381)
    x_test has shape (30, 188)
    y_test has shape (1, 188)
    '''
    x_train, x_test, y_train, y_test, m = load_dataset()
    neuralNet = Neural_Network()

    '''
    step-2 : train the network
    '''
    neuralNet.train(x_train, y_train, 10000, m)

    y_predicted = neuralNet.predict(x_test)
    print("Accuracy on test data: ")
    print(accuracy_score(y_test, y_predicted) * 100)
```
The program produces the following output:
```
C:\Python36\python.exe C:/Users/LENOVO/PycharmProjects/MarkDmo001/Numpy.py
Cost after iteration 0: 5.263853
C:/Users/LENOVO/PycharmProjects/MarkDmo001/logisticReg.py:25: RuntimeWarning: overflow encountered in exp
  return 1 / (1 + np.exp(-x))
C:/Users/LENOVO/PycharmProjects/MarkDmo001/logisticReg.py:33: RuntimeWarning: divide by zero encountered in log
  cost = (-1 / m) * np.sum(y_train * np.log(a) + (1 - y_train) * np.log(1 - a))
C:/Users/LENOVO/PycharmProjects/MarkDmo001/logisticReg.py:33: RuntimeWarning: invalid value encountered in multiply
  cost = (-1 / m) * np.sum(y_train * np.log(a) + (1 - y_train) * np.log(1 - a))
Cost after iteration 500: nan
Cost after iteration 1000: nan
Cost after iteration 1500: nan
Cost after iteration 2000: nan
Cost after iteration 2500: nan
Cost after iteration 3000: nan
Cost after iteration 3500: nan
Cost after iteration 4000: nan
Cost after iteration 4500: nan
Cost after iteration 5000: nan
Cost after iteration 5500: nan
Cost after iteration 6000: nan
Cost after iteration 6500: nan
Cost after iteration 7000: nan
Cost after iteration 7500: nan
Cost after iteration 8000: nan
Cost after iteration 8500: nan
Cost after iteration 9000: nan
Cost after iteration 9500: nan
Accuracy: 0.0
```
Answer:
The problem is exploding gradients. You need to normalize your inputs to the [0, 1] range.
If you look at features 3 and 23 in your training data, you will see values larger than 3000. Even after being multiplied by the initial weights, these values still land in the [0, 30] range. As a result, in the first iteration the `z` vector contains only positive numbers, with values up to about 50. Consequently, the `a` vector (the output of your sigmoid function) looks like this:

[0.9994797 0.99853904 0.99358676 0.99999973 0.98392862 0.99983016 0.99818802 ...]

So in the first step, your model always predicts 1 with high confidence. Since that is not always correct, the high probabilities your model outputs lead to large gradients, which you can see by looking at the largest values of `dw`. In my case,

dw[3] was 388
dw[23] was 571

and the other values were in [0, 55]. So you can clearly see how the large inputs in those features cause the gradient to explode. Because gradient descent now takes an oversized step in the opposite direction, the weights in the next iteration are not in [0, 0.01] but in [-285, 0.002], which only makes things worse. In the next iteration, `z` contains values of around minus one million, which causes the overflow in the sigmoid function.
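The overflow can be reproduced in isolation: `np.exp(-z)` blows up for large negative `z`. A minimal sketch of a numerically stable sigmoid (a piecewise formulation, not part of the original code) shows one way to avoid the warning even when the real fix is normalizing the inputs:

```python
import numpy as np

def sigmoid_naive(x):
    # For x around -1e6, np.exp(-x) overflows and raises a RuntimeWarning.
    return 1 / (1 + np.exp(-x))

def sigmoid_stable(x):
    # Piecewise form: for x >= 0 use 1 / (1 + exp(-x));
    # for x < 0 use exp(x) / (1 + exp(x)), so exp never sees a large argument.
    out = np.empty_like(x, dtype=float)
    pos = x >= 0
    out[pos] = 1 / (1 + np.exp(-x[pos]))
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1 + ex)
    return out

z = np.array([-1e6, 0.0, 50.0])
print(sigmoid_stable(z))  # no overflow warning; values saturate at 0 and 1
```

`scipy.special.expit` provides the same stable computation if scipy is available. Note that this only silences the symptom: with unnormalized inputs the cost would still collapse to `nan` because `log(0)` appears once the sigmoid saturates.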
Solutions
- Normalize the inputs to the [0, 1] range.
- Use weights in the [-0.01, 0.01] range, so that they roughly cancel each other out. Otherwise, the values in `z` still scale linearly with the number of features you have.
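To make the second point concrete, here is a small sketch (the variable names are illustrative, not from the original code) comparing the all-positive initialization used in the question with a symmetric one:

```python
import numpy as np

np.random.seed(1)
n_features = 30

# Original init: np.random.rand samples from [0, 1), so every weight is
# positive and z = w.T @ x grows linearly with the number of features.
w_positive = np.random.rand(n_features, 1) * 0.01

# Symmetric init in [-0.01, 0.01]: positive and negative contributions
# roughly cancel, keeping z near zero at the start of training.
w_symmetric = (np.random.rand(n_features, 1) * 2 - 1) * 0.01

x = np.ones((n_features, 1))  # toy input with every feature equal to 1
print(float(w_positive.T @ x))   # roughly n_features * 0.005, all positive
print(float(w_symmetric.T @ x))  # much closer to 0
```

With normalized inputs in [0, 1], the symmetric initialization keeps the initial `z` small regardless of how many features the dataset has.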
For normalizing the inputs, you can use sklearn's `MinMaxScaler`:

```python
from sklearn.preprocessing import MinMaxScaler

x_train, x_test, y_train, y_test, m = load_dataset()

scaler = MinMaxScaler()
x_train_normalized = scaler.fit_transform(x_train.T).T

neuralNet = Neural_Network()
'''
step-2 : train the network
'''
neuralNet.train(x_train_normalized, y_train, 10000, m)

# Apply the same transformation to the test inputs as to the training inputs
x_test_normalized = scaler.transform(x_test.T).T
y_predicted = neuralNet.predict(x_test_normalized)
```
The `.T`s are there because sklearn expects training input of shape (num_samples, num_features), while your `x_train` and `x_test` have shape (num_features, num_samples).
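Under the hood, `MinMaxScaler` with its default feature range just computes `(x - min) / (max - min)` per feature. A small sketch on toy data (the array here is made up for illustration) verifies this and shows the transpose dance on the question's (num_features, num_samples) layout:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data in the question's layout: 2 features, 3 samples
x_train = np.array([[1.0, 2.0, 3.0],
                    [10.0, 20.0, 30.0]])

scaler = MinMaxScaler()
# sklearn expects (num_samples, num_features), hence the transposes
x_norm = scaler.fit_transform(x_train.T).T

# Equivalent manual computation, per feature (axis=1 in this layout)
mins = x_train.min(axis=1, keepdims=True)
maxs = x_train.max(axis=1, keepdims=True)
manual = (x_train - mins) / (maxs - mins)

print(np.allclose(x_norm, manual))  # True
print(x_norm.min(), x_norm.max())   # 0.0 1.0
```

Fitting the scaler on the training data and only calling `transform` on the test data matters: the test features must be scaled with the training set's min and max, otherwise the model sees inputs from a different distribution than it was trained on.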