I'm trying to train a very simple CNN with Keras/Theano on a binary classification problem. The loss always converges to about 8.0151, and changing the parameters and the architecture doesn't help. So I built a minimal example: two new input arrays, one all ones and the other all zeros. Same result. I tried all ones vs. all minus ones: same. Then all zeros vs. random numbers: still the same. I reduced the input dimensions and the network depth, removed dropout, and tweaked the parameters, and the result never changes. Please help! What is going on here?
import numpy

# build 100 all-ones and 100 all-zeros 100x100 "images" (channel-first: 1x100x100)
A = []
B = []
for j in range(100):
    npa = numpy.array([[1 for j in range(100)] for i in range(100)])
    A.append(npa.reshape(1, npa.shape[0], npa.shape[1]))
for j in range(100):
    npa = numpy.array([[0 for j in range(100)] for i in range(100)])
    B.append(npa.reshape(1, npa.shape[0], npa.shape[1]))

# split into train/test sets
trainXA = []
trainXB = []
testXA = []
testXB = []
for j in range(len(A)):
    if ((j + 2) % 7) != 0:
        trainXA.append(A[j])
        trainXB.append(B[j])
    else:
        testXA.append(A[j])
        testXB.append(B[j])

X_train = numpy.array(trainXA + trainXB)
X_test = numpy.array(testXA + testXB)
# one-hot labels: first half [1,0], second half [0,1]
Y_train = numpy.array([[1, 0] for i in range(len(X_train) // 2)] + [[0, 1] for i in range(len(X_train) // 2)])

import random

# shuffle X and Y together, keeping them aligned
def jumblelists(C, D):
    outC = []
    outD = []
    for j in range(len(C)):
        newpos = int(random.random() * (len(outC) + 1))
        outC = outC[:newpos] + [C[j]] + outC[newpos:]
        outD = outD[:newpos] + [D[j]] + outD[newpos:]
    return numpy.array(outC), numpy.array(outD)

X_train, Y_train = jumblelists(X_train, Y_train)

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import SGD

# simple Keras 1.x CNN (Theano channel-first ordering)
model = Sequential()
model.add(Convolution2D(32, 3, 3, border_mode='valid', input_shape=(1, 100, 100)))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dense(2))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd)
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10)
Answer:
Your learning rate is set too high, which is most likely making the weights and gradients explode. Just change
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
to
sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
You can also try a different optimizer. Adam with its default settings is usually a good choice.
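For example, a minimal sketch against the Keras 1.x API already used in your code (the values shown in the comment are just the library defaults, not tuned settings):

from keras.optimizers import Adam

# Adam's default learning rate (0.001) is already far lower than the 0.1 used above
adam = Adam()  # defaults: lr=0.001, beta_1=0.9, beta_2=0.999
model.compile(loss='binary_crossentropy', optimizer=adam)
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10)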