How to correctly implement the DQN algorithm

I am trying to implement the deep Q-learning algorithm that DeepMind introduced in this paper:

https://arxiv.org/pdf/1312.5602.pdf

I am using it to train an agent to play Pong, but it does not seem to work (I see no improvement even after 2 hours of training). Here is my code:

import gym
import universe
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Activation
from keras.models import load_model
import random

env = gym.make('gym-core.Pong-v0')
env.configure(remotes=1)

def num2str(number, obs):
    number = np.argmax(number)
    if number == 0:
        action = [[('KeyEvent', 'ArrowRight', False), ('KeyEvent', 'ArrowLeft', True)] for ob in obs]
    elif number == 1:
        action = [[('KeyEvent', 'ArrowLeft', False), ('KeyEvent', 'ArrowRight', True)] for ob in obs]
    return action

def preprocess(original_obs):
    obs = original_obs
    obs = np.array(obs)[0]['vision']
    obs = np.delete(obs, np.s_[195:769], axis=0)
    obs = np.delete(obs, np.s_[0:35], axis=0)
    obs = np.delete(obs, np.s_[160:1025], axis=1)
    obs = np.mean(obs, axis=2)
    obs = obs[::2,::2]
    obs = np.reshape(obs, (80, 80, 1))
    return obs

model = Sequential()
model.add(Conv2D(32, kernel_size = (8, 8), strides = (4, 4), border_mode='same', activation='relu', init='uniform', input_shape = (80, 80, 4)))
model.add(MaxPooling2D(pool_size = (2, 2)))
model.add(Conv2D(64, kernel_size = (2, 2), strides = (2, 2)))
model.add(Conv2D(64, kernel_size = (3, 3), strides = (1, 1)))
model.add(Flatten())
model.add(Dense(256, init='uniform', activation='relu'))
model.add(Dense(2, init='uniform', activation='linear'))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])

init_observe_time = 500
D = []
e = 1.0
e_threshold = 0.05
e_decay = 0.01
gamma = 0.99
batch_size = 15
frequency = 10
Q_values = np.array([0, 0])

obs = env.reset()
while True:
    obs = env.step(num2str(np.array([random.randint(0, 1) for i in range(0, 2)]), obs))[0]
    if obs != [None]:
        break

x_t1 = preprocess(obs)
s_t1 = np.stack((x_t1, x_t1, x_t1, x_t1), axis = 2)
s_t1 = np.reshape(s_t1, (80, 80, 4))
t = 0

while True:
    print("Time since last start: ", t)
    a_t = np.zeros(2)
    if random.random() < e:
        a_index = random.randint(0, 1)
        a_t[a_index] = 1
    else:
        Q_values = model.predict(np.array([s_t1]))[0]
        a_index = np.argmax(Q_values)
        a_t[a_index] = 1
    print("Q Values: ", Q_values)
    print("action taken: ", np.argmax(a_t))
    print("epsilon: ", e)
    if e > e_threshold:
        e -= e_decay
    obs, r_t, done, info = env.step(num2str(a_t, obs))
    if obs == [None]:
        continue
    x_t2 = preprocess(obs)
    print(x_t2.shape, s_t1[:,:,0:3].shape)
    s_t2 = np.append(x_t2, s_t1[:,:,0:3], axis = 2)
    D.append((s_t1, a_t, r_t, s_t2, done))
    if t > init_observe_time and t%frequency == 0:
        minibatch = random.sample(D, batch_size)
        s1_batch = [i[0] for i in minibatch]
        a_batch = [i[1] for i in minibatch]
        r_batch = [i[2] for i in minibatch]
        s2_batch = [i[3] for i in minibatch]
        q_batch = model.predict(np.array(s2_batch))
        y_batch = np.zeros((batch_size, 2))
        y_batch = model.predict(np.array(s1_batch))
        print("Q batch: ",  q_batch)
        print("y batch: ",  y_batch)
        for i in range(0, batch_size):
            if (minibatch[i][4]):
                y_batch[i][np.argmax(a_batch[i])] = r_batch[i][0]
            else:
                y_batch[i][np.argmax(a_batch[i])] = r_batch[i][0] + gamma * np.max(q_batch[i])
        model.train_on_batch(np.array(s1_batch), y_batch)
    s_t1 = s_t2
    t += 1
    env.render()

Can anyone offer some advice on how to get it working?


Answer:

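  • Your second and third Conv2D layers appear to be missing a relu activation.
  • Your epsilon (e) decays far too quickly: it falls to 0.05 after only 95 time steps. I could not quickly find how they handled this in the 2013 paper, but in the 2015 paper they annealed epsilon from 1 to 0.1 over 1 million frames.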
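
To illustrate why the missing activation in the first point matters, here is a minimal sketch in plain Python. It uses two scalar "layers" with made-up weights (W1, B1, W2, B2 and the function names are hypothetical, not taken from the question's model) to show that two stacked layers with no activation between them behave exactly like one linear layer, while a relu in between does not.

```python
def relu(x):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)

# Hypothetical scalar weights and biases, chosen only for illustration.
W1, B1 = 2.0, -1.0
W2, B2 = -3.0, 0.5

def two_layers_no_activation(x):
    """Two stacked linear layers with nothing in between."""
    hidden = W1 * x + B1
    return W2 * hidden + B2

def single_equivalent_layer(x):
    """One linear layer that computes the same function as the stack above."""
    return (W2 * W1) * x + (W2 * B1 + B2)

def two_layers_with_relu(x):
    """Same stack, but with a relu between the layers."""
    hidden = relu(W1 * x + B1)
    return W2 * hidden + B2

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x,
          two_layers_no_activation(x),
          single_equivalent_layer(x),
          two_layers_with_relu(x))
```

The same reasoning applies to convolutions, which are also linear maps. In Keras the fix is to pass activation='relu' to the second and third Conv2D layers, as the first layer in the question already does.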

These are the issues I noticed right away. I suggest starting by fixing them.
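
For the epsilon point, a linear annealing schedule along the lines of the 2015 setup could be sketched as follows. The function name linear_epsilon and its parameters are assumptions made for this example, and it counts agent steps rather than emulator frames, so adjust anneal_steps to match how your loop counts time.

```python
def linear_epsilon(step, start=1.0, end=0.1, anneal_steps=1_000_000):
    """Exploration rate after `step` steps, annealed linearly from start to end.

    Hypothetical helper: the defaults mirror the "1 to 0.1 over 1 million"
    figures quoted above, but the step/frame unit is an assumption.
    """
    if step >= anneal_steps:
        # Annealing is finished; hold epsilon at its final value.
        return end
    fraction = step / anneal_steps
    return start + fraction * (end - start)

# Print epsilon at a few points, including step 95, where the schedule
# in the question has already reached its floor.
for step in (0, 95, 500_000, 1_000_000, 2_000_000):
    print(step, linear_epsilon(step))
```

In the training loop, this would replace the `e -= e_decay` update with something like `e = linear_epsilon(t)`, so the agent keeps exploring for far longer than 95 steps.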
