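I have implemented minimax with alpha-beta pruning for an Othello (Reversi) game. However, the algorithm always seems to return the default action I set, (0, 0), together with a secondary value of -32768, which means it was pruned in the MAX subroutine. Are there any suggestions for improving this algorithm, and how can I fix this problem?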
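Note: I have confirmed that the successor states are returned correctly in most cases. The current maximum depth is 8. The computer player's player number (pn) is 1 and the human player's is 0. The first stage, 0, is MINIMAX_MAX. Alpha and beta are initially set to INT_MIN and INT_MAX, respectively.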
    mm_out minimax(Grid& G, int alpha, int beta, Action& A, uint pn, uint depth, bool stage) {
        if (G.check_terminal_state() || depth == MAX_DEPTH) {
    #ifdef DEBUG
            cout << "best action: (" << A.get_x() << ", " << A.get_y() << ")\n";
    #endif
            return mm_out(A, G.get_utility(pn));
        }
        // add the end-game score sum here
    #ifdef DEBUG
        if (stage == MINIMAX_MAX) {
            cout << "max " << alpha << " " << beta << "\n";
        } else {
            cout << "min " << alpha << " " << beta << "\n";
        }
    #endif
        set<Action> succ_temp = G.get_successors(pn);
        for (Action a : succ_temp) {
    #ifdef DEBUG
            cout << a.get_x() << " " << a.get_y() << '\n';
    #endif
            Grid gt(G);
            a.evaluate(gt);
        }
        set<Action, action_greater> successors(succ_temp.begin(), succ_temp.end());
    #ifdef DEBUG
        Player p(0, "minimaxtest");
        G.display(p);
        int test;
        cin >> test;
    #endif
        // if there are no successors, this player passes
        if (successors.size()) {
            for (auto a = successors.begin(); a != successors.end(); ++a) {
                Grid gt(G);
                gt.do_move(pn, a->get_x(), a->get_y(), !PRINT_ERR);
                Action at = *a;
                mm_out mt = minimax(gt, alpha, beta, at, pn ^ 1, depth + 1, !stage);
                int temp = mt.val;
                // A = mt.best_move;
                if (stage == MINIMAX_MAX) {
                    if (alpha < temp) {
                        alpha = temp;
                        A = *a;
    #ifdef DEBUG
                        cout << "Current action: (" << A.get_x() << ", " << A.get_y() << ") alpha = " << alpha << "\n";
    #endif
                    }
                    if (alpha >= beta) {
    #ifdef DEBUG
                        cout << "pruned at max\n";
    #endif
                        return mm_out(A, beta);
                    }
                } else {
                    if (beta > temp) {
                        beta = temp;
                        A = *a;
    #ifdef DEBUG
                        cout << "Current action: (" << A.get_x() << ", " << A.get_y() << ") beta = " << beta << "\n";
    #endif
                    }
                    if (alpha >= beta) {
    #ifdef DEBUG
                        cout << "pruned at min\n";
    #endif
                        return mm_out(A, alpha);
                    }
                }
            }
            return mm_out(A, (stage == MINIMAX_MAX) ? alpha : beta);
        }
        else {
            cout << "no successor\n";
            return mm_out(A, (stage == MINIMAX_MAX) ? (std::numeric_limits<int>::max() - 1) : (std::numeric_limits<int>::min() + 1));
        }
    }
Utility function:
    int Grid::get_utility(uint pnum) const {
        if (pnum)
            return wcount - bcount;
        return bcount - wcount;
    }
Answer:
You should pass the alpha/beta parameters by value (not by reference):
    mm_out minimax(Grid& G, int alpha, int beta, Action& A, uint pn, uint depth, bool stage)
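Each node passes its alpha and beta values to its children. Each child then updates its own copy of alpha or beta, depending on whose turn it is, and returns that node's final evaluation. The parent then uses that returned value to update its own alpha or beta.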
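To make that flow of values concrete, here is a minimal sketch on a toy tree. It is not your Othello code: `Node`, `leaf`, `branch`, `sample_tree` and the `leaves_visited` counter are hypothetical names made up for illustration, and the leaf values are arbitrary. The point is only that `alpha` and `beta` are plain `int` parameters, so whatever a child does to its copies never leaks back into the parent's window. If they were `int&`, a MIN child lowering `beta` would also lower the parent's `beta` and could trigger cutoffs that are not justified.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical toy tree, standing in for the Grid/Action types in the question.
struct Node {
    int value;                   // static evaluation, used only at leaves
    std::vector<Node> children;  // empty for a leaf
};

Node leaf(int v) { return Node{v, {}}; }
Node branch(std::vector<Node> kids) { return Node{0, std::move(kids)}; }

// alpha and beta are taken BY VALUE: each call narrows its own copies, and
// the only thing that travels back up to the parent is the return value.
// leaves_visited is a reference on purpose: it is a counter shared by the
// whole search, used here only to observe how much was pruned.
int alphabeta(const Node& n, int alpha, int beta, bool maximizing,
              int& leaves_visited) {
    assert(alpha < beta);  // a closed window should have been cut off by the caller
    if (n.children.empty()) {
        ++leaves_visited;
        return n.value;
    }
    int best = maximizing ? std::numeric_limits<int>::min()
                          : std::numeric_limits<int>::max();
    for (const Node& child : n.children) {
        int v = alphabeta(child, alpha, beta, !maximizing, leaves_visited);
        if (maximizing) {
            best = std::max(best, v);
            alpha = std::max(alpha, best);  // parent updates its own alpha
        } else {
            best = std::min(best, v);
            beta = std::min(beta, best);    // parent updates its own beta
        }
        if (alpha >= beta) break;  // remaining siblings cannot change the result
    }
    return best;
}

// A MAX root over three MIN nodes; the plain minimax value of this tree is 3.
Node sample_tree() {
    return branch({
        branch({leaf(3), leaf(12), leaf(8)}),
        branch({leaf(2), leaf(4), leaf(6)}),
        branch({leaf(14), leaf(5), leaf(2)}),
    });
}
```

Calling `alphabeta(sample_tree(), std::numeric_limits<int>::min(), std::numeric_limits<int>::max(), true, visited)` with `visited` starting at 0 returns 3 and leaves `visited` at 7 of the 9 leaves: once the second MIN node sees the leaf 2, its local `beta` drops below the root's `alpha` of 3 and its other two leaves are skipped, while the root's own window is untouched when it moves on to the third MIN node.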