关于MiniMax算法的一个非常有趣的问题。这是什么原因导致这种行为的？

我已经实现了MiniMax算法（带有alpha-beta剪枝），然而它表现出一种有趣的方式。我的玩家会建立巨大的领先优势，但在需要做出最终的获胜动作时，它却不采取那个动作，而是继续拖延游戏。

这是我的minimax函数：

// Game states are represented by Node objects (holds the move and the board in that state)//ValueStep is just a pair holding the minimax value and a game move (step) private ValueStep minimax(Node gameState,int depth,int alpha,int beta) {  //Node.MAXDEPTH is a constant  if(depth == Node.MAXDEPTH || gameOver(gameState.board)) {      return new ValueStep(gameState.heuristicValue(),gameState.step);  }  //this method definately works. child nodes are created with a move and an   //updated board and MAX value  //which determines if they are the maximizing or minimizing players game states.  gameState.children = gameState.findPossibleStates();  if(state.MAX) { //maximizing player      ValueStep best = null;      for(Node child: gameState.children) {          ValueStep vs = new ValueStep(minimax(child,depth+1,alpha,beta).value,child.move);          //values updated here if needed          if(best==null || vs.value > best.value) best = vs;          if(vs.value > alpha) alpha = vs.value;          if(alpha >= beta) break;      }      return best;  } else { //minimizing player      ValueStep best = null;      for(Node child: gameState.children) {          ValueStep vs = new ValueStep(minimax(child,depth+1,alfa,beta).value,child.move);          if(best==null || vs.value < best.value) best = vs;          if(vs.value < beta) beta = vs.value;          if(alpha >= beta) break;      }      return best;  }

}

起初我认为问题出在我的评估函数上，但如果是的话，我没能找到它。在这个游戏中，两个玩家都有得分，我的函数只是简单地从得分差异中计算出启发式值。这是我的函数：

public int heuristicValue() {       //I calculate the score difference here in this state and save it in        //the variable scoreDiff. scoreDiff will be positive if I am winning        //here, negative if im loosing.        //"this" is a Node object here. If the game is over here, special        //heuristic values are returned, depending on who wins (or if its a         //draw)         if(gameOver(this.board)) {            if(scoreDiff>0) {                return Integer.MAX_VALUE;              } else if(scoreDiff==0) {                return 0;            } else {                return Integer.MIN_VALUE;            }        }        int value = 0;        value += 100*scoreDiff; //caluclate the heuristic value using the score differenence. If its high, the value will be high as well       return value;  }

我已经将我的代码“翻译”成了英语，所以可能有拼写错误。我相当确定问题就在这儿的某个地方，但如果你需要其他代码，我会更新这个问题。再次强调，我的玩家可以建立优势，但不知为何它不会做出最终的获胜动作。感谢你的帮助！

回答：

假设你的Minimax玩家处于一个可以证明它能保证胜利的位置。通常会有许多不同的方式可以保证最终胜利。有些动作可能是即时获胜，有些动作可能会不必要地拖延游戏…只要不是一个非常愚蠢的动作，突然让对手赢（或平局），它们都是胜利，并且它们在游戏理论上具有相同的值（在你的代码中是Integer.MAX_VALUE）。

你的Minimax算法不会区分这些动作，简单地播放gameState.children列表中碰巧是第一个的动作。这可能是快速的、浅层的胜利，也可能是缓慢的、非常深的胜利。

有两种简单的方法可以让你的Minimax算法优先考虑快速胜利而不是慢速胜利：

最好的选择（因为它还有许多其他优势）是使用迭代加深。你可以查找详细信息，但基本思想是，你首先进行深度限制为1的Minimax搜索，然后进行深度限制为2的搜索，然后是深度限制为3的搜索，等等。一旦你的搜索证明了胜利，你就可以终止搜索并播放那个获胜的动作。这将使你的算法总是优先选择最短的胜利（因为那些是首先找到的）。
或者，你可以修改你的heuristicValue()函数以包含搜索深度。例如，你可以在获胜位置返回Integer.MAX_VALUE - depth。这将使更快的胜利实际上具有稍微更大的评估值。

学技术

关于MiniMax算法的一个非常有趣的问题。这是什么原因导致这种行为的？

发表回复取消回复

相关文章：

Related Posts

使用LSTM在Python中预测未来值

如何在gensim的word2vec模型中查找双词组的相似性

dask_xgboost.predict 可以工作但无法显示 – 数据必须是一维的

ML Tuning – Cross Validation in Spark

如何在React JS中使用fetch从REST API获取预测

如何分析ML.NET中多类分类预测得分数组？

发表回复 取消回复

发表回复取消回复