蒙特卡洛树搜索：井字游戏的实现

更新：如果您想看看能否让AI表现得更好，我已经上传了完整的源代码：https://www.dropbox.com/s/ous72hidygbnqv6/MCTS_TTT.rar

更新：搜索空间已被搜索，并找到了导致失败的移动。但由于UCT算法，这些导致失败的移动并不经常被访问到。

为了学习MCTS（蒙特卡洛树搜索），我使用该算法为经典的井字游戏创建了一个AI。我使用以下设计实现了该算法：

MCTS阶段树策略基于UCT，默认策略是在游戏结束前进行随机移动。我观察到，在我的实现中，计算机有时会做出错误的移动，因为它未能“看到”某个特定移动会直接导致失败。

例如：井字游戏示例请注意，动作6（红色方块）的价值略高于蓝色方块，因此计算机标记了这个位置。我认为这是因为游戏策略基于随机移动，因此人类玩家很有可能不会在蓝色方块中放置“2”。如果玩家没有在蓝色方块中放置2，计算机就保证能赢。

我的问题

1）这是MCTS的一个已知问题，还是实现失败的结果？

2）可能的解决方案是什么？我在考虑在选择阶段限制移动，但我不是很确定 🙂

核心MCTS的代码：

    //执行函数    public unsafe byte GetBestMove(Game game, int player, TreeView tv)    {        //设置根节点和初始变量        Node root = new Node(null, 0, Opponent(player));        int startPlayer = player;        helper.CopyBytes(root.state, game.board);        //四个阶段：下降、展开、更新和增长，迭代执行X次        //-----------------------------------------------------------------------------------------------------        for (int iteration = 0; iteration < 1000; iteration++)        {            Node current = Selection(root, game);            int value = Rollout(current, game, startPlayer);            Update(current, value);        }        //恢复游戏状态并返回价值最高的移动        helper.CopyBytes(game.board, root.state);        //绘制树        DrawTree(tv, root);        //return root.children.Aggregate((i1, i2) => i1.visits > i2.visits ? i1 : i2).action;        return BestChildUCB(root, 0).action;    }    //#1. 如果1：我们有更多有效的可行移动或2：它是终端节点，则选择一个节点     public Node Selection(Node current, Game game)    {        while (!game.IsTerminal(current.state))        {            List<byte> validMoves = game.GetValidMoves(current.state);            if (validMoves.Count > current.children.Count)                return Expand(current, game);            else                current = BestChildUCB(current, 1.44);        }        return current;    }    //#1. 辅助函数    public Node BestChildUCB(Node current, double C)    {        Node bestChild = null;        double best = double.NegativeInfinity;        foreach (Node child in current.children)        {            double UCB1 = ((double)child.value / (double)child.visits) + C * Math.Sqrt((2.0 * Math.Log((double)current.visits)) / (double)child.visits);            if (UCB1 > best)            {                bestChild = child;                best = UCB1;            }        }        return bestChild;    }    //#2. 通过创建一个新移动并返回节点来扩展节点    public Node Expand(Node current, Game game)    {        //将当前状态复制到游戏中        helper.CopyBytes(game.board, current.state);        List<byte> validMoves = game.GetValidMoves(current.state);        for (int i = 0; i < validMoves.Count; i++)        {            //我们已经评估了这个移动            if (current.children.Exists(a => a.action == validMoves[i]))                continue;            int playerActing = Opponent(current.PlayerTookAction);            Node node = new Node(current, validMoves[i], playerActing);            current.children.Add(node);            //在游戏中执行移动并保存到子节点            game.Mark(playerActing, validMoves[i]);            helper.CopyBytes(node.state, game.board);            //返回到之前的游戏状态            helper.CopyBytes(game.board, current.state);            return node;        }        throw new Exception("错误");    }    //#3. 展开。使用给定的策略模拟游戏并返回值    public int Rollout(Node current, Game game, int startPlayer)    {        Random r = new Random(1337);        helper.CopyBytes(game.board, current.state);        int player = Opponent(current.PlayerTookAction);        //执行策略直到找到赢家，对于添加的第一个（更改？）节点        while (game.GetWinner() == 0)        {            //随机            List<byte> moves = game.GetValidMoves();            byte move = moves[r.Next(0, moves.Count)];            game.Mark(player, move);            player = Opponent(player);        }        if (game.GetWinner() == startPlayer)            return 1;        return 0;    }    //#4. 更新    public unsafe void Update(Node current, int value)    {        do        {            current.visits++;            current.value += value;            current = current.parent;        }        while (current != null);    }

回答：

好的，我通过添加以下代码解决了这个问题：

        //如果这个移动是终端移动并且对手获胜，这意味着我们之前做了一个移动，对手总能找到一个移动来赢...这不好        if (game.GetWinner() == Opponent(startPlayer))        {            current.parent.value = int.MinValue;            return 0;        }

我认为问题在于搜索空间太小。这确保了即使选择阶段选择了一个实际上是终端的移动，这个移动也永远不会被选择，资源将用于探索其他移动 :).

现在AI对AI总是打成平局，作为人类玩家几乎不可能击败AI 🙂

学技术

蒙特卡洛树搜索：井字游戏的实现

发表回复取消回复

相关文章：

Related Posts

使用LSTM在Python中预测未来值

如何在gensim的word2vec模型中查找双词组的相似性

dask_xgboost.predict 可以工作但无法显示 – 数据必须是一维的

ML Tuning – Cross Validation in Spark

如何在React JS中使用fetch从REST API获取预测

如何分析ML.NET中多类分类预测得分数组？

发表回复 取消回复

发表回复取消回复