Monte Carlo tree search is not working

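I am currently writing an AI for the board game Hex. I want to use Monte Carlo tree search for this and have already tried to implement it. However, the AI makes incredibly stupid (random) moves, and I can't figure out why it isn't working.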

import java.util.ArrayList;
import java.util.Random;

/**
 * Created by Robin on 18.03.2017.
 */
public class TreeNode {
    private static final Random random = new Random();
    private static final double epsion = 10e-5;
    protected double nvisits;
    protected double totValue;
    protected int move = -1;
    private HexBoard board;
    protected ArrayList<TreeNode> children;

    public TreeNode(HexBoard board) {
        this.board = board;
    }

    // Copy-Constructor
    public TreeNode(TreeNode treeNode) {
        this.nvisits = treeNode.nvisits;
        this.totValue = treeNode.totValue;
        this.move = treeNode.move;
        this.board = new HexBoard(treeNode.board);
    }

    public void update(double value) {
        totValue += value * board.color;
        nvisits++;
    }

    public void expand() {
        assert (children == null);
        children = new ArrayList<>(121 - board.moveCount);
        for (int i = 0; i < 121; i++) {
            if (board.board[i] != HexBoard.EMPTY)
                continue;
            TreeNode newNode = new TreeNode(board);
            newNode.move = i;
            children.add(newNode);
        }
    }

    public void calculateIteration() {
        ArrayList<TreeNode> visited = new ArrayList<>();
        TreeNode current = this;
        visited.add(current);
        while (!current.isLeafNode()) {
            current = current.select();
            board.makeMove(current.move);
            visited.add(current);
        }
        // Found a leaf node
        double value;
        if (current.board.getWinner() == 0) {
            current.expand();
            TreeNode newNode = current.select();
            value = playOut(newNode.board);
        } else {
            value = current.board.getWinner();
        }
        // update all the nodes
        for (int i = 1; i < visited.size(); i++) {
            visited.get(i).update(value);
            board.undoMove(visited.get(i).move);
        }
        visited.get(0).update(value);
    }

    public static int playOut(HexBoard board) {
        int winner = 0;
        if (board.moveCount == 121) {
            winner = board.getWinner();
            return winner;
        }
        // Checking-Movecount vs actual stones on the board
        final double left = 121 - board.moveCount;
        double probibility = 1 / left;
        double summe = 0;
        double p = random.nextDouble();
        int randomMove = 0;
        for (int i = 0; i < 121; i++) {
            if (board.board[i] != HexBoard.EMPTY)
                continue;
            summe += probibility;
            if (p <= summe && probibility != 0) {
                randomMove = i;
                break;
            }
        }
        board.makeMove(randomMove);
        winner = playOut(board);
        board.undoMove(randomMove);
        return winner;
    }

    public TreeNode select() {
        TreeNode bestNode = null;
        double bestValue = -10000000;
        for (TreeNode node : children) {
            double uctvalue = (node.nvisits == 0)
                    ? 100000
                    : (node.totValue / (node.nvisits)
                            + Math.sqrt((Math.log(this.nvisits)) / (2 * node.nvisits)));
            uctvalue += epsion * random.nextDouble();
            if (uctvalue > bestValue) {
                bestValue = uctvalue;
                bestNode = node;
            }
        }
        return bestNode;
    }

    public boolean isLeafNode() {
        return (children == null);
    }
}

Is my implementation in the calculateIteration() method correct?

I know this may not be a very appealing question, but I would appreciate any help.

Answer:

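The original poster added extra information in the comments. The important part is that the makeMove() method is implemented so that it checks which player is next to move (to ensure that the board is updated correctly).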

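Given this information, the original poster's select() implementation is incorrect, because it does not take the player to move into account when computing the UCT score. The UCT score consists of an "exploitation" part (the first fraction, which averages the scores of all previous simulations) and an "exploration" part (the term under the square root, which grows for nodes that have been visited rarely relative to their parent). When the opponent is the one to move next, the exploitation part of this formula should be negated. Otherwise the AI will essentially assume that the opponent is willing to actively help the AI, rather than assuming the opponent plays for its own win.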
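The sign flip described above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the original poster's code: the `Node` class, its `playerToMove` field, and the constant `c` are hypothetical names introduced here. It assumes that playout results are accumulated from one fixed perspective (+1 means a win for player +1, -1 a win for player -1), which may differ from how the original `update()` scales values with `board.color`.

```java
import java.util.ArrayList;
import java.util.List;

class UctSelectSketch {

    /** Hypothetical minimal node; statistics are kept from player +1's point of view. */
    static final class Node {
        final int playerToMove;   // +1 or -1: the player who chooses among this node's children
        double visits;            // number of simulations that passed through this node
        double totalValue;        // sum of playout results (+1 = win for player +1, -1 = win for player -1)
        final List<Node> children = new ArrayList<>();

        Node(int playerToMove) {
            this.playerToMove = playerToMove;
        }
    }

    /** UCT score of a child as seen by the player who moves at the parent. */
    static double uct(Node parent, Node child, double c) {
        if (child.visits == 0) {
            return Double.POSITIVE_INFINITY; // try unvisited children first
        }
        double mean = child.totalValue / child.visits;
        // Exploitation: negated when player -1 is the one choosing at the parent.
        double exploitation = parent.playerToMove * mean;
        // Exploration: grows for children visited rarely relative to the parent.
        double exploration = c * Math.sqrt(Math.log(parent.visits) / child.visits);
        return exploitation + exploration;
    }

    /** Returns the child with the highest UCT score (the first one on ties). */
    static Node select(Node parent, double c) {
        Node best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Node child : parent.children) {
            double score = uct(parent, child, c);
            if (score > bestScore) {
                bestScore = score;
                best = child;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Node root = new Node(-1);          // player -1 chooses at the root
        root.visits = 20;
        Node goodForPlus = new Node(+1);
        goodForPlus.visits = 10;
        goodForPlus.totalValue = 8;        // mostly wins for player +1
        Node goodForMinus = new Node(+1);
        goodForMinus.visits = 10;
        goodForMinus.totalValue = -8;      // mostly wins for player -1
        root.children.add(goodForPlus);
        root.children.add(goodForMinus);

        // c = sqrt(1/2) matches the 1/(2n) factor under the square root in the question.
        Node chosen = select(root, Math.sqrt(0.5));
        System.out.println(chosen == goodForMinus); // true
    }
}
```

Because both children have the same visit count, their exploration terms are equal, so the choice depends only on the sign-adjusted average: with player -1 to move at the root, the child that favours player -1 is selected. An alternative with the same effect is to store each node's value from the perspective of the player who just moved into it, so that no multiplication is needed at selection time.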
