bandit

Why is the bandit problem also called a one-step/one-state MDP in reinforcement learning?

xiaolong · May 26, 2025 · 0 Comments

What do we mean by a one-step/one-state MDP (Markov decision process)…