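I'm using the 3DBall example environment and getting some strange results that I can't make sense of. My code is currently just a for loop that prints the rewards and fills the required inputs with random values. During execution, though, a negative reward is never shown, and sometimes there are randomly no decision steps at all. That seems plausible, but shouldn't the simulation keep running until there is a decision step? Apart from the documentation there is almost nothing else to go on, so any help would be greatly appreciated.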
env = UnityEnvironment()
env.reset()
behavior_names = env.behavior_specs
for i in range(50):
    arr = []
    behavior_names = env.behavior_specs
    for i in behavior_names:
        print(i)
    DecisionSteps = env.get_steps("3DBall?team=0")
    print(DecisionSteps[0].reward,len(DecisionSteps[0].reward))
    print(DecisionSteps[0].action_mask) #for some reason it returns action mask as false when Decisionsteps[0].reward is empty and is None when not
    for i in range(len(DecisionSteps[0])):
        arr.append([])
        for b in range(2):
            arr[-1].append(random.uniform(-10,10))
    if(len(DecisionSteps[0])!= 0):
        env.set_actions("3DBall?team=0",numpy.array(arr))
        env.step()
    else:
        env.step()
env.close()
Answer:
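I think your problem is that when the simulation terminates and needs to be reset, the agent does not return a decision_step but a terminal_step instead. That happens because the agent has dropped the ball, and the reward returned in the terminal_step will be -1.0. Your loop only ever looks at the decision steps, so it never sees that reward. I made a few changes to your code and it now runs fine (although you will probably want to change it so that it doesn't reset the whole environment every time a single agent drops the ball).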
import numpy as np
import mlagents
from mlagents_envs.environment import UnityEnvironment

# -----------------
# This code is used to close an env that might not have been closed before
# (for example when re-running the script in the same interpreter session)
try:
    env.close()
except Exception:
    pass
# -----------------

# file_name=None connects to a running Unity Editor instead of a built executable
env = UnityEnvironment(file_name = None)
env.reset()

for i in range(1000):
    behavior_names = env.behavior_specs
    # Go through all existing behaviors
    for behavior_name in behavior_names:
        # get_steps returns two objects: agents requesting a decision,
        # and agents whose episode has just ended
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        for agent_id_terminated in terminal_steps:
            print("Agent " + behavior_name + " has terminated, resetting environment.")
            # This is probably not the desired behaviour, as the other agents are still active.
            env.reset()
        # One random 2-dimensional action per agent that requested a decision
        actions = []
        for agent_id_decisions in decision_steps:
            actions.append(np.random.uniform(-1,1,2))
        # print(decision_steps[0].reward)
        # print(decision_steps[0].action_mask)
        if len(actions) > 0:
            env.set_actions(behavior_name, np.array(actions))
    try:
        env.step()
    except Exception:
        print("Something happened when taking a step in the environment.")
        print("The communicator has probably terminated, stopping simulation early.")
        break
env.close()
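If you want to actually see the -1.0, you have to read the reward from the terminal steps rather than from the decision steps. Below is a rough sketch of that idea which runs without Unity. FakeSteps is a hypothetical stand-in I made up for the objects returned by env.get_steps(), modelling only the agent_id and reward arrays, and the 0.1 rewards are just illustrative values. The helper names rewards_by_agent and random_actions are mine too, not part of ML-Agents.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for the batched step objects returned by env.get_steps();
# only the two attributes used below (agent_id, reward) are modelled.
FakeSteps = namedtuple("FakeSteps", ["agent_id", "reward"])


def rewards_by_agent(steps):
    """Map each agent id in a batch of steps to its reward."""
    return {int(a): float(r) for a, r in zip(steps.agent_id, steps.reward)}


def random_actions(steps, action_size=2, rng=None):
    """One random continuous action in [-1, 1] per agent in the batch."""
    rng = rng or np.random.default_rng()
    return rng.uniform(-1.0, 1.0, size=(len(steps.agent_id), action_size))


# Illustrative data: agents 0 and 2 are still balancing, agent 1 just dropped the ball.
decision_steps = FakeSteps(agent_id=np.array([0, 2]), reward=np.array([0.1, 0.1]))
terminal_steps = FakeSteps(agent_id=np.array([1]), reward=np.array([-1.0]))

print("still running:", rewards_by_agent(decision_steps))
print("terminated:   ", rewards_by_agent(terminal_steps))

# Actions are only built for agents that asked for a decision,
# so the terminated agent is simply skipped instead of triggering a reset.
print("action batch shape:", random_actions(decision_steps).shape)
```

In the real loop you would pass the decision_steps and terminal_steps from env.get_steps(behavior_name) to these helpers and drop the env.reset() call, on the assumption that the terminated agent starts a new episode on its own at the next step.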