Strange results when using the Unity ML Agents Python API

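I'm getting some odd results with the 3DBall example environment that I can't make sense of. My code is currently just a for loop that prints the rewards and fills the required inputs with random values. However, a negative reward is never shown during the run, and sometimes there are no decision steps at all, seemingly at random. That may be reasonable, but shouldn't the simulation keep running until there is a decision step? There are almost no resources beyond the documentation, so any help would be greatly appreciated.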

env = UnityEnvironment()
env.reset()
behavior_names = env.behavior_specs
for i in range(50):
    arr = []
    behavior_names = env.behavior_specs
    for i in behavior_names:
        print(i)
    DecisionSteps = env.get_steps("3DBall?team=0")
    print(DecisionSteps[0].reward,len(DecisionSteps[0].reward))
    print(DecisionSteps[0].action_mask) #for some reason it returns action mask as false when Decisionsteps[0].reward is empty and is None when not
    for i in range(len(DecisionSteps[0])):
        arr.append([])
        for b in range(2):
            arr[-1].append(random.uniform(-10,10))
    if(len(DecisionSteps[0])!= 0):
        env.set_actions("3DBall?team=0",numpy.array(arr))
        env.step()
    else:
        env.step()
env.close()

Answer:

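I think your problem is that when the simulation terminates and needs to be reset, the agent does not return a decision_step but a terminal_step. That happens because the agent has dropped the ball, and the reward returned in the terminal_step will be -1.0. Your loop only looks at the decision steps (the first element of what get_steps returns), so it never sees that negative reward. I made a few changes to your code and it now runs fine (though you will probably want to change the setup so that it does not reset every time one of the agents drops its ball).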

import numpy as np
import mlagents
from mlagents_envs.environment import UnityEnvironment

# -----------------
# This code is used to close an env that might not have been closed before
# (for example when re-running the script in the same interpreter session)
try:
    env.close()
except Exception:
    pass
# -----------------

env = UnityEnvironment(file_name = None)
env.reset()

for i in range(1000):
    behavior_names = env.behavior_specs
    # Go through all existing behaviors
    for behavior_name in behavior_names:
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        for agent_id_terminated in terminal_steps:
            print("Agent " + behavior_name + " has terminated, resetting environment.")
            # This is probably not the desired behaviour, as the other agents are still active.
            env.reset()
        # One random action (2 continuous values) per agent that requested a decision
        actions = []
        for agent_id_decisions in decision_steps:
            actions.append(np.random.uniform(-1,1,2))
        # print(decision_steps[0].reward)
        # print(decision_steps[0].action_mask)
        if len(actions) > 0:
            env.set_actions(behavior_name, np.array(actions))
    try:
        env.step()
    except Exception:
        print("Something happened when taking a step in the environment.")
        print("The communicator has probably terminated, stopping simulation early.")
        break
env.close()
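To make the decision_step / terminal_step distinction concrete, here is a minimal sketch of how the rewards from both batches could be collected in one place. It does not need a running Unity environment: `FakeSteps` is a hypothetical stand-in I made up for the two objects returned by `env.get_steps(...)`, and it only assumes they expose parallel `agent_id` and `reward` sequences. `collect_rewards` is likewise an illustrative helper, not part of the ML Agents API, and the reward values are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class FakeSteps:
    """Hypothetical stand-in for a DecisionSteps / TerminalSteps batch."""
    agent_id: Sequence[int]
    reward: Sequence[float]


def collect_rewards(decision_steps, terminal_steps) -> Tuple[Dict[int, float], List[int]]:
    """Return (reward per agent id, ids of agents whose episode ended)."""
    rewards: Dict[int, float] = {}
    # Agents that are still running and are asking for an action.
    for agent_id, reward in zip(decision_steps.agent_id, decision_steps.reward):
        rewards[int(agent_id)] = float(reward)
    terminated: List[int] = []
    # Agents whose episode just ended; their final reward is only found here.
    # Processed last, so a terminal reward overwrites a decision reward
    # if the same id happens to appear in both batches.
    for agent_id, reward in zip(terminal_steps.agent_id, terminal_steps.reward):
        rewards[int(agent_id)] = float(reward)
        terminated.append(int(agent_id))
    return rewards, terminated


# Illustrative values: agents 0 and 2 keep balancing, agent 1 dropped the ball.
decision = FakeSteps(agent_id=[0, 2], reward=[0.1, 0.1])
terminal = FakeSteps(agent_id=[1], reward=[-1.0])
rewards, terminated = collect_rewards(decision, terminal)
print(rewards)
print(terminated)
```

In the loop above you would call it as `collect_rewards(decision_steps, terminal_steps)` right after `env.get_steps(behavior_name)`. If you only ever print the decision-step rewards, as in the question, the -1.0 never shows up.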
