Simple slot machine code keeps picking the same option (Thompson sampling)

I am trying out this simple slot machine code.

    import numpy as np

    slotConRates = [.02, .013, .013, .015, .018]
    # number of trials
    N = 10000
    # number of slot machines
    d = len(slotConRates)

    # set X[i][j] to 1 when machine j wins on trial i
    X = np.zeros((N,d))
    for i in range(N):
      for j in range(d):
        if np.random.rand() < slotConRates[j]:
          X[i][j] = 1

    win_reward = np.zeros(d)
    loss_reward = np.zeros(d)

    # pick the best machine via the beta distribution and update its win/loss record
    for i in range(N):
      selected = 0
      MaxRandom = 0
      for j in range(d):
        randomBeta = np.random.beta(win_reward[j] + 1, loss_reward[j] + 1)
        if randomBeta > MaxRandom:
          MaxRandom = randomBeta
          selected = j
          if X[i][selected] == 1:
            win_reward[selected] += 1
          else:
            loss_reward[selected] += 1

    # show which machine is considered the best
    nSelected = win_reward + loss_reward
    for i in range(d):
      print('Machine number ' + str(i + 1) + ' was selected ' + str(nSelected[i]) + ' times')
    print('Conclusion: Best machine is machine number ' + str(np.argmax(nSelected) + 1))

However, it selects the first slot machine on every single iteration.

    Machine number 1 was selected 10000.0 times
    Machine number 2 was selected 1181.0 times
    Machine number 3 was selected 1108.0 times
    Machine number 4 was selected 640.0 times
    Machine number 5 was selected 1314.0 times
    Conclusion: Best machine is machine number 1

How should I fix this? The second for loop is where the problem lies. Do you have any insight into why this happens?


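Answer:

I don't fully understand what you are doing, but I think this small change (adjusting the indentation of the final if/else block so that it runs once per trial, after the inner loop) may be what you want: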

    ...
    for i in range(N):
        selected = 0
        MaxRandom = 0
        for j in range(d):
            randomBeta = np.random.beta(win_reward[j] + 1, loss_reward[j] + 1)
            if randomBeta > MaxRandom:
                MaxRandom = randomBeta
                selected = j
        if X[i][selected] == 1:
            win_reward[selected] += 1
        else:
            loss_reward[selected] += 1
    ...
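One reading of why the original loop credits machine 1 on every trial: MaxRandom starts at 0 and any Beta draw is greater than 0, so the check for j = 0 always passes. With the if/else nested inside that check, machine 1 is updated on each trial, and later machines are also updated whenever they set a new running maximum, which is why the counts in the question add up to more than N. The sketch below isolates that effect. It is a simplified illustration, not the original code: the function name `count_updates_per_round` is made up here, and every draw uses a fixed Beta(1, 1) instead of the evolving win/loss counts.

```python
import numpy as np

def count_updates_per_round(n_rounds=1000, n_arms=5, seed=0):
    """Count how often each arm would be updated when the update
    sits inside the `if sample > max_random` check (the misplaced version)."""
    rng = np.random.default_rng(seed)
    updates = np.zeros(n_arms, dtype=int)
    for _ in range(n_rounds):
        max_random = 0.0
        for j in range(n_arms):
            # Assumption for illustration: fixed Beta(1, 1) draws,
            # not the evolving posteriors of the real code.
            sample = rng.beta(1, 1)
            if sample > max_random:
                max_random = sample
                updates[j] += 1  # the misplaced update fires here
    return updates

updates = count_updates_per_round()
print(updates)        # arm 0 is updated in every round
print(updates.sum())  # total updates exceed the number of rounds
```

Moving the update out of the inner loop, as in the change above, gives exactly one update per trial.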

EDIT: Typical results after the change:

    win_reward = array([ 34.,   7.,   3.,  20., 124.])
    loss_reward = array([1733.,  656.,  431., 1271., 5721.])
    nSelected = array([1767.,  663.,  434., 1291., 5845.])
    Machine number 1 was selected 1767.0 times
    Machine number 2 was selected 663.0 times
    Machine number 3 was selected 434.0 times
    Machine number 4 was selected 1291.0 times
    Machine number 5 was selected 5845.0 times
    Conclusion: Best machine is machine number 5
    (Test sum over selections: 10000.0)

The full code listing I used:

    import numpy as np

    slotConRates = [.02, .013, .013, .015, .018]
    N = 10000
    d = len(slotConRates)

    X = np.zeros((N,d))
    for i in range(N):
        for j in range(d):
            if np.random.rand() < slotConRates[j]:
                X[i][j] = 1

    win_reward = np.zeros(d)
    loss_reward = np.zeros(d)

    for i in range(N):
        selected = 0
        MaxRandom = 0
        for j in range(d):
            randomBeta = np.random.beta(win_reward[j] + 1, loss_reward[j] + 1)
            if randomBeta > MaxRandom:
                MaxRandom = randomBeta
                selected = j
        if X[i][selected] == 1:
            win_reward[selected] += 1
        else:
            loss_reward[selected] += 1

    nSelected = win_reward + loss_reward
    print(f'{win_reward = }')
    print(f'{loss_reward = }')
    print(f'{nSelected = }')
    for i in range(d):
        print(f'Machine number {i + 1} was selected {nSelected[i]} times')
    print(f'Conclusion: Best machine is machine number {np.argmax(nSelected) + 1}')
    print(f'(Test sum over selections: {nSelected.sum()})')
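As a possible alternative, the inner loop over machines can be replaced by a single vectorized Beta draw followed by `np.argmax`, which leaves no room for the update to end up inside the selection loop. This is a sketch under some assumptions rather than a drop-in replacement: the function name `thompson_sampling`, the seeded `np.random.default_rng` generator, and drawing each reward on the fly (instead of precomputing the matrix X) are choices made here and are not part of the listing above.

```python
import numpy as np

def thompson_sampling(rates, n_rounds=10000, seed=0):
    """Thompson sampling for Bernoulli arms with Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    wins = np.zeros(len(rates))
    losses = np.zeros(len(rates))
    for _ in range(n_rounds):
        # One posterior sample per arm, drawn in a single call.
        samples = rng.beta(wins + 1, losses + 1)
        selected = int(np.argmax(samples))
        # Assumption: the reward is simulated here instead of read from X.
        if rng.random() < rates[selected]:
            wins[selected] += 1
        else:
            losses[selected] += 1
    return wins, losses

wins, losses = thompson_sampling([.02, .013, .013, .015, .018])
n_selected = wins + losses
for i, n in enumerate(n_selected):
    print(f'Machine number {i + 1} was selected {n} times')
print(f'(Test sum over selections: {n_selected.sum()})')
```

Because exactly one arm is updated per trial, the selection counts always sum to the number of trials. The conversion rates here are close to each other, so which machine ends up ahead can vary from run to run.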
