Hi everyone, I am trying to search for the best parameters using a for loop. However, the results are confusing me. The two pieces of code below should give the same result, because the parameter "mtry" is the same.
       gender Partner   tenure Churn
3521     Male      No 0.992313   Yes
2525.1   Male      No 4.276666    No
567      Male     Yes 2.708050    No
8381   Female      No 4.202127   Yes
6258   Female      No 0.000000   Yes
6569     Male     Yes 2.079442    No
27410  Female      No 1.550804   Yes
6429   Female      No 1.791759   Yes
412    Female     Yes 3.828641    No
4655   Female     Yes 3.737670    No
RFModel = randomForest(Churn ~ .,
                       data = ggg,
                       ntree = 30,
                       mtry = 2,
                       importance = TRUE,
                       replace = FALSE)
print(RFModel$confusion)

    No Yes class.error
No   4   1         0.2
Yes  1   4         0.2
for (i in c(2)) {
  RFModel = randomForest(Churn ~ .,
                         data = Trainingds,
                         ntree = 30,
                         mtry = i,
                         importance = TRUE,
                         replace = FALSE)
  print(RFModel$confusion)
}

    No Yes class.error
No   3   2         0.4
Yes  2   3         0.4
- Code 1 and code 2 should produce the same output.
Answer:
You get slightly different results on each run because randomness is built into the algorithm. To build each tree, the algorithm resamples the data frame and randomly selects mtry columns from the resampled data frame. If you want models built with the same parameters (e.g., mtry, ntree) to give the same result every time, you need to set a random seed.
For example, let's run randomForest 10 times and check the mean of the mean squared error for each run. Note that the mean MSE is different each time:
library(randomForest)
replicate(10, mean(randomForest(mpg ~ ., data=mtcars)$mse))
[1] 5.998530 6.307782 5.791657 6.125588 5.868717 5.845616 5.427208 6.112762 5.777624 6.150021
If you run the code above yourself, you will get another 10 values that differ from the ones shown.
If you want to be able to reproduce the result of a model run with the same parameters (e.g., mtry and ntree), you can set a random seed. For example:
set.seed(5)
mean(randomForest(mpg ~ ., data=mtcars)$mse)
[1] 6.017737
If you use the same seed value you will get the same result; otherwise you will not. Using a larger ntree value will reduce, but not eliminate, the variability between model runs.
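One way to see this is to compare the spread of the mean MSE across repeated runs for a small and a large ntree (a sketch using the built-in mtcars data; the ntree values and number of repetitions here are arbitrary, and the exact numbers will vary from run to run):

```r
library(randomForest)

# Spread of the mean MSE across 10 runs with few trees ...
sd_small <- sd(replicate(10, mean(randomForest(mpg ~ ., data = mtcars, ntree = 30)$mse)))

# ... versus many trees
sd_large <- sd(replicate(10, mean(randomForest(mpg ~ ., data = mtcars, ntree = 2000)$mse)))

sd_small
sd_large
```

The standard deviation for the larger ntree is typically much smaller, but it does not go to zero: only set.seed makes the runs exactly reproducible.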
Update: When I run your code with the data sample you provided, I do not always get the same result each time. Even with replace=FALSE, which causes the data frame to be sampled without replacement, the columns used to build each tree can still differ from run to run:
> randomForest(Churn ~ .,
+              data = ggg,
+              ntree = 30,
+              mtry = 2,
+              importance = TRUE,
+              replace = FALSE)

Call:
 randomForest(formula = Churn ~ ., data = ggg, ntree = 30, mtry = 2, importance = TRUE, replace = FALSE)
               Type of random forest: classification
                     Number of trees: 30
No. of variables tried at each split: 2

        OOB estimate of  error rate: 30%
Confusion matrix:
    No Yes class.error
No   3   2         0.4
Yes  1   4         0.2

> randomForest(Churn ~ .,
+              data = ggg,
+              ntree = 30,
+              mtry = 2,
+              importance = TRUE,
+              replace = FALSE)

Call:
 randomForest(formula = Churn ~ ., data = ggg, ntree = 30, mtry = 2, importance = TRUE, replace = FALSE)
               Type of random forest: classification
                     Number of trees: 30
No. of variables tried at each split: 2

        OOB estimate of  error rate: 20%
Confusion matrix:
    No Yes class.error
No   4   1         0.2
Yes  1   4         0.2

> randomForest(Churn ~ .,
+              data = ggg,
+              ntree = 30,
+              mtry = 2,
+              importance = TRUE,
+              replace = FALSE)

Call:
 randomForest(formula = Churn ~ ., data = ggg, ntree = 30, mtry = 2, importance = TRUE, replace = FALSE)
               Type of random forest: classification
                     Number of trees: 30
No. of variables tried at each split: 2

        OOB estimate of  error rate: 30%
Confusion matrix:
    No Yes class.error
No   3   2         0.4
Yes  1   4         0.2
Here are similar results obtained with the built-in iris data frame:
> randomForest(Species ~ ., data=iris, ntree=30, mtry=2, importance = TRUE,
+              replace = FALSE)

Call:
 randomForest(formula = Species ~ ., data = iris, ntree = 30, mtry = 2, importance = TRUE, replace = FALSE)
               Type of random forest: classification
                     Number of trees: 30
No. of variables tried at each split: 2

        OOB estimate of  error rate: 3.33%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          2        48        0.04

> randomForest(Species ~ ., data=iris, ntree=30, mtry=2, importance = TRUE,
+              replace = FALSE)

Call:
 randomForest(formula = Species ~ ., data = iris, ntree = 30, mtry = 2, importance = TRUE, replace = FALSE)
               Type of random forest: classification
                     Number of trees: 30
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4.67%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          4        46        0.08

> randomForest(Species ~ ., data=iris, ntree=30, mtry=2, importance = TRUE,
+              replace = FALSE)

Call:
 randomForest(formula = Species ~ ., data = iris, ntree = 30, mtry = 2, importance = TRUE, replace = FALSE)
               Type of random forest: classification
                     Number of trees: 30
No. of variables tried at each split: 2

        OOB estimate of  error rate: 6%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          6        44        0.12
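To make the original loop reproducible, set the seed immediately before each fit. A sketch using the built-in iris data in place of the questioner's Trainingds/ggg data frames (the seed value 42 is arbitrary):

```r
library(randomForest)

for (i in c(2)) {
  set.seed(42)  # same seed before each fit => identical forest every run
  model <- randomForest(Species ~ ., data = iris, ntree = 30, mtry = i,
                        importance = TRUE, replace = FALSE)
  print(model$confusion)
}
```

Re-running this loop now prints the same confusion matrix every time, and it will also match a standalone call made with the same seed and parameters.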