Searching for the best random forest parameters with a for loop in R

Hi everyone, I am trying to search for the best parameters using a for loop. However, the results confuse me. The two pieces of code below should produce the same result, because the parameter mtry is the same.

       gender Partner   tenure Churn
3521     Male      No 0.992313   Yes
2525.1   Male      No 4.276666    No
567      Male     Yes 2.708050    No
8381   Female      No 4.202127   Yes
6258   Female      No 0.000000   Yes
6569     Male     Yes 2.079442    No
27410  Female      No 1.550804   Yes
6429   Female      No 1.791759   Yes
412    Female     Yes 3.828641    No
4655   Female     Yes 3.737670    No

RFModel = randomForest(Churn ~ .,
                       data = ggg,
                       ntree = 30,
                       mtry = 2,
                       importance = TRUE,
                       replace = FALSE)
print(RFModel$confusion)

    No Yes class.error
No   4   1         0.2
Yes  1   4         0.2

for(i in c(2)){
   RFModel = randomForest(Churn ~ .,
                          data = Trainingds,
                          ntree = 30,
                          mtry = i,
                          importance = TRUE,
                          replace = FALSE)
   print(RFModel$confusion)
}

    No Yes class.error
No   3   2         0.4
Yes  2   3         0.4

  1. Code 1 and Code 2 should produce the same output.

Answer:

You get slightly different results on each run because randomness is built into the algorithm. To build each tree, the algorithm resamples the data frame, and randomly selects mtry columns from the resampled data frame to build that tree. If you want models built with the same parameters (e.g., mtry, ntree) to give the same result every time, you need to set a random seed.
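To make those two sources of randomness concrete, here is a minimal base-R sketch of what each individual tree is trained on. This is an illustration of the idea only, not randomForest's internal code; mtcars and mtry = 2 are arbitrary choices:

```r
data(mtcars)
n <- nrow(mtcars)

# 1. Resample the rows of the data frame
#    (a bootstrap sample when replace = TRUE).
boot_rows <- sample(n, n, replace = TRUE)

# 2. At each split, consider only mtry randomly chosen
#    predictor columns; here mtry = 2.
predictors <- setdiff(names(mtcars), "mpg")
candidates <- sample(predictors, 2)

# The data one tree effectively "sees" at this split:
head(mtcars[boot_rows, c(candidates, "mpg")])
```

Both `sample()` calls draw fresh random numbers, so every tree, and therefore every forest, differs between runs unless the random number generator is seeded.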

For example, let's run randomForest 10 times and check the mean of the mean squared error for each run. Note that the mean MSE differs each time:

library(randomForest)
replicate(10, mean(randomForest(mpg ~ ., data=mtcars)$mse))
[1] 5.998530 6.307782 5.791657 6.125588 5.868717 5.845616 5.427208 6.112762 5.777624 6.150021

If you run the code above yourself, you will get another 10 values that differ from these.

If you want to be able to reproduce the results of a model run with the same parameters (e.g., mtry, ntree), you can set a random seed. For example:

set.seed(5)
mean(randomForest(mpg ~ ., data=mtcars)$mse)
[1] 6.017737

You will get the same result whenever you use the same seed value, and a different result otherwise. Using a larger ntree value will reduce, but not eliminate, the variability between model runs.
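Applied to your loop, the fix is to call set.seed with the same value immediately before each randomForest call, so every iteration starts from the same random-number state as the standalone run. A sketch, assuming Trainingds is your training data frame and 42 is an arbitrary seed:

```r
library(randomForest)

for (i in c(2)) {
  set.seed(42)  # same seed before every fit
  RFModel <- randomForest(Churn ~ .,
                          data = Trainingds,
                          ntree = 30,
                          mtry = i,
                          importance = TRUE,
                          replace = FALSE)
  print(RFModel$confusion)
}

# The standalone call, seeded identically, now reproduces
# the loop's result exactly:
set.seed(42)
RFModel2 <- randomForest(Churn ~ ., data = Trainingds, ntree = 30,
                         mtry = 2, importance = TRUE, replace = FALSE)
identical(RFModel$confusion, RFModel2$confusion)
```

Note that the seed must be set before each fit, not once before the loop: otherwise each iteration consumes random numbers left over from the previous one.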

Update: When I run your code with the data sample you provided, I do not always get the same result each time. Even with replace=FALSE, which causes the data frame to be sampled without replacement, the columns used to build each tree can still differ from run to run:

> randomForest(Churn ~ .,
+              data = ggg,
+              ntree = 30,
+              mtry = 2,
+              importance = TRUE,
+              replace = FALSE)

Call:
 randomForest(formula = Churn ~ ., data = ggg, ntree = 30, mtry = 2,
     importance = TRUE, replace = FALSE)
               Type of random forest: classification
                     Number of trees: 30
No. of variables tried at each split: 2

        OOB estimate of  error rate: 30%
Confusion matrix:
    No Yes class.error
No   3   2         0.4
Yes  1   4         0.2

> randomForest(Churn ~ .,
+              data = ggg,
+              ntree = 30,
+              mtry = 2,
+              importance = TRUE,
+              replace = FALSE)

Call:
 randomForest(formula = Churn ~ ., data = ggg, ntree = 30, mtry = 2,
     importance = TRUE, replace = FALSE)
               Type of random forest: classification
                     Number of trees: 30
No. of variables tried at each split: 2

        OOB estimate of  error rate: 20%
Confusion matrix:
    No Yes class.error
No   4   1         0.2
Yes  1   4         0.2

> randomForest(Churn ~ .,
+              data = ggg,
+              ntree = 30,
+              mtry = 2,
+              importance = TRUE,
+              replace = FALSE)

Call:
 randomForest(formula = Churn ~ ., data = ggg, ntree = 30, mtry = 2,
     importance = TRUE, replace = FALSE)
               Type of random forest: classification
                     Number of trees: 30
No. of variables tried at each split: 2

        OOB estimate of  error rate: 30%
Confusion matrix:
    No Yes class.error
No   3   2         0.4
Yes  1   4         0.2

Here are similar results using the built-in iris data frame:

> randomForest(Species ~ ., data=iris, ntree=30, mtry=2, importance = TRUE,
+              replace = FALSE)

Call:
 randomForest(formula = Species ~ ., data = iris, ntree = 30,
     mtry = 2, importance = TRUE, replace = FALSE)
               Type of random forest: classification
                     Number of trees: 30
No. of variables tried at each split: 2

        OOB estimate of  error rate: 3.33%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          2        48        0.04

> randomForest(Species ~ ., data=iris, ntree=30, mtry=2, importance = TRUE,
+              replace = FALSE)

Call:
 randomForest(formula = Species ~ ., data = iris, ntree = 30,
     mtry = 2, importance = TRUE, replace = FALSE)
               Type of random forest: classification
                     Number of trees: 30
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4.67%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          4        46        0.08

> randomForest(Species ~ ., data=iris, ntree=30, mtry=2, importance = TRUE,
+              replace = FALSE)

Call:
 randomForest(formula = Species ~ ., data = iris, ntree = 30,
     mtry = 2, importance = TRUE, replace = FALSE)
               Type of random forest: classification
                     Number of trees: 30
No. of variables tried at each split: 2

        OOB estimate of  error rate: 6%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          6        44        0.12
