Depth and OOB error in randomForest and randomForestSRC

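This is my R code using randomForest and rfsrc. Is there a way to specify n_estimators and max_depth in my R code, as in the sklearn version? Also, how can I plot the OOB error against the number of trees, similar to the figure below?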

set.seed(2234)
tic("Time to train RFSRC fast")
fast.o <- rfsrc.fast(Label ~ ., data = train[(1:50000),], forest=TRUE)
toc()
print(fast.o)
#print(vimp(fast.o)$importance)
set.seed(2367)
tic("Time to test RFSRC fast ")
#data(breast, package = "randomForestSRC")
fast.pred <- predict(fast.o, test[(1:50000),])
toc()
print(fast.pred)

set.seed(3)
tic("RF model fitting without Parallelization")
rf <- randomForest(Label~., data=train[(1:50000),])
toc()
print(rf)
plot(rf)
varImp(rf, sort = T)
varImpPlot(rf, sort=T, n.var= 10, main= "Variable Importance", pch=16)
rf_pred <- predict(rf, newdata=test[(1:50000),])
confMatrix <- confusionMatrix(rf_pred, test[(1:50000),]$Label)
confMatrix

Thank you for your time.

Answer:

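You need to set block.size=1 so that the cumulative error rate is recorded after every tree rather than once per block of trees. Also note that sampling is done without replacement; see the rfsrc documentation: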

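Unlike Breiman's random forests, the default action here is sampling without replacement. Thus, out-of-bag (OOB) technically means out-of-sample, but for legacy reasons we retain the term OOB.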

So, using an example dataset:

library(mlbench)
library(randomForestSRC)
data(Sonar)
set.seed(911)
trn = sample(nrow(Sonar), 150)
# block.size=1 stores the cumulative error after each tree
rf <- rfsrc(Class ~ ., data = Sonar[trn,], ntree=500, block.size=1, importance=TRUE)
pred <- predict(rf, Sonar[-trn,], block.size=1)
# rf$err.rate is the OOB error on the training data; pred$err.rate is the test error
plot(rf$err.rate[,1], type="l", col="steelblue", xlab="ntrees", ylab="err.rate", ylim=c(0,0.5))
lines(pred$err.rate[,1], col="orange")
legend("topright", fill=c("steelblue","orange"), c("OOB.train","test"))

With randomForest:

library(randomForest)
rf <- randomForest(Class ~ ., data = Sonar[trn,], ntree=500)
pred <- predict(rf, Sonar[-trn,], predict.all=TRUE)

I'm not sure whether there is a simpler way to get the test error for each number of trees:

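# For each i, take the majority vote over the first i trees for every test row.
# table() counts the votes; rle() on unsorted votes would only find the longest run.
err_by_tree = sapply(1:ncol(pred$individual), function(i){
  apply(pred$individual[,1:i,drop=FALSE], 1,
        function(votes) names(which.max(table(votes))))
})
# Misclassification rate on the test rows for each forest size
err_by_tree = colMeans(err_by_tree != as.character(Sonar$Class[-trn]))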

Then plot:

# rf$err.rate[,1] is the OOB error on the training data; err_by_tree is the test error
plot(rf$err.rate[,1], type="l", col="steelblue", xlab="ntrees", ylab="err.rate", ylim=c(0,0.5))
lines(err_by_tree, col="orange")
legend("topright", fill=c("steelblue","orange"), c("OOB.train","test"))
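
On the n_estimators and max_depth part of the question, here is a minimal sketch of the closest equivalents as I understand the two packages' arguments. In both, ntree plays the role of n_estimators. rfsrc has a nodedepth argument that caps tree depth. randomForest has no depth argument as far as I know, so maxnodes (a cap on terminal nodes) is used here as a rough stand-in, on the assumption that a tree of depth d has at most 2^d leaves. The values 200 and 5 and the object names rf_src and rf_cl are arbitrary choices for illustration.

```r
library(mlbench)
library(randomForestSRC)
library(randomForest)

data(Sonar)
set.seed(911)
trn <- sample(nrow(Sonar), 150)

# rfsrc: ntree ~ n_estimators, nodedepth ~ max_depth
rf_src <- rfsrc(Class ~ ., data = Sonar[trn, ],
                ntree = 200, nodedepth = 5, block.size = 1)

# randomForest: ntree ~ n_estimators; no direct depth limit,
# so cap the number of terminal nodes instead (2^5 leaves at most for depth 5)
rf_cl <- randomForest(Class ~ ., data = Sonar[trn, ],
                      ntree = 200, maxnodes = 2^5)

# One cumulative error value per tree in both objects
cat("rfsrc rows in err.rate:", nrow(rf_src$err.rate), "\n")
cat("randomForest rows in err.rate:", nrow(rf_cl$err.rate), "\n")
```

Both err.rate matrices can then be plotted against the number of trees in the same way as above.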
