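This is my R code using randomForest and rfsrc. Is there a way to include n_estimators and max_depth in my R code, as in the sklearn version? Also, how can I plot OOB error against the number of trees, like the figure below?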
    library(tictoc)           # tic(), toc()
    library(randomForestSRC)  # rfsrc.fast()
    library(randomForest)     # randomForest(), varImpPlot()
    library(caret)            # varImp(), confusionMatrix()

    set.seed(2234)
    tic("Time to train RFSRC fast")
    fast.o <- rfsrc.fast(Label ~ ., data = train[(1:50000),], forest = TRUE)
    toc()
    print(fast.o)
    #print(vimp(fast.o)$importance)

    set.seed(2367)
    tic("Time to test RFSRC fast ")
    #data(breast, package = "randomForestSRC")
    fast.pred <- predict(fast.o, test[(1:50000),])
    toc()
    print(fast.pred)

    set.seed(3)
    tic("RF model fitting without Parallelization")
    rf <- randomForest(Label ~ ., data = train[(1:50000),])
    toc()
    print(rf)
    plot(rf)
    varImp(rf, sort = T)
    varImpPlot(rf, sort = T, n.var = 10, main = "Variable Importance", pch = 16)

    rf_pred <- predict(rf, newdata = test[(1:50000),])
    confMatrix <- confusionMatrix(rf_pred, test[(1:50000),]$Label)
    confMatrix
Thanks for your time.

Answer:
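You need to set block.size=1 so that the cumulative error rate is computed after every tree rather than only on the last one. Also note that sampling is done without replacement; see the rfsrc help page:

"Unlike Breiman's random forests, the default action here is to sample without replacement. Thus out-of-bag (OOB) technically means out-of-sample, but for legacy reasons we retain the term OOB."

So, using an example dataset: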
    library(mlbench)
    library(randomForestSRC)

    data(Sonar)
    set.seed(911)
    trn = sample(nrow(Sonar), 150)

    # block.size=1 gives one cumulative error value per tree
    rf <- rfsrc(Class ~ ., data = Sonar[trn,], ntree = 500, block.size = 1, importance = TRUE)
    pred <- predict(rf, Sonar[-trn,], block.size = 1)

    # rf$err.rate is the OOB error on the training rows; pred$err.rate is the test error
    plot(rf$err.rate[,1], type = "l", col = "steelblue", xlab = "ntrees", ylab = "err.rate", ylim = c(0, 0.5))
    lines(pred$err.rate[,1], col = "orange")
    legend("topright", fill = c("steelblue", "orange"), c("OOB.train", "test"))
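On the n_estimators and max_depth part of the question, here is a minimal sketch of the closest rfsrc arguments as I understand them. ntree is the number of trees, so it corresponds to n_estimators. nodedepth caps how deep each tree is grown, which is similar in spirit to max_depth, though I have not checked that the two count depth in exactly the same way. samptype="swr" switches to sampling with replacement if you want Breiman-style bootstrapping. The values 200 and 5 and the names rf_capped and oob_err are made up for illustration. randomForest also takes ntree, but as far as I know it has no direct depth argument; tree size is limited indirectly through maxnodes and nodesize.

```r
library(mlbench)
library(randomForestSRC)

data(Sonar)
set.seed(911)
trn <- sample(nrow(Sonar), 150)

# ntree     ~ sklearn's n_estimators
# nodedepth ~ sklearn's max_depth (assumed to be roughly equivalent)
# samptype  = "swr" samples with replacement instead of the default "swor"
rf_capped <- rfsrc(Class ~ ., data = Sonar[trn, ],
                   ntree = 200, nodedepth = 5,
                   samptype = "swr", block.size = 1)

# One cumulative OOB error value per tree, ready for plotting
oob_err <- rf_capped$err.rate[, 1]
plot(oob_err, type = "l", xlab = "ntrees", ylab = "OOB err.rate")
```

Shallower trees usually train faster but may raise the error, so it is worth comparing this curve against the uncapped forest above.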
With randomForest:
    library(randomForest)

    rf <- randomForest(Class ~ ., data = Sonar[trn,], ntree = 500)
    # predict.all=TRUE also returns each tree's individual prediction
    pred <- predict(rf, Sonar[-trn,], predict.all = TRUE)
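I'm not sure whether there is an easier way to get the test error after each tree, so here it is computed by hand from the individual tree predictions: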
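    # For each k, take the majority vote of the first k trees for every test row.
    # table() counts the votes; ties go to the first class level, not a random one.
    err_by_tree = sapply(1:ncol(pred$individual), function(k) {
      apply(pred$individual[, 1:k, drop = FALSE], 1,
            function(v) names(which.max(table(v))))
    })
    # Share of misclassified test rows after 1, 2, ..., ntree trees
    err_by_tree = colMeans(err_by_tree != as.character(Sonar$Class[-trn]))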
Then plot:
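    # rf$err.rate[,1] is the OOB error on the training rows; err_by_tree is the test error
    plot(rf$err.rate[,1], type = "l", col = "steelblue", xlab = "ntrees", ylab = "err.rate", ylim = c(0, 0.5))
    lines(err_by_tree, col = "orange")
    legend("topright", fill = c("steelblue", "orange"), c("OOB.train", "test"))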
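A possibly simpler route with randomForest, sketched here under the assumption that you can use the x/y interface instead of the formula: if the test set is passed through xtest and ytest at fit time, the fitted object should carry the cumulative test error per tree in rf$test$err.rate, so no manual vote counting is needed. The names x_cols, rf_xy, oob_err and test_err are just for illustration.

```r
library(mlbench)
library(randomForest)

data(Sonar)
set.seed(911)
trn <- sample(nrow(Sonar), 150)
x_cols <- setdiff(names(Sonar), "Class")

# Supplying xtest/ytest makes randomForest track test error as trees are added
rf_xy <- randomForest(x = Sonar[trn, x_cols], y = Sonar$Class[trn],
                      xtest = Sonar[-trn, x_cols], ytest = Sonar$Class[-trn],
                      ntree = 500)

oob_err  <- rf_xy$err.rate[, 1]       # first column: overall OOB error
test_err <- rf_xy$test$err.rate[, 1]  # first column: overall test error

plot(oob_err, type = "l", col = "steelblue",
     xlab = "ntrees", ylab = "err.rate", ylim = c(0, 0.5))
lines(test_err, col = "orange")
legend("topright", fill = c("steelblue", "orange"), c("OOB.train", "test"))
```

One caveat: by default a forest fitted with xtest does not keep its trees (keep.forest is FALSE in that case), so set keep.forest=TRUE if you also want to call predict() on it later.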