I want to create a model that optimizes PPV. I have already built an RF model (shown below) that outputs a confusion matrix, from which I manually calculated sensitivity, specificity, PPV, NPV, and F1. I know that accuracy is currently being optimized, but I am willing to sacrifice sensitivity and specificity to get a higher PPV.
data_ctrl_null <- trainControl(method = "cv", number = 5, classProbs = TRUE,
                               summaryFunction = twoClassSummary,
                               savePredictions = TRUE, sampling = NULL)

set.seed(5368)

model_htn_df <- train(outcome ~ ., data = htn_df,
                      ntree = 1000,
                      tuneGrid = data.frame(mtry = 38),
                      trControl = data_ctrl_null,
                      method = "rf",
                      preProc = c("center", "scale"),
                      metric = "ROC",
                      importance = TRUE)

model_htn_df$finalModel  # provides confusion matrix
The results:
Call:
 randomForest(x = x, y = y, ntree = 1000, mtry = param$mtry, importance = TRUE)
               Type of random forest: classification
                     Number of trees: 1000
No. of variables tried at each split: 38

        OOB estimate of  error rate: 16.2%
Confusion matrix:
     no yes class.error
no  274  19  0.06484642
yes  45  57  0.44117647
My manual calculations: sensitivity = 55.9%, specificity = 93.5%, PPV = 75.0%, NPV = 85.9%. (The confusion matrix has my "no" and "yes" outcomes swapped, so I also swapped the numbers when calculating the performance measures.)
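For reference, a minimal sketch of the arithmetic behind those figures, taking the counts from the OOB confusion matrix above and treating "yes" as the positive class:

# Sketch only: recomputing the manual figures from the OOB confusion matrix,
# with "yes" as the positive class.
tp <- 57;  fn <- 45   # actual "yes" predicted yes / predicted no
tn <- 274; fp <- 19   # actual "no"  predicted no  / predicted yes

sensitivity <- tp / (tp + fn)  # 0.559
specificity <- tn / (tn + fp)  # 0.935
ppv         <- tp / (tp + fp)  # 0.750
npv         <- tn / (tn + fn)  # 0.859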
So what can I do to get a PPV of 90%?
Here is a similar question, but I don't really understand it.
Answer:
We define a function that calculates PPV and returns the result with a name:
PPV <- function(data, lev = NULL, model = NULL) {
  value <- posPredValue(data$pred, data$obs, positive = lev[1])
  c(PPV = value)
}
Suppose we have the following data:
library(randomForest)
library(caret)

data = iris
data$Species = ifelse(data$Species == "versicolor", "versi", "others")
trn = sample(nrow(iris), 100)
Then we train by specifying PPV as the metric:
mdl <- train(Species ~ ., data = data[trn,],
             method = "rf",
             metric = "PPV",
             trControl = trainControl(summaryFunction = PPV,
                                      classProbs = TRUE))

Random Forest 

100 samples
  4 predictor
  2 classes: 'others', 'versi' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 100, 100, 100, 100, 100, 100, ... 
Resampling results across tuning parameters:

  mtry  PPV      
  2     0.9682811
  3     0.9681759
  4     0.9648426

PPV was used to select the optimal model using the largest value.
The final value used for the model was mtry = 2.
Now you can see that it is trained on PPV. However, you cannot force the training to reach a PPV of 0.9. That really depends on the data: if your independent variables have no predictive power, the PPV will not improve no matter how you train, right?
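As a quick sanity check (a sketch only, assuming the iris-based example above), you could look at the PPV on the rows left out of trn, using the same positive class (lev[1] = "others") that the custom summary function used, and also inspect the resampled PPV stored in the train object:

preds <- predict(mdl, newdata = data[-trn, ])                 # class predictions on held-out rows
obs   <- factor(data$Species[-trn], levels = levels(preds))   # align factor levels with the predictions
posPredValue(preds, obs, positive = "others")                 # PPV on the held-out rows
mdl$results                                                   # resampled PPV for each mtry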