Does sbf() use the metric argument to optimize the model?

Passing ROC as the metric value through the caretSBF functions

Our goal is to use the ROC summary metric for model selection when running feature selection with the Selection By Filter function, sbf().

As a reproducible example we use the BreastCancer dataset from the mlbench package and run both train() and sbf(), each with metric = "Accuracy" and with metric = "ROC".

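We want to make sure that sbf() uses the metric argument to optimize the model in the same way that train() and rfe() do. To do this, we combine train() with sbf(): the caretSBF$fit function calls train(), and caretSBF is passed to sbfControl().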
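To see the hand-off described above, you can print the pieces of the caretSBF list. This is a minimal sketch that only inspects caret's own objects; it assumes a reasonably recent caret release, and the printed function bodies may differ slightly between versions.

```r
library(caret)

# caretSBF is a plain list of functions that sbf() calls in each outer resample
names(caretSBF)

# The fit component forwards its dots (e.g. method, metric, trControl) to train()
caretSBF$fit
```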

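Judging from the output, the metric argument seems to be used only in the inner resampling and not in the sbf part: in the outer resampling section of the output, metric does not appear to be applied the way it is for train() and rfe().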

Because caretSBF calls train(), the scope of the metric argument appears to be limited to train(), so it is not passed on to sbf() itself.
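One way to check the scoping described above is to compare the formal arguments of the functions involved. The sketch below looks only at signatures (assuming a reasonably recent caret release): it shows which functions declare a metric argument of their own, not everything that happens to values passed through the dots argument.

```r
library(caret)

# Does each function (or control object) declare its own `metric` argument?
has_metric <- c(
  train.default = "metric" %in% names(formals(getS3method("train", "default"))),
  rfe.default   = "metric" %in% names(formals(getS3method("rfe", "default"))),
  sbf.default   = "metric" %in% names(formals(getS3method("sbf", "default"))),
  sbfControl    = "metric" %in% names(formals(sbfControl))
)
has_metric
```

If train.default and rfe.default report TRUE while sbf.default and sbfControl report FALSE, a metric given to sbf() can only travel through the dots argument to caretSBF$fit and hence to train().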

We would appreciate clarification on whether sbf() uses the metric argument to optimize the model, i.e. in the outer resampling.

Below is our reproducible example. It shows that train() uses the metric argument with both Accuracy and ROC; for sbf() we are not yet sure.

I. Data

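    ## Load the required packages
    library(mlbench)
    library(caret)

    ## Load the BreastCancer dataset from the mlbench package
    data("BreastCancer")

    ## Data cleaning to handle missing values
    # Drop every row/observation that has an NA in any column
    BrC1 <- BreastCancer[complete.cases(BreastCancer), ]

    # Remove the Id and Class columns, keeping only the nine predictors
    # (note: in BreastCancer these predictors are stored as factors, not numbers)
    Num_Pred <- BrC1[, 2:10]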
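A quick sanity check that the cleaning step gives the sample size reported in the outputs below. It re-runs only the data lines; the expected counts (683 rows; 444 benign and 239 malignant) are read off the printed train() output and the random-forest confusion matrix further down.

```r
library(mlbench)

data("BreastCancer")
BrC1 <- BreastCancer[complete.cases(BreastCancer), ]
Num_Pred <- BrC1[, 2:10]

# Expected: 683 rows and 9 predictor columns ("683 samples, 9 predictor")
dim(Num_Pred)

# Class balance after dropping incomplete rows
table(BrC1$Class)

# Id and Class must not be among the predictors
names(Num_Pred)
```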

II. Custom summary function

Define the fiveStats summary function

    # ROC, Sens and Spec from twoClassSummary() plus Accuracy and Kappa from defaultSummary()
    fiveStats <- function(...) c(twoClassSummary(...),
                                 defaultSummary(...))
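To see what fiveStats() returns, you can call it on a tiny hand-made data frame laid out the way caret passes hold-out predictions to a summary function: obs, pred, and one probability column per class level. The four rows are made up purely for illustration, and twoClassSummary() needs the pROC package installed.

```r
library(caret)

fiveStats <- function(...) c(twoClassSummary(...), defaultSummary(...))

lev <- c("benign", "malignant")
toy <- data.frame(
  obs       = factor(c("benign", "benign", "malignant", "malignant"), levels = lev),
  pred      = factor(c("benign", "malignant", "malignant", "malignant"), levels = lev),
  benign    = c(0.9, 0.4, 0.2, 0.1),   # class probabilities (illustrative only)
  malignant = c(0.1, 0.6, 0.8, 0.9)
)

res <- fiveStats(toy, lev = lev)
res
# Accuracy: 3 of 4 predictions are correct, so 0.75.
# Sens uses the first level (benign) as the event: 1 of 2 benign rows found, so 0.5.
# Spec: both malignant rows predicted malignant, so 1.
# Every benign row has a higher benign probability than every malignant row,
# so the ROC AUC should be 1.
```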

III. Training

Define trControl

    trCtrl <- trainControl(method = "repeatedcv", number = 10, repeats = 1,
                           classProbs = TRUE,            # class probabilities are needed for ROC
                           summaryFunction = fiveStats)

TRAIN + METRIC = "Accuracy"

    set.seed(1)
    TR_acc <- train(Num_Pred, BrC1$Class, method = "rf", metric = "Accuracy",
                    trControl = trCtrl, tuneGrid = expand.grid(.mtry = c(2, 3, 4, 5)))
    TR_acc
    # Random Forest
    #
    # 683 samples
    #   9 predictor
    #   2 classes: 'benign', 'malignant'
    #
    # No pre-processing
    # Resampling: Cross-Validated (10 fold, repeated 1 times)
    # Summary of sample sizes: 615, 615, 614, 614, 614, 615, ...
    # Resampling results across tuning parameters:
    #
    #   mtry  ROC        Sens       Spec       Accuracy   Kappa
    #   2     0.9936532  0.9729798  0.9833333  0.9765772  0.9490311
    #   3     0.9936544  0.9729293  0.9791667  0.9750853  0.9457534
    #   4     0.9929957  0.9684343  0.9750000  0.9706948  0.9361373
    #   5     0.9922907  0.9684343  0.9666667  0.9677536  0.9295782
    #
    # Accuracy was used to select the optimal model using the largest value.
    # The final value used for the model was mtry = 2.

TRAIN + METRIC = "ROC"

    set.seed(1)
    TR_roc <- train(Num_Pred, BrC1$Class, method = "rf", metric = "ROC",
                    trControl = trCtrl, tuneGrid = expand.grid(.mtry = c(2, 3, 4, 5)))
    TR_roc
    # Random Forest
    #
    # 683 samples
    #   9 predictor
    #   2 classes: 'benign', 'malignant'
    #
    # No pre-processing
    # Resampling: Cross-Validated (10 fold, repeated 1 times)
    # Summary of sample sizes: 615, 615, 614, 614, 614, 615, ...
    # Resampling results across tuning parameters:
    #
    #   mtry  ROC        Sens       Spec       Accuracy   Kappa
    #   2     0.9936532  0.9729798  0.9833333  0.9765772  0.9490311
    #   3     0.9936544  0.9729293  0.9791667  0.9750853  0.9457534
    #   4     0.9929957  0.9684343  0.9750000  0.9706948  0.9361373
    #   5     0.9922907  0.9684343  0.9666667  0.9677536  0.9295782
    #
    # ROC was used to select the optimal model using the largest value.
    # The final value used for the model was mtry = 3.
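The two train() runs share the same resampling table and differ only in which column is maximised. caret exposes its default selection rule as best(); applying it to the values copied from the printed output reproduces both choices. Note that the ROC values for mtry = 2 and mtry = 3 differ only in the sixth decimal place.

```r
library(caret)

# Resampling results copied from the printed train() output
res <- data.frame(
  mtry     = c(2, 3, 4, 5),
  ROC      = c(0.9936532, 0.9936544, 0.9929957, 0.9922907),
  Accuracy = c(0.9765772, 0.9750853, 0.9706948, 0.9677536)
)

# best() returns the row index holding the largest value of `metric` when maximize = TRUE
mtry_acc <- res$mtry[best(res, metric = "Accuracy", maximize = TRUE)]
mtry_roc <- res$mtry[best(res, metric = "ROC", maximize = TRUE)]
c(Accuracy = mtry_acc, ROC = mtry_roc)   # Accuracy = 2, ROC = 3
```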

IV. Editing caretSBF

Replace the caretSBF summary function

   caretSBF$summary <- fiveStats

V. SBF

Define sbfControl

    sbfCtrl <- sbfControl(functions = caretSBF,
                          method = "repeatedcv", number = 10, repeats = 1,
                          verbose = TRUE, saveDetails = TRUE)

SBF + METRIC = "Accuracy"

    set.seed(1)
    sbf_acc <- sbf(Num_Pred, BrC1$Class,
                   sbfControl = sbfCtrl,
                   trControl = trCtrl, method = "rf", metric = "Accuracy")

    ## sbf_acc
    sbf_acc
    # Selection By Filter
    #
    # Outer resampling method: Cross-Validated (10 fold, repeated 1 times)
    #
    # Resampling performance:
    #
    #     ROC  Sens   Spec Accuracy Kappa    ROCSD SensSD  SpecSD AccuracySD  KappaSD
    #  0.9931 0.973 0.9833   0.9766 0.949 0.006272 0.0231 0.02913    0.01226 0.02646
    #
    # Using the training set, 9 variables were selected:
    #    Cl.thickness, Cell.size, Cell.shape, Marg.adhesion, Epith.c.size...
    #
    # During resampling, the top 5 selected variables (out of a possible 9):
    #    Bare.nuclei (100%), Bl.cromatin (100%), Cell.shape (100%), Cell.size (100%), Cl.thickness (100%)
    #
    # On average, 9 variables were selected (min = 9, max = 9)

    ## Class of sbf_acc
    class(sbf_acc)
    # [1] "sbf"

    ## Names of the elements of sbf_acc
    names(sbf_acc)
    #  [1] "pred"         "variables"    "results"      "fit"          "optVariables"
    #  [6] "call"         "control"      "resample"     "metrics"      "times"
    # [11] "resampledCM"  "obsLevels"    "dots"

    ## The fit element of sbf_acc
    sbf_acc$fit
    # Random Forest
    #
    # 683 samples
    #   9 predictor
    #   2 classes: 'benign', 'malignant'
    #
    # No pre-processing
    # Resampling: Cross-Validated (10 fold, repeated 1 times)
    # Summary of sample sizes: 615, 614, 614, 615, 615, 615, ...
    # Resampling results across tuning parameters:
    #
    #   mtry  ROC        Sens       Spec       Accuracy   Kappa
    #   2     0.9933176  0.9706566  0.9833333  0.9751492  0.9460717
    #   5     0.9920034  0.9662121  0.9791667  0.9707801  0.9363708
    #   9     0.9914825  0.9684343  0.9708333  0.9693308  0.9327662
    #
    # Accuracy was used to select the optimal model using the largest value.
    # The final value used for the model was mtry = 2.

    ## Names of the elements of sbf_acc$fit
    names(sbf_acc$fit)
    #  [1] "method"       "modelInfo"    "modelType"    "results"      "pred"
    #  [6] "bestTune"     "call"         "dots"         "metric"       "control"
    # [11] "finalModel"   "preProcess"   "trainingData" "resample"     "resampledCM"
    # [16] "perfNames"    "maximize"     "yLimits"      "times"        "levels"

    ## Final model of sbf_acc$fit
    sbf_acc$fit$finalModel
    # Call:
    #  randomForest(x = x, y = y, mtry = param$mtry)
    #                Type of random forest: classification
    #                      Number of trees: 500
    # No. of variables tried at each split: 2
    #
    #         OOB estimate of  error rate: 2.34%
    # Confusion matrix:
    #           benign malignant class.error
    # benign       431        13  0.02927928
    # malignant      3       236  0.01255230

    ## Metric stored in sbf_acc$fit
    sbf_acc$fit$metric
    # [1] "Accuracy"

    ## Best tuning parameter of sbf_acc$fit
    sbf_acc$fit$bestTune
    #   mtry
    # 1    2
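For completeness, here is a sketch of the matching sbf() run with metric = "ROC", which the question mentions but does not show. It repeats the setup above in one self-contained block; to keep it quick it assumes plain 5-fold CV at both levels and a two-value mtry grid, so its numbers will not match the outputs above. Comparing sbf_roc$fit$metric with sbf_roc$results shows that the metric is recorded on the inner train() object, while the outer results simply report every statistic that fiveStats produces.

```r
library(mlbench)
library(caret)

data("BreastCancer")
BrC1 <- BreastCancer[complete.cases(BreastCancer), ]
Num_Pred <- BrC1[, 2:10]

fiveStats <- function(...) c(twoClassSummary(...), defaultSummary(...))

# Assumption: 5 folds instead of 10 and a small mtry grid, only to shorten run time
trCtrl <- trainControl(method = "cv", number = 5,
                       classProbs = TRUE, summaryFunction = fiveStats)

rocSBF <- caretSBF            # local copy, so the caretSBF object itself is not modified
rocSBF$summary <- fiveStats
sbfCtrl <- sbfControl(functions = rocSBF, method = "cv", number = 5)

set.seed(1)
sbf_roc <- sbf(Num_Pred, BrC1$Class,
               sbfControl = sbfCtrl,
               trControl = trCtrl, method = "rf", metric = "ROC",
               tuneGrid = expand.grid(mtry = c(2, 3)))

sbf_roc$fit$metric   # metric used by the inner train() call
sbf_roc$results      # outer-resampling performance
```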
