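Custom precision-recall AUC measure in mlr3

I would like to create a custom precision-recall AUC measure in mlr3.

I feel I am almost there, but R throws an annoying error that I don't know how to interpret.

Let's define the measure: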

    PRAUC = R6::R6Class("PRAUC",
      inherit = mlr3::MeasureClassif,
      public = list(
        initialize = function() {
          super$initialize(
            # custom id for the measure
            id = "classif.prauc",
            # additional packages required to calculate this measure
            packages = c('PRROC'),
            # properties, see below
            properties = character(),
            # required predict type of the learner
            predict_type = "prob",
            # feasible range of values
            range = c(0, 1),
            # minimize during tuning?
            minimize = FALSE
          )
        }
      ),
      private = list(
        # custom scoring function operating on the prediction object
        .score = function(prediction, ...) {
          # PRROC::pr.curve assumes the binary response is numeric,
          # with the positive class coded as 1 and the negative class as 0
          truth1 <- ifelse(prediction$truth == levels(prediction$truth)[1], 1, 0)
          PRROC::pr.curve(scores.class0 = prediction$prob, weights.class0 = truth1)
        }
      )
    )

    mlr3::mlr_measures$add("classif.prauc", PRAUC)

Let's see if it works:

    task_sonar <- tsk('sonar')
    learner <- lrn('classif.rpart', predict_type = 'prob')
    learner$train(task_sonar)
    pred <- learner$predict(task_sonar)
    pred$score(msr('classif.prauc'))
    # Error in if (sum(weights < 0) != 0) { :
    #   missing value where TRUE/FALSE needed

Here's the traceback:

    11. check(length(sorted.scores.class0), weights.class0)
    10. compute.pr(scores.class0, scores.class1, weights.class0, weights.class1,
            curve, minStepSize, max.compute, min.compute, rand.compute,
            dg.compute)
    9. PRROC::pr.curve(scores.class0 = prediction$prob, weights.class0 = truth1)
    8. measure$.__enclos_env__$private$.score(prediction = prediction,
           task = task, learner = learner, train_set = train_set)
    7. measure_score(self, prediction, task, learner, train_set)
    6. m$score(prediction = self, task = task, learner = learner, train_set = train_set)
    5. FUN(X[[i]], ...)
    4. vapply(.x, .f, FUN.VALUE = .value, USE.NAMES = FALSE, ...)
    3. map_mold(.x, .f, NA_real_, ...)
    2. map_dbl(measures, function(m) m$score(prediction = self, task = task,
           learner = learner, train_set = train_set))
    1. pred$score(msr("classif.prauc"))

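The failure seems to come from PRROC::pr.curve. However, when I try this function on the actual prediction object pred, it works just fine:

    PRROC::pr.curve(
      scores.class0 = pred$prob[, 1],
      weights.class0 = ifelse(pred$truth == levels(pred$truth)[1], 1, 0)
    )
    #   Precision-recall curve
    #
    #     Area under curve (Integral):
    #      0.9081261
    #
    #     Area under curve (Davis & Goadrich):
    #      0.9081837
    #
    #     Curve not computed ( can be done by using curve=TRUE )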

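One scenario that would explain the error is that, inside PRAUC, the weights.class0 argument of PRROC::pr.curve contains NA. I haven't been able to confirm this, but I suspect that weights.class0 ends up holding NA instead of numeric values, which makes PRROC::pr.curve fail inside PRAUC. If that is the case, I don't know why it happens.

There may be other scenarios I haven't thought of. Any help would be much appreciated.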

Edit

missuse's answer helped me understand why my measure wasn't working. First,

    PRROC::pr.curve(scores.class0 = prediction$prob, weights.class0 = truth1)

should be

    PRROC::pr.curve(scores.class0 = prediction$prob[, 1], weights.class0 = truth1)

Second, pr.curve returns an object of class PRROC, whereas the mlr3 measure I defined is expected to return a numeric value. So the call should be

    PRROC::pr.curve(scores.class0 = prediction$prob[, 1], weights.class0 = truth1)[[2]]

or

    PRROC::pr.curve(scores.class0 = prediction$prob[, 1], weights.class0 = truth1)[[3]]

depending on the method used to compute the AUC (see ?PRROC::pr.curve).
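My best guess at why passing the whole probability matrix produced the NA error, sketched in base R only. It assumes (I have not verified this against the PRROC source) that pr.curve reorders the weights by the sort order of the scores. The objects probs and truth1 below are toy stand-ins, not real mlr3 output.

```r
# Toy stand-in for prediction$prob: one row per observation, one column per class
probs <- matrix(
  c(0.9, 0.2, 0.7,   # column "M"
    0.1, 0.8, 0.3),  # column "R"
  ncol = 2,
  dimnames = list(NULL, c("M", "R"))
)
# Toy stand-in for the 0/1 truth vector: one entry per observation
truth1 <- c(1, 0, 1)

# order() on a matrix treats it as one long vector, so it returns
# 6 indices here although there are only 3 observations
ord_matrix <- order(probs)
weights_bad <- truth1[ord_matrix]          # indices 4 to 6 do not exist, giving NA
bad_check <- sum(weights_bad < 0) != 0     # NA, so if (bad_check) would fail

# Using a single column keeps scores and weights the same length
ord_column <- order(probs[, 1])
weights_ok <- truth1[ord_column]
ok_check <- sum(weights_ok < 0) != 0       # FALSE

print(weights_bad)
print(bad_check)
print(ok_check)
```

If that assumption holds, the NA values come from the length mismatch between the two-column matrix and the truth vector rather than from anything in the data itself.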

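Note that although MLmetrics::PRAUC is far less confusing than PRROC::pr.curve, the former appears to be poorly implemented.

Here's an implementation of the measure using PRROC::pr.curve that actually works:

    PRAUC = R6::R6Class("PRAUC",
      inherit = mlr3::MeasureClassif,
      public = list(
        initialize = function() {
          super$initialize(
            # custom id for the measure
            id = "classif.prauc",
            # additional packages required to calculate this measure
            packages = c('PRROC'),
            # properties, see below
            properties = character(),
            # required predict type of the learner
            predict_type = "prob",
            # feasible range of values
            range = c(0, 1),
            # minimize during tuning?
            minimize = FALSE
          )
        }
      ),
      private = list(
        # custom scoring function operating on the prediction object
        .score = function(prediction, ...) {
          # it looks like in mlr3 the positive class in binary classification
          # is always the first factor level
          truth1 <- ifelse(prediction$truth == levels(prediction$truth)[1], 1, 0)
          PRROC::pr.curve(
            # it looks like in mlr3 the positive class in binary classification
            # is always the first of the two columns
            scores.class0 = prediction$prob[, 1],
            weights.class0 = truth1
          )[[2]]
        }
      )
    )

    mlr3::mlr_measures$add("classif.prauc", PRAUC)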

Example:

    task_sonar <- tsk('sonar')
    learner <- lrn('classif.rpart', predict_type = 'prob')
    learner$train(task_sonar)
    pred <- learner$predict(task_sonar)
    pred$score(msr('classif.prauc'))
    # classif.prauc
    #      0.923816

However, the issue now is that changing the positive class results in a different score:

    task_sonar <- tsk('sonar')
    task_sonar$positive <- 'R' # now R is the positive class
    learner <- lrn('classif.rpart', predict_type = 'prob')
    learner$train(task_sonar)
    pred <- learner$predict(task_sonar)
    pred$score(msr('classif.prauc'))
    # classif.prauc
    #     0.9081261
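A different score after swapping the positive class may simply be expected, because precision and recall are both defined relative to the positive class, so the area under the PR curve is not symmetric in the two classes. The toy base R sketch below illustrates this with average precision, which I use here only as a simple stand-in for the PR AUC (it is not the interpolation PRROC performs). The helper avg_precision and the data are made up for illustration.

```r
# Average precision: mean of the precision values at the ranks where a
# positive observation is found (a stand-in for PR AUC, not PRROC's method)
avg_precision <- function(score, is_positive) {
  ord <- order(score, decreasing = TRUE)
  hits <- is_positive[ord]
  precision_at_k <- cumsum(hits) / seq_along(hits)
  mean(precision_at_k[hits == 1])
}

# Made-up labels and predicted probabilities for class "M"
truth <- factor(c("M", "M", "M", "R", "M", "R"), levels = c("M", "R"))
prob_m <- c(0.9, 0.8, 0.7, 0.6, 0.3, 0.1)

# Same predictions, scored once with "M" and once with "R" as the positive class
ap_m <- avg_precision(prob_m, as.numeric(truth == "M"))
ap_r <- avg_precision(1 - prob_m, as.numeric(truth == "R"))

print(c(M_positive = ap_m, R_positive = ap_r))
```

The two values differ even though the underlying predictions are identical, which suggests the measure above is tracking the positive class rather than misbehaving. As far as I understand the mlr3 documentation, setting the positive class reorders the class levels so that the positive class comes first, which is what the measure relies on.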
