The mlr package can generate critical difference (CD) diagrams (Demšar 2006) for comparing classifiers across multiple data sets, as follows:
    # THIS WORKS
    library(mlr)
    lrns = list(makeLearner("classif.knn"), makeLearner("classif.svm"))
    tasks = list(iris.task, sonar.task)
    rdesc = makeResampleDesc("CV", iters = 2L)
    meas = list(acc)
    bmr = benchmark(lrns, tasks, rdesc, measures = meas)
    cd = generateCritDifferencesData(bmr)
    plotCritDifferences(cd)
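This requires the evaluation results to be stored in a fairly complex BenchmarkResult object, even though the data is essentially a matrix (where M[i, j] holds the score of classifier i on data set j). I previously generated such data in a Python workflow and imported it into a data.frame in R (since there does not seem to be a Python package for this kind of plot).

How can I generate a CD diagram from this data?

I thought about creating a BenchmarkResult from the data.frame, but I don't know where to start: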
    # THIS DOES NOT WORK
    library(mlr)
    # Here I would import results from my experiments instead of using random data
    # e.g. scores for 5 classifiers and 30 data sets, each
    results = data.frame(replicate(5, runif(30, 0, 1)))
    # This is the functionality I'm looking for
    bmr = benchmarkResultFromDataFrame(results)
    cd = generateCritDifferencesData(bmr)
    plotCritDifferences(cd)
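Answer:

I finally managed to create the plot. Only a few attributes of the BenchmarkResult need to be set:

learners, with an id and a short.name for each classifier
measures
results, with an aggr entry for each data set/classifier combination

The code could look like this (a smaller example that uses only 5 data sets):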
    library(mlr)
    # Here I would import results from my experiments instead of using random data
    # e.g. scores for 5 classifiers and 30 data sets, each
    results <- data.frame(replicate(5, runif(30, 0, 1)))
    clf <- c('clf1', 'clf2', 'clf3', 'clf4', 'clf5')
    clf.short.name <- c('c1', 'c2', 'c3', 'c4', 'c5')
    dataset <- c('dataset1', 'dataset2', 'dataset3', 'dataset4', 'dataset5')
    score <- list(acc)

    # Setting up the learners: id, short.name
    bmr <- list()
    for (i in 1:5){
      bmr$learners[[clf[i]]]$id <- clf[i]
      bmr$learners[[clf[i]]]$short.name <- clf.short.name[i]
    }

    # Setting up the measures
    bmr$measures <- list(acc)

    # Setting up the results
    for (i in 1:5){
      bmr$results$`dataset1`[[clf[i]]]$aggr <- list('acc.test.mean' = results[1, i])
    }
    for (i in 1:5){
      bmr$results$`dataset2`[[clf[i]]]$aggr <- list('acc.test.mean' = results[2, i])
    }
    for (i in 1:5){
      bmr$results$`dataset3`[[clf[i]]]$aggr <- list('acc.test.mean' = results[3, i])
    }
    for (i in 1:5){
      bmr$results$`dataset4`[[clf[i]]]$aggr <- list('acc.test.mean' = results[4, i])
    }
    for (i in 1:5){
      bmr$results$`dataset5`[[clf[i]]]$aggr <- list('acc.test.mean' = results[5, i])
    }

    # Set BenchmarkResult class
    class(bmr) <- "BenchmarkResult"

    # Statistics and plot
    cd = generateCritDifferencesData(bmr)
    plotCritDifferences(cd)
If anyone can show me better R code that avoids these for loops and the code duplication, I would be very grateful!
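One possible way to remove the repetition is to wrap the construction in a small helper built on lapply. The following is only a minimal sketch: benchmarkResultFromDataFrame is a hypothetical helper (the name is borrowed from the question, it is not part of mlr), and it assumes that the rows of the data.frame are data sets and the columns are classifiers, matching the indexing results[dataset, classifier] used above. Like the hand-built version, it relies on the internal layout of BenchmarkResult, which may differ between mlr versions.

```r
# Hypothetical helper, not part of mlr: builds the same minimal list structure
# that is filled in by hand above.
# Assumption: rows of `scores` are data sets, columns are classifiers.
benchmarkResultFromDataFrame <- function(scores, measure,
                                         aggr.name = paste0(measure$id, ".test.mean")) {
  clf <- colnames(scores)
  dataset <- rownames(scores)

  # One entry per classifier, holding its id and short.name
  # (the column name is reused for both)
  learners <- lapply(clf, function(id) list(id = id, short.name = id))
  names(learners) <- clf

  # results[[dataset]][[classifier]]$aggr holds the aggregated score
  results <- lapply(seq_along(dataset), function(d) {
    per.clf <- lapply(seq_along(clf), function(k) {
      list(aggr = setNames(list(scores[d, k]), aggr.name))
    })
    names(per.clf) <- clf
    per.clf
  })
  names(results) <- dataset

  structure(
    list(learners = learners, measures = list(measure), results = results),
    class = "BenchmarkResult"
  )
}

# Random placeholder scores: 30 data sets (rows) x 5 classifiers (columns)
set.seed(1)
scores <- data.frame(replicate(5, runif(30, 0, 1)))
colnames(scores) <- paste0("clf", 1:5)
rownames(scores) <- paste0("dataset", 1:30)

# The plotting step needs mlr; it is skipped if the package is not installed
if (requireNamespace("mlr", quietly = TRUE)) {
  library(mlr)
  bmr <- benchmarkResultFromDataFrame(scores, acc)
  cd <- generateCritDifferencesData(bmr)
  plotCritDifferences(cd)
}
```

Because the helper takes its names from colnames and rownames, it should handle any number of classifiers and data sets without editing the code; only the measure (here acc) has to match how the scores were computed.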