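I'm trying to get precise predictions out of WEKA, and I need to increase the number of decimal places it prints in its prediction output.
My .arff training set looks like this: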
@relation TrainSet

@attribute TimeDiff1 numeric
@attribute TimeDiff2 numeric
@attribute TimeDiff3 numeric
@attribute TimeDiff4 numeric
@attribute TimeDiff5 numeric
@attribute TimeDiff6 numeric
@attribute TimeDiff7 numeric
@attribute TimeDiff8 numeric
@attribute TimeDiff9 numeric
@attribute TimeDiff10 numeric
@attribute LBN/Distance numeric
@attribute LBNDiff1 numeric
@attribute LBNDiff2 numeric
@attribute LBNDiff3 numeric
@attribute Size numeric
@attribute RW {R,W}
@attribute 'Response Time' numeric

@data
0,0,0,0,0,0,0,0,0,0,203468398592,0,0,0,32768,R,0.006475
0.004254,0,0,0,0,0,0,0,0,0,4564742206976,4361273808384,0,0,65536,R,0.011025
0.002128,0.006382,0,0,0,0,0,0,0,0,4585966117376,21223910400,4382497718784,0,4096,R,0.01389
0.001616,0.003744,0,0,0,0,0,0,0,0,4590576115200,4609997824,25833908224,4387107716608,4096,R,0.005276
0.002515,0.004131,0.010513,0,0,0,0,0,0,0,233456156672,-4357119958528,-4352509960704,-4331286050304,32768,R,0.01009
0.004332,0.006847,0.010591,0,0,0,0,0,0,0,312887472128,79431315456,-4277688643072,-4273078645248,4096,R,0.005081
0.000342,0.004674,0.008805,0,0,0,0,0,0,0,3773914294272,3461026822144,3540458137600,-816661820928,8704,R,0.004252
0.000021,0.000363,0.00721,0,0,0,0,0,0,0,3772221901312,-1692392960,3459334429184,3538765744640,4096,W,0.00017
0.000042,0.000063,0.004737,0.01525,0,0,0,0,0,0,3832104423424,59882522112,58190129152,3519216951296,16384,W,0.000167
0.005648,0.00569,0.006053,0.016644,0,0,0,0,0,0,312887476224,-3519216947200,-3459334425088,-3461026818048,19456,R,0.009504
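I'm trying to predict 'Response Time', which is the rightmost column. As you can see, my data goes out to the 6th decimal place.
However, WEKA's predictions only go to the 3rd decimal place. Here are the contents of the file named "predictions":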
inst#   actual   predicted   error
    1    0.006       0.005  -0.002
    2    0.011       0.017   0.006
    3    0.014       0.002  -0.012
    4    0.005       0.022   0.016
    5    0.01        0.012   0.002
    6    0.005       0.012   0.007
    7    0.004       0.018   0.014
    8    0           0.001   0
    9    0           0.001   0
   10    0.01        0.012   0.003
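As you can see, this greatly limits the precision of my predictions. Very small values below 0.0005 (such as instances 8 and 9) show up as 0 instead of a more precise decimal.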
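To make the effect concrete, here is a minimal Java sketch that prints a few of the 'Response Time' values from the training set at 3 and at 6 decimal places. It uses plain `String.format`, not WEKA's own formatter (which also trims trailing zeros, hence the bare `0` in the file above); the class name `RoundingDemo` and the helper `format` are made up for illustration.

```java
import java.util.Locale;

public class RoundingDemo {
    // Formats a value with a fixed number of decimal places.
    // Illustrative helper only; this is not how WEKA formats its output internally.
    static String format(double value, int decimals) {
        return String.format(Locale.ROOT, "%." + decimals + "f", value);
    }

    public static void main(String[] args) {
        // Response times taken from rows 8, 9 and 1 of the training set above.
        double[] responseTimes = {0.00017, 0.000167, 0.006475};
        for (double t : responseTimes) {
            // At 3 decimals the first two values become indistinguishable from zero.
            System.out.println(format(t, 3) + "  vs  " + format(t, 6));
        }
    }
}
```

Anything below 0.0005 rounds to zero at 3 decimal places, which matches what happens to instances 8 and 9.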
I'm using WEKA from the simple command line rather than the GUI. My command to build the model looks like this:
java weka.classifiers.trees.REPTree -M 2 -V 0.00001 -N 3 -S 1 -L -1 -I 0.0 -num-decimal-places 6 \
    -t [removed path]/TrainSet.arff \
    -T [removed path]/TestSet.arff \
    -d [removed path]/model1.model > \
    [removed path]/model1output
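([removed path]: I just removed the full path names for privacy.)
As you can see, I found the "-num-decimal-places" switch and used it when building the model.
I then use the following command to make the predictions: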
java weka.classifiers.trees.REPTree \
    -T [removed path]/LUN0train.arff \
    -l [removed path]/model1.model -p 0 > \
    [removed path]/predictions
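I can't use the "-num-decimal-places" switch here because, for reasons I don't understand, WEKA doesn't allow it in this case. "predictions" is the file I want the predictions written to.
So I run these two commands, but the number of decimal places in the predictions doesn't change! It's still only 3.
I've already looked at these answers, "Weka decimal precision" and this answer on the Pentaho forum, but neither gives enough information to answer my question. They hint that changing the number of decimal places might not be possible, but I just want to be sure.
Does anyone know of an option to fix this? Ideally the solution would be on the command line, but if you only know how to do it in the GUI, that's fine too.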
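Answer:
I just thought of a workaround: simply multiply the data by 1000, run the predictions, and then multiply by 1/1000 afterwards to get back to the original scale. It's a bit unconventional, but it works.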
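The scaling workaround can be sketched in plain Java as follows. This is only an illustration under stated assumptions: it assumes that only the class value (the last comma-separated field of each @data row) needs scaling, and the names `ScaleTarget`, `scaleRow`, and `unscale` are hypothetical helpers, not part of WEKA. `BigDecimal` is used so the rescaling itself introduces no floating-point noise.

```java
import java.math.BigDecimal;

public class ScaleTarget {
    // Scale factor from the workaround: multiply by 1000 before training.
    static final BigDecimal FACTOR = BigDecimal.valueOf(1000);

    // Multiplies the last comma-separated field (assumed to be the class value)
    // of one ARFF @data row by FACTOR and leaves all other fields untouched.
    static String scaleRow(String row) {
        int cut = row.lastIndexOf(',');
        BigDecimal target = new BigDecimal(row.substring(cut + 1).trim());
        String scaled = target.multiply(FACTOR).stripTrailingZeros().toPlainString();
        return row.substring(0, cut + 1) + scaled;
    }

    // Converts a prediction made on the scaled data back to the original units.
    static BigDecimal unscale(String scaledPrediction) {
        return new BigDecimal(scaledPrediction.trim()).divide(FACTOR);
    }

    public static void main(String[] args) {
        // Row 8 of the training set; its response time is 0.00017.
        String row = "0.000021,0.000363,0.00721,0,0,0,0,0,0,0,"
                + "3772221901312,-1692392960,3459334429184,3538765744640,4096,W,0.00017";
        System.out.println(scaleRow(row));   // the last field becomes 0.17
        System.out.println(unscale("0.17").toPlainString()); // numerically equal to 0.00017
    }
}
```

With the target multiplied by 1000, WEKA's 3-decimal output carries the equivalent of 6 decimal places in the original units once the predictions are divided by 1000 again.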
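EDIT: Another approach comes from Peter Reutemann's answer at http://weka.8497.n7.nabble.com/Changing-decimal-point-precision-td43393.html:
This has been around for a long time. 😉 "-p" is the really old-fashioned way of outputting the predictions. With the "-classifications" option, you can specify the output format (e.g. CSV). The class that you specify with that option has to be derived from "weka.classifiers.evaluation.output.prediction.AbstractOutput": http://weka.sourceforge.net/doc.dev/weka/classifiers/evaluation/output/prediction/AbstractOutput.html
Here is an example that uses 12 decimal places when outputting the predictions from Java: https://svn.cms.waikato.ac.nz/svn/weka/trunk/wekaexamples/src/main/java/wekaexamples/classifiers/PredictionDecimals.java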