OpenNLP classifier output

I am currently using the following code to train a classifier model:

    final String iterations = "1000";
    final String cutoff = "0";
    InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("src/main/resources/trainingSets/classifierA.txt"));
    ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
    ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);

    TrainingParameters params = new TrainingParameters();
    params.put(TrainingParameters.ITERATIONS_PARAM, iterations);
    params.put(TrainingParameters.CUTOFF_PARAM, cutoff);
    params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);

    DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());

    OutputStream modelOut = new BufferedOutputStream(new FileOutputStream("src/main/resources/models/model.bin"));
    model.serialize(modelOut);

    return model;

This runs fine, and after every run I get the following output:

    Indexing events with TwoPass using cutoff of 0
        Computing event counts...  done. 1474 events
        Indexing...  done.
    Collecting events... Done indexing in 0,03 s.
    Incorporating indexed data for training...  done.
        Number of Event Tokens: 1474
            Number of Outcomes: 2
          Number of Predicates: 4149
    Computing model parameters...
    Stats: (998/1474) 0.6770691994572592
    ...done.

Could someone explain what this output means? Does it tell me anything about the accuracy?


Answer:

Looking at the source, we can see that this output is produced by the NaiveBayesTrainer::trainModel method:

    public AbstractModel trainModel(DataIndexer di) {
        // ...
        display("done.\n");
        display("\tNumber of Event Tokens: " + numUniqueEvents + "\n");
        display("\t    Number of Outcomes: " + numOutcomes + "\n");
        display("\t  Number of Predicates: " + numPreds + "\n");
        display("Computing model parameters...\n");
        MutableContext[] finalParameters = findParameters();
        display("...done.\n");
        // ...
    }

If you look at the code of findParameters(), you will notice that it calls the trainingStats() method, which contains the snippet that computes the accuracy:

    private double trainingStats(EvalParameters evalParams) {
        // ...
        double trainingAccuracy = (double) numCorrect / numEvents;
        display("Stats: (" + numCorrect + "/" + numEvents + ") " + trainingAccuracy + "\n");
        return trainingAccuracy;
    }
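To make the ratio concrete, here is a minimal sketch of the same calculation outside OpenNLP. The class name, the method and the label arrays are hypothetical and exist only for illustration. This is not the library's code, just the numCorrect / numEvents idea applied to made-up data.

```java
public class TrainingAccuracySketch {

    // Fraction of events whose predicted outcome equals the labelled outcome.
    public static double trainingAccuracy(int[] predicted, int[] actual) {
        if (predicted.length != actual.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int numCorrect = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == actual[i]) {
                numCorrect++;
            }
        }
        // Cast before dividing; plain int division would truncate to 0 or 1.
        return (double) numCorrect / actual.length;
    }

    public static void main(String[] args) {
        int[] actual    = {0, 1, 1, 0}; // made-up outcome indices
        int[] predicted = {0, 1, 0, 0}; // made-up model predictions
        System.out.println(trainingAccuracy(predicted, actual));
    }
}
```

The cast to double matters in the original snippet for the same reason: with two int operands the division would be truncated.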

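TL;DR: the Stats: (998/1474) 0.6770691994572592 part of the output is the accuracy you are looking for. Note that it is the accuracy on the training data itself: 998 of the 1474 training events were classified correctly, which is about 67.7%.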
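If you want that number programmatically rather than reading it off the console, one option is to parse the line from captured output. The sketch below assumes the "Stats: (numCorrect/numEvents) value" layout shown above. The class StatsLineParser and its method are hypothetical helpers, not part of OpenNLP.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatsLineParser {

    // Matches the counts in a line such as "Stats: (998/1474) 0.6770691994572592".
    private static final Pattern STATS = Pattern.compile("Stats: \\((\\d+)/(\\d+)\\)");

    // Returns numCorrect / numEvents, or NaN if the line has no "Stats:" part.
    public static double accuracyFrom(String line) {
        Matcher m = STATS.matcher(line);
        if (!m.find()) {
            return Double.NaN;
        }
        int numCorrect = Integer.parseInt(m.group(1));
        int numEvents = Integer.parseInt(m.group(2));
        return (double) numCorrect / numEvents;
    }

    public static void main(String[] args) {
        System.out.println(accuracyFrom("Stats: (998/1474) 0.6770691994572592"));
    }
}
```

Recomputing the ratio from the two counts avoids depending on how the trailing decimal happens to be formatted.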
