OpenNLP classifier output

I currently use the following code to train a classifier model:

    import java.io.*;
    import opennlp.tools.doccat.*;
    import opennlp.tools.ml.AbstractTrainer;
    import opennlp.tools.ml.naivebayes.NaiveBayesTrainer;
    import opennlp.tools.util.*;

    // Inside a method that returns DoccatModel and declares "throws IOException"
    final String iterations = "1000";
    final String cutoff = "0";
    InputStreamFactory dataIn = new MarkableFileInputStreamFactory(
            new File("src/main/resources/trainingSets/classifierA.txt"));
    ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
    ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);

    TrainingParameters params = new TrainingParameters();
    params.put(TrainingParameters.ITERATIONS_PARAM, iterations);
    params.put(TrainingParameters.CUTOFF_PARAM, cutoff);
    params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);

    DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());

    // try-with-resources flushes and closes the stream; otherwise buffered bytes
    // may never reach model.bin
    try (OutputStream modelOut = new BufferedOutputStream(
            new FileOutputStream("src/main/resources/models/model.bin"))) {
        model.serialize(modelOut);
    }
    return model;

It runs fine, and after every run I get the following output:

    Indexing events with TwoPass using cutoff of 0
        Computing event counts...  done. 1474 events
        Indexing...  done.
    Collecting events... Done indexing in 0,03 s.
    Incorporating indexed data for training...  done.
        Number of Event Tokens: 1474
            Number of Outcomes: 2
          Number of Predicates: 4149
    Computing model parameters...
    Stats: (998/1474) 0.6770691994572592
    ...done.

Can someone explain what this output means? Does it tell me the accuracy?


Answer:

Looking at the source code, we can see that this output is produced by the NaiveBayesTrainer::trainModel method:

    public AbstractModel trainModel(DataIndexer di) {
        // ...
        display("done.\n");
        display("\tNumber of Event Tokens: " + numUniqueEvents + "\n");
        display("\t    Number of Outcomes: " + numOutcomes + "\n");
        display("\t  Number of Predicates: " + numPreds + "\n");
        display("Computing model parameters...\n");
        MutableContext[] finalParameters = findParameters();
        display("...done.\n");
        // ...
    }
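The three counts printed before the `Stats:` line describe the training data. As a rough sketch (the class `TrainingFileCounts` and the toy data are hypothetical, not OpenNLP code), here is how they relate to a training file in the format `DocumentSampleStream` reads, with the category first and whitespace-separated text after it: one event per line, one outcome per distinct category, one predicate per distinct feature. It assumes the default bag-of-words features (written here as `bow=<token>`), cutoff 0, and no merging of duplicate events, so the real indexer's numbers may differ.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not OpenNLP code. Assumes bag-of-words features
// ("bow=<token>"), cutoff 0 and no merging of duplicate events.
final class TrainingFileCounts {
    int events;                                      // one event per training line
    final Set<String> outcomes = new HashSet<>();    // distinct category labels
    final Set<String> predicates = new HashSet<>();  // distinct features

    static TrainingFileCounts count(List<String> lines) {
        TrainingFileCounts c = new TrainingFileCounts();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2) {
                // this sketch rejects lines that have a category but no text
                throw new IllegalArgumentException("line needs a category and text: " + line);
            }
            c.events++;
            c.outcomes.add(tokens[0]);               // first token is the category
            for (int i = 1; i < tokens.length; i++) {
                c.predicates.add("bow=" + tokens[i]); // remaining tokens become features
            }
        }
        return c;
    }

    public static void main(String[] args) {
        List<String> toy = List.of(                  // hypothetical toy training data
                "spam win money now",
                "ham see you at lunch",
                "spam win a free prize");
        TrainingFileCounts c = count(toy);
        System.out.println("Number of Event Tokens: " + c.events);            // 3
        System.out.println("    Number of Outcomes: " + c.outcomes.size());   // 2
        System.out.println("  Number of Predicates: " + c.predicates.size()); // 10
    }
}
```

Read under the same assumptions, your log would correspond to 1474 training documents in 2 categories with 4149 distinct features.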

If you look at the code of findParameters(), you'll notice that it calls trainingStats(), which contains the snippet that computes the accuracy on the training data:

    private double trainingStats(EvalParameters evalParams) {
        // ...
        double trainingAccuracy = (double) numCorrect / numEvents;
        display("Stats: (" + numCorrect + "/" + numEvents + ") " + trainingAccuracy + "\n");
        return trainingAccuracy;
    }
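The arithmetic is simply correct predictions divided by total training events. A minimal sketch (the class `TrainingStatsSketch` and its methods are hypothetical, not OpenNLP API) that counts matches between gold and predicted labels and formats the result the same way as the `display` call above:

```java
import java.util.List;

// Hypothetical sketch of the "Stats:" arithmetic, not OpenNLP code:
// training accuracy = events predicted correctly / total training events.
final class TrainingStatsSketch {

    // Counts positions where the predicted label equals the gold label.
    static int countCorrect(List<String> gold, List<String> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("label lists must have equal length");
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return correct;
    }

    // Builds the same string the trainer logs: "Stats: (correct/total) accuracy".
    static String formatStats(int numCorrect, int numEvents) {
        double trainingAccuracy = (double) numCorrect / numEvents;
        return "Stats: (" + numCorrect + "/" + numEvents + ") " + trainingAccuracy;
    }

    public static void main(String[] args) {
        System.out.println(formatStats(998, 1474)); // Stats: (998/1474) 0.6770691994572592
        // hypothetical toy labels: 3 of the 4 predictions match
        int correct = countCorrect(List.of("A", "B", "A", "B"), List.of("A", "B", "B", "B"));
        System.out.println(formatStats(correct, 4)); // Stats: (3/4) 0.75
    }
}
```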

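TL;DR: The `Stats: (998/1474) 0.6770691994572592` part of the output is the accuracy you are looking for: 998 of the 1474 training events were classified correctly, i.e. about 67.7%. Note that it is measured on the training data itself, so it is not an estimate of how the model will perform on unseen documents.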
