OpenNLP classifier output

I am currently using the following code to train a classifier model:

    final String iterations = "1000";
    final String cutoff = "0";

    // Read the training file line by line (one document sample per line).
    InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("src/main/resources/trainingSets/classifierA.txt"));
    ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
    ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);

    // Train with Naive Bayes, keeping every feature (cutoff 0).
    TrainingParameters params = new TrainingParameters();
    params.put(TrainingParameters.ITERATIONS_PARAM, iterations);
    params.put(TrainingParameters.CUTOFF_PARAM, cutoff);
    params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);

    DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());

    // Persist the trained model.
    OutputStream modelOut = new BufferedOutputStream(new FileOutputStream("src/main/resources/models/model.bin"));
    model.serialize(modelOut);
    return model;

This runs fine, and after every run I get the following output:

    Indexing events with TwoPass using cutoff of 0
        Computing event counts...  done. 1474 events
        Indexing...  done.
    Collecting events... Done indexing in 0,03 s.
    Incorporating indexed data for training...  done.
        Number of Event Tokens: 1474
            Number of Outcomes: 2
          Number of Predicates: 4149
    Computing model parameters...
    Stats: (998/1474) 0.6770691994572592
    ...done.

Can someone explain what this output means? Does it say anything about the accuracy?


Answer:

Looking at the source, we can tell that this output is produced by the NaiveBayesTrainer::trainModel method:

    public AbstractModel trainModel(DataIndexer di) {
        // ...
        display("done.\n");
        display("\tNumber of Event Tokens: " + numUniqueEvents + "\n");
        display("\t    Number of Outcomes: " + numOutcomes + "\n");
        display("\t  Number of Predicates: " + numPreds + "\n");
        display("Computing model parameters...\n");
        MutableContext[] finalParameters = findParameters();
        display("...done.\n");
        // ...
    }

If you look at the code of findParameters(), you will notice that it calls the trainingStats() method, which contains the snippet that computes the accuracy:

    private double trainingStats(EvalParameters evalParams) {
        // ...
        double trainingAccuracy = (double) numCorrect / numEvents;
        display("Stats: (" + numCorrect + "/" + numEvents + ") " + trainingAccuracy + "\n");
        return trainingAccuracy;
    }
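To make the idea concrete, here is a minimal standalone sketch of the same kind of calculation: count the events whose predicted outcome matches the labelled one and divide by the total. It is not OpenNLP's code; the class name TrainingAccuracySketch, the method trainingAccuracy, and the label arrays are hypothetical and only serve as an illustration.

```java
public class TrainingAccuracySketch {

    // Fraction of events whose predicted outcome equals the labelled outcome.
    // Hypothetical helper for illustration; OpenNLP does this internally.
    static double trainingAccuracy(String[] actual, String[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("Label arrays must have the same length");
        }
        if (actual.length == 0) {
            throw new IllegalArgumentException("At least one event is required");
        }
        int numCorrect = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i].equals(predicted[i])) {
                numCorrect++;
            }
        }
        // Cast before dividing so the result is not truncated by integer division.
        return (double) numCorrect / actual.length;
    }

    public static void main(String[] args) {
        // Made-up labels: three of the four predictions are correct.
        String[] actual = {"A", "B", "A", "B"};
        String[] predicted = {"A", "B", "B", "B"};
        System.out.println("Accuracy: " + trainingAccuracy(actual, predicted));
    }
}
```

The cast to double matters: with two int operands, numCorrect / numEvents would be integer division and yield 0 for any accuracy below 100%.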

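TL;DR: the Stats: (998/1474) 0.6770691994572592 part of the output is the accuracy you are looking for. It means 998 of the 1474 training events were classified correctly, roughly 67.7%. Keep in mind that this is the accuracy on the training data itself (the variable is called trainingAccuracy), not on held-out data.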
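If you want to confirm the arithmetic, the two counts in the Stats line can be pulled out and divided again. The following is a small sketch under that assumption; the class StatsLineCheck, its regular expression, and the method recomputeAccuracy are hypothetical helpers, not part of OpenNLP.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatsLineCheck {

    // Matches the "(numCorrect/numEvents)" part of a line such as
    // "Stats: (998/1474) 0.6770691994572592".
    private static final Pattern STATS = Pattern.compile("Stats: \\((\\d+)/(\\d+)\\)");

    // Recomputes the accuracy from the two counts found in the log line.
    static double recomputeAccuracy(String line) {
        Matcher m = STATS.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a Stats line: " + line);
        }
        int numCorrect = Integer.parseInt(m.group(1));
        int numEvents = Integer.parseInt(m.group(2));
        return (double) numCorrect / numEvents;
    }

    public static void main(String[] args) {
        String line = "Stats: (998/1474) 0.6770691994572592";
        // Prints a value of about 0.677, i.e. 998 correct out of 1474 events.
        System.out.println(recomputeAccuracy(line));
    }
}
```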
