Do some HPC clusters cache only one result when running Stanford CoreNLP?

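I am using the Stanford CoreNLP library in a Java project. I created a class called StanfordNLP and instantiated two different objects, initializing the constructor of each with a different string. I use the part-of-speech tagger to extract adjective-noun sequences. However, the program's output only shows the results for the first object: every StanfordNLP object is initialized with a different string, yet each one returns the same results as the first. I am new to Java, so I can't tell whether the problem is in my code or in the HPC cluster it runs on.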

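I tried using a getter method instead of returning the list of strings from the StanfordNLP class method. I also tried setting the first StanfordNLP object to null, so that it references nothing, before creating the other objects. Neither approach worked.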

    /* in main */
    List<String> pos_tokens0 = new ArrayList<String>();
    List<String> pos_tokens1 = new ArrayList<String>();

    String text0 = "Mary little lamb white fleece like snow";
    StanfordNLP snlp0 = new StanfordNLP(text0);
    pos_tokens0 = snlp0.process();

    String text1 = "Everywhere little Mary went fluffy lamb ate green grass";
    StanfordNLP snlp1 = new StanfordNLP(text1);
    pos_tokens1 = snlp1.process();

    /* in StanfordNLP.java */
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;

    import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;

    public class StanfordNLP {

        private static List<String> pos_adjnouns = new ArrayList<String>();
        private String documentText = "";

        public StanfordNLP() {}

        public StanfordNLP(String text) { this.documentText = text; }

        public List<String> process() {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, depparse");
            props.setProperty("coref.algorithm", "neural");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            Annotation document = new Annotation(documentText);
            pipeline.annotate(document);

            List<CoreMap> sentences = document.get(SentencesAnnotation.class);
            List<String[]> corpus_temp = new ArrayList<String[]>();
            int count = 0;

            for (CoreMap sentence : sentences) {
                for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                    String[] data = new String[2];
                    String word = token.get(TextAnnotation.class);
                    String pos = token.get(PartOfSpeechAnnotation.class);
                    count++;

                    data[0] = word;
                    data[1] = pos;
                    corpus_temp.add(data);
                }
            }

            String[][] corpus = corpus_temp.toArray(new String[count][2]);

            // corpus contains string arrays with a word and its part-of-speech.
            for (int i = 0; i < (corpus.length - 3); i++) {
                String word = corpus[i][0];
                String pos = corpus[i][1];
                String word2 = corpus[i + 1][0];
                String pos2 = corpus[i + 1][1];

                // find adjectives and nouns (eg, "fast car")
                if (pos.equals("JJ")) {
                    if (pos2.equals("NN") || pos2.equals("NNP") || pos2.equals("NNPS")) {
                        word = word + " " + word2;
                        pos_adjnouns.add(word);
                    }
                }
            }
            return pos_adjnouns;
        }
    }

The expected output for pos_tokens0 is "little lamb, white fleece". The expected output for pos_tokens1 is "little Mary, fluffy lamb, green grass". But the actual output of both variables is "little lamb, white fleece".

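Any idea why this happens? I ran a simple Java jar containing main.java and myclass.java on an HPC server and could not reproduce the problem. So it appears the HPC server has no trouble handling multiple objects of the same class.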

Answer:

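The problem appears to be simply that your pos_adjnouns variable is static and is therefore shared across all StanfordNLP instances. Try removing the static keyword and see whether things then work as you expect.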
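To see the effect in isolation, here is a minimal sketch that needs no CoreNLP at all. The class names SharedCollector, OwnCollector and StaticFieldDemo are hypothetical stand-ins invented for this illustration; only the static-versus-instance distinction mirrors the code in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the class in the question: the list is static,
// so there is exactly one list for the whole class, not one per object.
class SharedCollector {
    private static final List<String> results = new ArrayList<>();
    private final String text;

    SharedCollector(String text) { this.text = text; }

    List<String> process() {
        results.add(text);   // appends to the single class-wide list
        return results;      // every caller receives the same list object
    }
}

// Same class without "static": each object owns its own list.
class OwnCollector {
    private final List<String> results = new ArrayList<>();
    private final String text;

    OwnCollector(String text) { this.text = text; }

    List<String> process() {
        results.add(text);
        return results;
    }
}

class StaticFieldDemo {
    public static void main(String[] args) {
        List<String> a = new SharedCollector("first").process();
        List<String> b = new SharedCollector("second").process();
        // Both variables refer to one list that starts with the first object's data.
        System.out.println("shared list?  " + (a == b));
        System.out.println("b = " + b);

        List<String> c = new OwnCollector("first").process();
        List<String> d = new OwnCollector("second").process();
        System.out.println("shared list?  " + (c == d));
        System.out.println("d = " + d);
    }
}
```

With the static field, the second object's call hands back the very same list, which still begins with the first object's entries. This is ordinary Java behaviour and has nothing to do with the machine the program runs on.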

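But even that isn't quite right, because pos_adjnouns would then be an instance variable, and results would keep being appended to the list every time process() is called on the same object. There are two further things you should do: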

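1. Make pos_adjnouns a local variable inside the process() method.
2. Conversely, initializing a StanfordCoreNLP pipeline is expensive, so you should move it out of the process() method and do it in the class constructor. It would probably be better to have things exactly the other way round from your current design: the constructor initializes a pipeline, and the process() method takes the String to analyze.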
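The structure suggested above could look roughly like the sketch below. To keep it runnable without the CoreNLP jars, the expensive pipeline is replaced by a hypothetical `tagger` function that maps a text to {word, POS tag} pairs; AdjNounExtractor, AdjNounDemo and toyTag are names made up for this illustration, and the toy tagger only knows four adjectives. Note also that the loop here visits every adjacent pair, which is an assumption that differs from the `corpus.length-3` bound in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Sketch only: "tagger" is a hypothetical stand-in for the StanfordCoreNLP
// pipeline. It is created once and handed to the constructor.
class AdjNounExtractor {
    private final Function<String, List<String[]>> tagger;

    AdjNounExtractor(Function<String, List<String[]>> tagger) {
        this.tagger = tagger;   // expensive resource, set up once per extractor
    }

    // The text is a parameter, and the result list is local to each call.
    List<String> process(String text) {
        List<String> adjNouns = new ArrayList<>();
        List<String[]> tagged = tagger.apply(text);
        for (int i = 0; i + 1 < tagged.size(); i++) {
            String pos = tagged.get(i)[1];
            String nextPos = tagged.get(i + 1)[1];
            boolean nextIsNoun = nextPos.equals("NN") || nextPos.equals("NNP")
                    || nextPos.equals("NNPS");
            if (pos.equals("JJ") && nextIsNoun) {
                adjNouns.add(tagged.get(i)[0] + " " + tagged.get(i + 1)[0]);
            }
        }
        return adjNouns;
    }
}

class AdjNounDemo {
    // Toy tagger for illustration only: four known words are tagged JJ,
    // everything else is tagged NN. A real tagger is far more accurate.
    static List<String[]> toyTag(String text) {
        Set<String> adjectives =
                new HashSet<>(Arrays.asList("little", "white", "fluffy", "green"));
        List<String[]> tagged = new ArrayList<>();
        for (String word : text.split("\\s+")) {
            tagged.add(new String[] { word, adjectives.contains(word) ? "JJ" : "NN" });
        }
        return tagged;
    }

    public static void main(String[] args) {
        AdjNounExtractor extractor = new AdjNounExtractor(AdjNounDemo::toyTag);
        System.out.println(extractor.process("Mary little lamb white fleece like snow"));
        System.out.println(extractor.process(
                "Everywhere little Mary went fluffy lamb ate green grass"));
    }
}
```

In the real class, the constructor would build the StanfordCoreNLP pipeline from the Properties once, and process(String) would annotate the given text with that pipeline instead of calling the stand-in tagger. One extractor object can then be reused for any number of texts without results from one call leaking into the next.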
