How to get the annotated text from DictionaryAnnotator

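I created a dictionary with UIMA's DictionaryCreator, and I want to annotate a piece of text using DictionaryAnnotator and that dictionary, but I can't figure out how to get the annotated text. If you know how, please tell me. Any help is appreciated. The code, dictionary file, and descriptor are shown below. PS: I'm new to Apache UIMA.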

XMLInputSource xml_in = new XMLInputSource("DictionaryAnnotatorDescriptor.xml");
ResourceSpecifier specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(xml_in);
AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier);
JCas jCas = ae.newJCas();
String inputText = "Mark and John went down the rabbit hole to meet a wise owl and have curry with the owl.";
jCas.setDocumentText(inputText);
printResults(jCas);

public static void printResults(JCas jcas) {
    FSIndex<Annotation> index = jcas.getAnnotationIndex();
    for (Iterator<Annotation> it = index.iterator(); it.hasNext(); ) {
        Annotation annotation = it.next();
        List<Feature> features;
        features = annotation.getType().getFeatures();
        List<String> fasl = new ArrayList<String>();
        for (Feature feature : features) {
            try {
                String name = feature.getShortName();
                System.out.println(feature.getName());
                String value = annotation.getStringValue(feature);
                fasl.add(name + "=\"" + value + "\"");
                System.out.println(value);
            } catch (Exception e) {
                continue;
            }
        }
    }
}

my_dictionary.xml

<?xml version="1.0" encoding="UTF-8"?>
<dictionary xmlns="http://incubator.apache.org/uima" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="dictionary.xsd">
    <typeCollection>
        <dictionaryMetaData caseNormalization="true" multiWordEntries="true" multiWordSeparator=" "/>
        <languageId>en</languageId>
        <typeDescription>
            <typeName>org.apache.uima.DictionaryEntry</typeName>
        </typeDescription>
        <entries>
            <entry><key>Mark</key></entry>
            <entry><key>John</key></entry>
            <entry><key>Rabbit</key></entry>
            <entry><key>Owl</key></entry>
            <entry><key>Curry</key></entry>
            <entry><key>ATH-MX50</key></entry>
            <entry><key>CC234</key></entry>
        </entries>
    </typeCollection>
</dictionary>

DictionaryAnnotatorDescriptor.xml

<?xml version="1.0" encoding="UTF-8"?>
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
    <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
    <primitive>true</primitive>
    <annotatorImplementationName>org.apache.uima.annotator.dict_annot.impl.DictionaryAnnotator</annotatorImplementationName>
    <analysisEngineMetaData>
        <name>GeneDictionaryAnnotator</name>
        <description></description>
        <version>0.1</version>
        <vendor></vendor>
        <configurationParameters>
            <configurationParameter>
                <name>DictionaryFiles</name>
                <description>list of dictionary files to configure the annotator</description>
                <type>String</type>
                <multiValued>true</multiValued>
                <mandatory>true</mandatory>
            </configurationParameter>
            <configurationParameter>
                <name>InputMatchType</name>
                <description></description>
                <type>String</type>
                <multiValued>false</multiValued>
                <mandatory>true</mandatory>
            </configurationParameter>
            <configurationParameter>
                <name>InputMatchFeaturePath</name>
                <description></description>
                <type>String</type>
                <multiValued>false</multiValued>
                <mandatory>false</mandatory>
            </configurationParameter>
            <configurationParameter>
                <name>InputMatchFilterFeaturePath</name>
                <description></description>
                <type>String</type>
                <multiValued>false</multiValued>
                <mandatory>false</mandatory>
            </configurationParameter>
            <configurationParameter>
                <name>FilterConditionOperator</name>
                <description></description>
                <type>String</type>
                <multiValued>false</multiValued>
                <mandatory>false</mandatory>
            </configurationParameter>
            <configurationParameter>
                <name>FilterConditionValue</name>
                <description></description>
                <type>String</type>
                <multiValued>false</multiValued>
                <mandatory>false</mandatory>
            </configurationParameter>
        </configurationParameters>
        <configurationParameterSettings>
            <nameValuePair>
                <name>DictionaryFiles</name>
                <value>
                    <array>
                        <string>src/main/resources/my_dictionary.xml</string>
                    </array>
                </value>
            </nameValuePair>
            <nameValuePair>
                <name>InputMatchType</name>
                <value>
                    <string>org.apache.uima.TokenAnnotation</string>
                </value>
            </nameValuePair>
        </configurationParameterSettings>
        <typeSystemDescription>
            <types>
                <typeDescription>
                    <name>org.apache.uima.DictionaryEntry</name>
                    <description></description>
                    <supertypeName>uima.tcas.Annotation</supertypeName>
                </typeDescription>
                <typeDescription>
                    <name>org.apache.uima.TokenAnnotation</name>
                    <description>Single token annotation</description>
                    <supertypeName>uima.tcas.Annotation</supertypeName>
                    <features>
                        <featureDescription>
                            <name>tokenType</name>
                            <description>token type</description>
                            <rangeTypeName>uima.cas.String</rangeTypeName>
                        </featureDescription>
                    </features>
                </typeDescription>
                <typeDescription>
                    <name>example.Name</name>
                    <description>A proper name.</description>
                    <supertypeName>uima.tcas.Annotation</supertypeName>
                </typeDescription>
            </types>
        </typeSystemDescription>
        <capabilities>
            <capability>
                <inputs/>
                <outputs>
                    <type>example.Name</type>
                </outputs>
                <languagesSupported/>
            </capability>
        </capabilities>
        <operationalProperties>
            <modifiesCas>true</modifiesCas>
            <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
            <outputsNewCASes>false</outputsNewCASes>
        </operationalProperties>
    </analysisEngineMetaData>
</analysisEngineDescription>

Answer:

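Alternatively, you can use Apache Ruta, either through the Workbench (recommended for getting started) or from Java code.

For the latter, I created an example project at https://github.com/renaud/annotate_ruta_example. The main parts are: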

a list of names in src/main/resources/ruta/resources/names.txt (a plain text file)

Mark
John
Rabbit
Owl
Curry
ATH-MX50
CC234

a Ruta script in src/main/resources/ruta/scripts/Example.ruta

PACKAGE example.annotate;               // optional package definition
WORDLIST MyNames = 'names.txt';         // declare the dictionary location
DECLARE Name;                           // declare an annotation
Document{-> MARKFAST(Name, MyNames)};   // annotate the document

and some Java boilerplate code to launch the annotator:

JCas jCas = JCasFactory.createJCas();

// the sample text to annotate
jCas.setDocumentText("Mark wants to buy CC234.");

// configure the engine with scripts and resources
AnalysisEngine rutaEngine = AnalysisEngineFactory.createEngine(
    RutaEngine.class, //
    RutaEngine.PARAM_RESOURCE_PATHS,
    "src/main/resources/ruta/resources", //
    RutaEngine.PARAM_SCRIPT_PATHS,
    "src/main/resources/ruta/scripts",
    RutaEngine.PARAM_MAIN_SCRIPT, "Example");

// run the script; instead of a jCas, you could also provide
// a UIMA collection reader to process many documents
SimplePipeline.runPipeline(jCas, rutaEngine);

// a simple select to print the matched Names
for (Name name : JCasUtil.select(jCas, Name.class)) {
    System.out.println(name.getCoveredText());
}
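To make the last step less opaque: a UIMA annotation is essentially a pair of character offsets (begin, end) into the document text, and getCoveredText() returns the substring between them. The sketch below is not UIMA or Ruta code. It is a plain-Java illustration of that idea, and the names CoveredTextSketch, Span, and markNames are hypothetical. It also assumes exact, case-sensitive matching of single tokens, which is a simplification and not a statement about how MARKFAST or DictionaryAnnotator match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CoveredTextSketch {

    /** Hypothetical stand-in for an annotation: only a pair of character offsets. */
    public static final class Span {
        public final int begin;
        public final int end;

        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        /** Mirrors the idea behind getCoveredText(): the text between begin and end. */
        public String coveredText(String documentText) {
            return documentText.substring(begin, end);
        }
    }

    /** Marks every token of the text that appears verbatim in the word list. */
    public static List<Span> markNames(String documentText, Set<String> names) {
        List<Span> spans = new ArrayList<>();
        // Assumption: a token is a run of letters, digits, or hyphens (so "ATH-MX50" is one token).
        Matcher tokens = Pattern.compile("[\\p{L}\\p{N}-]+").matcher(documentText);
        while (tokens.find()) {
            if (names.contains(tokens.group())) {
                spans.add(new Span(tokens.start(), tokens.end()));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Mark wants to buy CC234.";
        Set<String> names = new HashSet<>(Arrays.asList(
            "Mark", "John", "Rabbit", "Owl", "Curry", "ATH-MX50", "CC234"));
        for (Span span : markNames(text, names)) {
            System.out.println(span.begin + "-" + span.end + ": " + span.coveredText(text));
        }
    }
}
```

In the real pipeline the offsets are stored on the Name annotations in the CAS, which is why the loop above only needs to select them and call getCoveredText().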

There are also some UIMA type (annotation) definitions; see:

src/main/resources/desc/type/ExampleTypes.xml
src/main/resources/META-INF/org.apache.uima.fit/types.txt
src/main/java/example/annotate

How to test

git clone https://github.com/renaud/annotate_ruta_example.git
cd annotate_ruta_example
mvn clean install
mvn exec:java -Dexec.mainClass="example.Annotate"
