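Training the EAST text detector on custom data

How do I train the EAST text detector on my own custom data? No blog online walks through the process in detail. Here is what I currently have:

I have a folder containing all the images, plus one XML file per image that annotates the positions of the text.

For example: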

<annotation>
    <folder>Dataset</folder>
    <filename>FFDDAPMDD1.png</filename>
    <path>C:\Users\HPO2KOR\Desktop\Work\venv\Patent\Dataset\Dataset\FFDDAPMDD1.png</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>839</width>
        <height>1000</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>522</xmin>
            <ymin>29</ymin>
            <xmax>536</xmax>
            <ymax>52</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>510</xmin>
            <ymin>258</ymin>
            <xmax>521</xmax>
            <ymax>281</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>546</xmin>
            <ymin>528</ymin>
            <xmax>581</xmax>
            <ymax>555</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>523</xmin>
            <ymin>646</ymin>
            <xmax>555</xmax>
            <ymax>674</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>410</xmin>
            <ymin>748</ymin>
            <xmax>447</xmax>
            <ymax>776</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>536</xmin>
            <ymin>826</ymin>
            <xmax>567</xmax>
            <ymax>851</ymax>
        </bndbox>
    </object>
    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>792</xmin>
            <ymin>918</ymin>
            <xmax>838</xmax>
            <ymax>945</ymax>
        </bndbox>
    </object>
</annotation>

In addition, I parsed the XML file for each image into the same format used for training a YOLO model.

For example:

C:\Users\HPO2KOR\...\text\FFDDAPMDD1.png 522,29,536,52,0 510,258,521,281,0 546,528,581,555,0 523,646,555,674,0 410,748,447,776,0 536,826,567,851,0 792,918,838,945,0 660,918,706,943,0 63,1,108,24,0 65,51,110,77,0 65,101,109,126,0 63,151,110,175,0 63,202,109,228,0 63,252,110,276,0 63,303,110,330,0 62,353,110,381,0 65,405,109,434,0 90,457,110,482,0 59,505,101,534,0 64,565,107,590,0 61,616,107,644,0 62,670,103,694,0 62,725,104,753,0 63,778,104,804,0 62,831,100,857,0 87,887,106,912,0 98,919,144,943,0 240,916,284,943,0 378,915,420,943,0 520,918,565,942,0
C:\Users\HPO2KOR\...\text\FFDDAPMDD2.png 91,145,109,171,0 68,192,106,218,0 92,239,111,265,0 69,286,108,311,0 92,333,107,357,0 66,379,110,405,0 90,424,111,451,0 69,472,107,497,0 91,518,109,545,0 66,564,109,590,0 90,613,110,637,0 121,644,140,670,0 279,643,322,671,0 446,645,490,668,0 615,642,661,669,0 786,643,831,667,0 954,643,997,672,0 820,22,866,50,0 823,73,866,103,0
C:\Users\HPO2KOR\...\text\FFDDAPMDD3.png 648,1,698,30,0 68,64,129,91,0 55,144,128,168,0 70,218,129,247,0 56,300,127,326,0 71,377,125,404,0 58,459,127,482,0 109,535,130,560,0 140,568,160,594,0 344,568,382,594,0 563,566,581,591,0 760,568,800,593,0 982,569,1000,591,0

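What are the steps to train this EAST text detector on Windows?

Answer:

According to the documentation in the README, custom training with the Keras implementation of EAST requires a folder of images plus one text file per image, named gt_IMAGENAME.txt (replace IMAGENAME with the name of the corresponding image).

The contents of each text file are described as follows: "Ground truth is given as separate text files (one per image) where each line specifies the coordinates of one word's bounding box and its transcription in a comma separated format." This is quoted from https://rrc.cvc.uab.es/?ch=4&com=tasks, which is linked from the README of the TensorFlow implementation of EAST at https://github.com/argman/EAST. Each bounding box is given as the coordinates of its four corners.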
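
To make the line format concrete, here is a minimal sketch that splits one ground-truth line into four corner points and a transcription. The function name parse_gt_line is hypothetical, and the sample line is made up from the first box in the question rather than copied from any dataset. It is not the loader used by either EAST implementation.

```python
def parse_gt_line(line):
    """Split one ground-truth line into four (x, y) corners and a transcription.

    Assumes the layout x1,y1,x2,y2,x3,y3,x4,y4,transcription described above.
    """
    # Drop a UTF-8 byte-order mark and surrounding whitespace, if any.
    cleaned = line.lstrip("\ufeff").strip()
    # Split on the first eight commas only, so commas inside the
    # transcription are preserved.
    parts = cleaned.split(",", 8)
    if len(parts) < 9:
        raise ValueError(f"expected 8 coordinates and a transcription: {line!r}")
    coords = [int(float(value)) for value in parts[:8]]
    corners = list(zip(coords[0::2], coords[1::2]))
    return corners, parts[8].strip()


# Sample line built from the first box in the question (522,29 to 536,52).
corners, text = parse_gt_line("522,29,536,29,536,52,522,52,###")
print(corners, text)
```

Splitting with a limit of eight keeps a transcription such as "Hello, world" in one piece.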

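You seem to already have all the information needed to build training data in the correct format. There may be tools that can convert all the data, but a simple Python script would also do the job. For example:

1. Iterate over all the XML files.
2. For each XML file, create a text file named as the documentation requires.
3. Parse the XML with BeautifulSoup.
4. Use find_all to get all the object tags.
5. Use the xmin, xmax, ymin, and ymax values to get the x, y coordinates of all four corners. Image coordinates have their origin at the top-left with y increasing downward, so the top-left corner is (xmin, ymin), the top-right is (xmax, ymin), and so on. Judging by https://github.com/argman/EAST/blob/master/training_samples/img_1.txt, the order appears to be top-left, top-right, bottom-right, bottom-left (clockwise).
6. For each object tag, write a new line to the text file in the format x1, y1, x2, y2, x3, y3, x4, y4, transcription or x1, y1, x2, y2, x3, y3, x4, y4, ### (followed by a \n newline). In the ICDAR convention, ### marks a "do not care" region.
7. Run python train.py with all command-line arguments set as in the "Execution example", but change the value after --training_data_path= to your own path.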
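
The conversion steps above can be sketched roughly as follows. This is only an illustration under several assumptions: it uses the standard library's xml.etree.ElementTree in place of BeautifulSoup, the function names are hypothetical, and it takes IMAGENAME to mean the image file name without its extension. Because ### marks a region as "do not care" and, as far as I can tell, the loader in argman/EAST skips such boxes during training, the sketch writes a placeholder word ("text") as the transcription instead. Check that against the implementation you actually use.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def voc_box_to_east_line(xmin, ymin, xmax, ymax, transcription):
    """Format an axis-aligned box as x1,y1,...,x4,y4,transcription.

    Corners are listed clockwise starting at the top-left, in image
    coordinates (y grows downward).
    """
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    coords = ",".join(f"{x},{y}" for x, y in corners)
    return f"{coords},{transcription}"


def convert_xml(xml_path, out_dir, placeholder="text"):
    """Write gt_<image stem>.txt for one Pascal VOC style XML file."""
    root = ET.parse(xml_path).getroot()
    lines = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        lines.append(voc_box_to_east_line(xmin, ymin, xmax, ymax, placeholder))
    # Assumption: IMAGENAME is the image file name without its extension.
    image_name = root.findtext("filename") or Path(xml_path).name
    out_path = Path(out_dir) / f"gt_{Path(image_name).stem}.txt"
    out_path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return out_path


def convert_folder(xml_dir, out_dir, placeholder="text"):
    """Convert every *.xml file in xml_dir; returns the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return [
        convert_xml(path, out_dir, placeholder)
        for path in sorted(Path(xml_dir).glob("*.xml"))
    ]
```

Calling convert_folder("Dataset", "Dataset") (the folder name is just an example) would place each gt_*.txt file next to its image, which is the layout the README describes.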
