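How can I train the EAST text detector on my own custom data? No blog online walks through this process in detail. Here is what I currently have:
I have a folder containing all the images, plus one xml file per image that annotates where the text is located.
For example: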
<annotation>
  <folder>Dataset</folder>
  <filename>FFDDAPMDD1.png</filename>
  <path>C:\Users\HPO2KOR\Desktop\Work\venv\Patent\Dataset\Dataset\FFDDAPMDD1.png</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>839</width>
    <height>1000</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>text</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>522</xmin>
      <ymin>29</ymin>
      <xmax>536</xmax>
      <ymax>52</ymax>
    </bndbox>
  </object>
  <object>
    <name>text</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>510</xmin>
      <ymin>258</ymin>
      <xmax>521</xmax>
      <ymax>281</ymax>
    </bndbox>
  </object>
  <object>
    <name>text</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>546</xmin>
      <ymin>528</ymin>
      <xmax>581</xmax>
      <ymax>555</ymax>
    </bndbox>
  </object>
  <object>
    <name>text</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>523</xmin>
      <ymin>646</ymin>
      <xmax>555</xmax>
      <ymax>674</ymax>
    </bndbox>
  </object>
  <object>
    <name>text</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>410</xmin>
      <ymin>748</ymin>
      <xmax>447</xmax>
      <ymax>776</ymax>
    </bndbox>
  </object>
  <object>
    <name>text</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>536</xmin>
      <ymin>826</ymin>
      <xmax>567</xmax>
      <ymax>851</ymax>
    </bndbox>
  </object>
  <object>
    <name>text</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>792</xmin>
      <ymin>918</ymin>
      <xmax>838</xmax>
      <ymax>945</ymax>
    </bndbox>
  </object>
</annotation>
In addition, I have parsed the xml file of every image into the same format that is used when training a YOLO model.
For example:
C:\Users\HPO2KOR\...\text\FFDDAPMDD1.png 522,29,536,52,0 510,258,521,281,0 546,528,581,555,0 523,646,555,674,0 410,748,447,776,0 536,826,567,851,0 792,918,838,945,0 660,918,706,943,0 63,1,108,24,0 65,51,110,77,0 65,101,109,126,0 63,151,110,175,0 63,202,109,228,0 63,252,110,276,0 63,303,110,330,0 62,353,110,381,0 65,405,109,434,0 90,457,110,482,0 59,505,101,534,0 64,565,107,590,0 61,616,107,644,0 62,670,103,694,0 62,725,104,753,0 63,778,104,804,0 62,831,100,857,0 87,887,106,912,0 98,919,144,943,0 240,916,284,943,0 378,915,420,943,0 520,918,565,942,0
C:\Users\HPO2KOR\...\text\FFDDAPMDD2.png 91,145,109,171,0 68,192,106,218,0 92,239,111,265,0 69,286,108,311,0 92,333,107,357,0 66,379,110,405,0 90,424,111,451,0 69,472,107,497,0 91,518,109,545,0 66,564,109,590,0 90,613,110,637,0 121,644,140,670,0 279,643,322,671,0 446,645,490,668,0 615,642,661,669,0 786,643,831,667,0 954,643,997,672,0 820,22,866,50,0 823,73,866,103,0
C:\Users\HPO2KOR\...\text\FFDDAPMDD3.png 648,1,698,30,0 68,64,129,91,0 55,144,128,168,0 70,218,129,247,0 56,300,127,326,0 71,377,125,404,0 58,459,127,482,0 109,535,130,560,0 140,568,160,594,0 344,568,382,594,0 563,566,581,591,0 760,568,800,593,0 982,569,1000,591,0
What are the steps to train this EAST text detector on Windows?
Answer:
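According to the documentation in the readme, custom training with the keras implementation of EAST requires a folder of images, with a matching text file for each image named gt_IMAGENAME.txt (replace IMAGENAME with the name of the corresponding image).
Within each text file, "the ground truth is given as separate text files (one per image) where each line specifies the coordinates of one word's bounding box and its transcription in a comma separated format." This is quoted from https://rrc.cvc.uab.es/?ch=4&com=tasks, which is linked from the readme of the tensorflow implementation of EAST at https://github.com/argman/EAST. Bounding boxes are expressed as the coordinates of their four corners.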
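To make the target format concrete, here is a minimal sketch that turns one line of the YOLO-style annotation file shown in the question into ground-truth lines. The function name `yolo_line_to_gt_lines` is hypothetical. The sketch assumes axis-aligned boxes, image paths without spaces, and a corner order that runs clockwise from the top-left corner in image coordinates (an assumption based on the ICDAR-style sample files, so check it against the implementation you train with). It also writes a placeholder transcription, "text", because in the ICDAR convention `###` marks a "do not care" region.

```python
def yolo_line_to_gt_lines(line, transcription="text"):
    """Convert 'image_path xmin,ymin,xmax,ymax,class ...' into EAST-style lines.

    Assumption: the image path contains no spaces, so the first
    whitespace-separated token is the path and the rest are boxes.
    """
    image_path, *boxes = line.split()
    gt_lines = []
    for box in boxes:
        xmin, ymin, xmax, ymax, _class_id = (int(v) for v in box.split(","))
        # Corners clockwise from the top-left (y grows downward in images):
        # top-left, top-right, bottom-right, bottom-left.
        corners = [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax]
        gt_lines.append(",".join(str(c) for c in corners) + "," + transcription)
    return image_path, gt_lines


# Usage with the first two boxes of the question's first image.
sample = r"C:\Users\HPO2KOR\...\text\FFDDAPMDD1.png 522,29,536,52,0 510,258,521,281,0"
path, lines = yolo_line_to_gt_lines(sample)
for gt_line in lines:
    print(gt_line)
```

Each returned string is one line of the ground-truth file for that image; the class id from the YOLO-style file is dropped because the format has no field for it.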
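You seem to already have all the information needed to build training data in the correct format. There may be tools that can convert all the data for you, but a simple python script would do the job just as well. For example: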
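- Loop over all the xml files
- For each xml file, create a text file named as the documentation requires
- Parse the xml with BeautifulSoup
- Use find_all to get all the object tags
- Use the xmin, xmax, ymin and ymax values to express the x,y coordinates of all four corners. In image coordinates, where y grows downward, the top-left corner is (xmin, ymin), the top-right corner is (xmax, ymin), and so on. Judging from https://github.com/argman/EAST/blob/master/training_samples/img_1.txt, the order appears to be top-left, top-right, bottom-right, bottom-left
- For each object tag, write a new line to the text file in the format x1, y1, x2, y2, x3, y3, x4, y4, transcription or x1, y1, x2, y2, x3, y3, x4, y4, ### (followed by a \n newline)
- Run python train.py with all command-line arguments set as in the "execution example", but change the value after --training_data_path= to your own path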
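The conversion steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the method of either EAST repository: it uses the standard library's `xml.etree.ElementTree` instead of BeautifulSoup so that it has no third-party dependencies, the function names `gt_lines_from_root` and `convert_folder` are hypothetical, and the output name `gt_<xml file stem>.txt` is one reading of the gt_IMAGENAME.txt rule. The transcription defaults to the placeholder "text" rather than `###`, since in the ICDAR convention `###` marks a "do not care" region and a data loader may skip such boxes.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def gt_lines_from_root(root, transcription="text"):
    """Build one 'x1,y1,x2,y2,x3,y3,x4,y4,transcription' line per <object>."""
    lines = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # Assumed corner order: top-left, top-right, bottom-right, bottom-left
        # (image coordinates, y grows downward).
        corners = [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax]
        lines.append(",".join(str(c) for c in corners) + "," + transcription)
    return lines


def convert_folder(xml_dir, out_dir, transcription="text"):
    """Write gt_<stem>.txt into out_dir for every .xml file in xml_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        lines = gt_lines_from_root(root, transcription)
        gt_path = out_dir / f"gt_{xml_path.stem}.txt"
        # One annotation per line, each terminated by a newline.
        gt_path.write_text("".join(l + "\n" for l in lines), encoding="utf-8")


# Example call (both folder names are placeholders for your own paths):
# convert_folder("Dataset", "training_data")
```

For the first object in the question's xml (xmin 522, ymin 29, xmax 536, ymax 52), this writes the line 522,29,536,29,536,52,522,52,text. The images still need to sit alongside the generated text files in the folder passed to --training_data_path.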