Format datasets and add datasets documentation for doc-ufcn

How about adding a script that converts a dataset of images + polygons into the dataset format doc-ufcn expects?
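To make the idea concrete, here is a minimal sketch of the core step such a script would need: turning polygons into a colored label mask. It assumes that doc-ufcn labels are images where each class is painted with its own RGB color, which is my guess and not something the documentation confirms. The names `rasterize`, `point_in_polygon` and the class color used below are hypothetical. It is pure Python so it runs anywhere; a real script would more likely draw the polygons with an imaging library such as Pillow.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Color = Tuple[int, int, int]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: count edge crossings to the right of (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(
    width: int,
    height: int,
    shapes: Sequence[Tuple[Sequence[Point], Color]],
    background: Color = (0, 0, 0),
) -> List[List[Color]]:
    """Return mask[y][x] with each polygon filled in its class color.

    Assumption: one RGB color per class, background left black.
    Later shapes overwrite earlier ones where they overlap.
    """
    mask = [[background] * width for _ in range(height)]
    for polygon, color in shapes:
        xs = [p[0] for p in polygon]
        ys = [p[1] for p in polygon]
        # Restrict the scan to the polygon's bounding box, clipped to the image.
        x_min, x_max = max(0, int(min(xs))), min(width - 1, int(max(xs)))
        y_min, y_max = max(0, int(min(ys))), min(height - 1, int(max(ys)))
        for y in range(y_min, y_max + 1):
            for x in range(x_min, x_max + 1):
                # Test the pixel center, not its top-left corner.
                if point_in_polygon(x + 0.5, y + 0.5, polygon):
                    mask[y][x] = color
    return mask


if __name__ == "__main__":
    # Hypothetical class color for a "text_line" class.
    text_line_color = (0, 0, 255)
    square = [(1, 1), (4, 1), (4, 4), (1, 4)]
    mask = rasterize(6, 6, [(square, text_line_color)])
    print(sum(row.count(text_line_color) for row in mask), "pixels labeled")
```

The mask would then be saved as an image next to the original page, with the same file name, if that is indeed how doc-ufcn pairs images and labels.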

How about adding a script that takes ground truth in PAGE XML and/or ALTO XML (images + XML files), extracts the lines, cuts the lines from the image files, and produces a doc-ufcn dataset?
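As a rough illustration of the PAGE XML half of that request, the sketch below pulls the text line polygons out of a PAGE file using only the Python standard library. It assumes the newer PAGE form where each `TextLine` has a `Coords` child with a `points="x1,y1 x2,y2 ..."` attribute; files that store coordinates differently would need extra handling. The function names are hypothetical, and tags are matched by local name so the exact schema version in the namespace does not matter.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Polygon = List[Tuple[int, int]]


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points: str) -> Polygon:
    """Turn 'x1,y1 x2,y2 ...' into [(x1, y1), (x2, y2), ...]."""
    polygon = []
    for pair in points.split():
        x, y = pair.split(",")
        polygon.append((int(x), int(y)))
    return polygon


def text_line_polygons(page_xml: str) -> List[Polygon]:
    """Collect one polygon per TextLine element, in document order."""
    root = ET.fromstring(page_xml)
    polygons = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Look only at direct children so Word-level Coords are ignored.
        for child in element:
            if local_name(child.tag) == "Coords" and child.get("points"):
                polygons.append(parse_points(child.get("points")))
                break
    return polygons


def bounding_box(polygon: Polygon) -> Tuple[int, int, int, int]:
    """(left, top, right, bottom) box that can be used to crop the line."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)
```

The polygons can be used to build the label image, and the bounding boxes to cut each line out of the page image. ALTO XML would need its own reader, since it stores positions differently.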

Unfortunately, documentation on doc-ufcn dataset formatting is missing; in particular, nothing is clear about classes_colors or classes_names.
Personally, I wrote some scripts that do this (raw images + polygons to doc-ufcn, ALTO XML and PAGE XML to doc-ufcn), though I'm not sure I did it right.
Just a thought.

Teodor Bors

Initially written in Format datasets and add datasets documentation for doc-ufcn (#46) · Issues · Document Layout Analysis / Doc-UFCN · GitLab