Training a new Pylaia Model

dmelo · October 24, 2024, 11:06am

Dear Teklia,
We trained a new PyLaia model to improve transcriptions of our handwritten documents (previously annotated with Callico), but the resulting model fails to predict transcriptions for similar documents.
We followed the tutorial for training a transcription model, but we cannot identify why the new model fails to produce line transcriptions. We would appreciate any help on the subject.
Below are the details of the training and testing processes.

Project Teste-treino1 – 14 documents annotated with Callico
→ Dataset – Dataset Pylaia 1
test – 2 documents
train – 10 documents
val – 2 documents

1.a) Train Process on Dataset Pylaia 1 (process ID: 7f52846a-24db-4cee-abce-f2427bb218fc)
→ Dataset – Dataset Pylaia 1
train – 10 documents
val – 2 documents

→ Train Process on Dataset Pylaia 1
Worker – PyLaia Training 81f98bdb-91f8-43f9-944f-c56c9b213557 (Created 2024-09-12 12:28:33)
Model Version – PyLaia Hugin Munin 7fb65e13-53c9-4583-a98a-a3b486e1b5d2
Configuration – Copy of Pylaia d14 (configuração) – Model ID: 57092e8a-c12e-4c8b-b1cc-93d6feb4d78f

1.b) Testing the training model (process ID: 632f1382-bda4-463e-a252-8aca537b0dcc)
→ Dataset – Dataset Pylaia 1
test – 2 documents

→ Train Process on Dataset Pylaia 1
Worker – PyLaia Inference c5a192af-b3a3-4042-b628-c7fb541b9f8d (Created 23 days ago)
Model Version – Pylaia (modelo final) – Model ID: 7b3a4afc-9605-4c53-802d-ee04c5705f70
Configuration – none

Thank you,
Dora

Yoann.Schneider · October 29, 2024, 10:42am

Hi Dora,

Thanks for trying out our tutorial.

I noticed an issue in your training configuration. Since the dataset is very small, we recommend freezing some of the layers of the model. We document it in the tutorial.
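
For readers unfamiliar with the idea, here is a minimal, framework-free sketch of what freezing layers means when fine-tuning on a small dataset: frozen layers keep their pretrained weights, and only the remaining layers are updated. Everything in it is an illustrative assumption, not PyLaia's code or configuration: the `Layer` class, the `freeze` and `sgd_step` helpers, the block names `conv`, `rnn` and `linear`, and all the numbers are hypothetical. The actual option to set in the training configuration is the one described in the tutorial.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical stand-in for one block of a pretrained model."""
    name: str
    weights: list
    trainable: bool = True


def freeze(layers, names):
    """Mark the named layers as non-trainable; return the names actually frozen."""
    frozen = []
    for layer in layers:
        if layer.name in names:
            layer.trainable = False
            frozen.append(layer.name)
    return frozen


def sgd_step(layers, grads, lr=0.5):
    """Apply one plain gradient-descent update, skipping frozen layers."""
    for layer in layers:
        if not layer.trainable:
            continue  # frozen: pretrained weights are left untouched
        layer.weights = [w - lr * g for w, g in zip(layer.weights, grads[layer.name])]


# Toy "pretrained" model with three blocks (names and values are made up).
model = [
    Layer("conv", [0.5, -0.25]),
    Layer("rnn", [1.0, 2.0]),
    Layer("linear", [0.0, 1.0]),
]

# Freeze the first block so that only the later blocks adapt to the new data.
freeze(model, {"conv"})

# Made-up gradients: every layer receives one, but only trainable layers use it.
grads = {"conv": [1.0, 1.0], "rnn": [1.0, 1.0], "linear": [1.0, 1.0]}
sgd_step(model, grads)

for layer in model:
    print(layer.name, layer.trainable, layer.weights)
```

With only 10 training documents, reducing the number of trainable weights in this way limits how far the model can drift from what it learned during pretraining.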

Again, your dataset is very small, so the performance of the model will be limited.

Good luck with your experiments!

Yoann Schneider