Help with parameters to fine-tune a model

Hi Teklia team,

I would like to fine-tune this model (Arkindex 1.7.0) from the Belfort archives project, following these instructions:
Training - PyLaia. Normally this is done with the following command:
pylaia-htr-train-ctc --config config_train_model.yaml --common.experiment_dirname experiment/ --common.checkpoint initial_checkpoint.ckpt --train.pretrain true --trainer.max_epochs 200

I’m not sure about the common.experiment_dirname parameter and the ‘experiment’ directory. Is this the folder where I will find the weights of the model I would like to fine-tune?
If so, how do I retrieve it from Arkindex? Can this be done with the Arkindex CLI, and how?
Or is it the folder that was generated by my ‘from scratch’ training on the data of my project (Eugène Wilhem)? In that case, how do I tell it where the Belfort model is?

Thanks in advance for your help!
best regards,

Hi again Carmen,

First, your experiment/ directory corresponds to your previous experiment (i.e. the one without fine-tuning). As you will be running a new experiment, I recommend you create a new folder experiment_belfort/ and update your training configuration accordingly.
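For example, a minimal sketch (the folder name is just the one suggested above; you can also override it on the command line as in Step 4 below):

mkdir -p experiment_belfort

Then point common.experiment_dirname at this new folder in your configuration file.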

Here are the steps to fine-tune the Belfort model on your dataset:

  • Step 1: Download the model from Arkindex: you will need to click the download button on the model’s page.

  • Step 2: Then, untar the archive with the following command:

tar --use-compress-program=unzstd -xvf 140d18b5-c952-4e6f-a511-185349b7a9df.tar.zst
  • Step 3: Rename the file weights.ckpt to belfort_weights.ckpt and move it into your experiment_belfort/ directory (see the sketch after this list).
  • Step 4: Start training with the following command:

pylaia-htr-train-ctc --config config_train_model_belfort.yaml --common.experiment_dirname experiment_belfort/ --common.checkpoint belfort_weights.ckpt --train.pretrain true --trainer.max_epochs 200
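A minimal shell sketch for Step 3 (assuming the archive was extracted into your current directory and that experiment_belfort/ may not exist yet):

# Create the new experiment directory if needed, then rename and move the weights.
mkdir -p experiment_belfort
mv weights.ckpt experiment_belfort/belfort_weights.ckpt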

I hope this helps. Best of luck!

Solène


Hi Solène,
Thanks a lot for your answers, this is very helpful!
But I don’t have a download button available; maybe I don’t have the right privileges to access the model?
best,
Carmen

Sorry about that, I thought this model was publicly available in Arkindex. Fortunately, there is another (easier) way: the same model is available on Hugging Face at Teklia/pylaia-belfort.

Running git clone https://huggingface.co/Teklia/pylaia-belfort will download everything you need. You can then go directly to Step 3.
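One caveat (an assumption on my side, not part of the training documentation): Hugging Face stores large files such as weights.ckpt with Git LFS, so make sure git-lfs is installed before cloning; otherwise the clone will contain a small text pointer file instead of the real weights:

git lfs install    # one-time setup so clones fetch the actual LFS-tracked files
git clone https://huggingface.co/Teklia/pylaia-belfort
git lfs pull       # run inside an existing clone to replace pointers with real files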

Solène


Dear Solène, dear Teklia team,

My very best wishes for 2025!

I was able to follow the instructions you suggested for fine-tuning the Belfort model. For this second experiment, I ran the following command:

pylaia-htr-train-ctc --config config_train_model_belfort.yaml --common.experiment_dirname experiment_belfort --common.checkpoint belfort_weights.ckpt --train.pretrain true --trainer.max_epochs 200

in the same working environment as for my first experiment (in which I trained from scratch). I adapted a few lines in the configuration file as you mentioned (see below), but obtained the error shown below.

Do you know what caused the error?
Thanks in advance,
best regards,
Carmen

Error in console:

[2025-01-07 18:38:56,127 INFO laia] Arguments: {'syms': '/home/geomatique/EW/env_atr_gen/data/syms.txt', 'img_dirs': ['/home/geomatique/EW/env_atr_gen/data/images/'], 'tr_txt_table': '/home/geomatique/EW/env_atr_gen/data/train.txt', 'va_txt_table': '/home/geomatique/EW/env_atr_gen/data/val.txt', 'common': CommonArgs(seed=74565, train_path='', model_filename='model', experiment_dirname='experiment_belfort/', monitor=<Monitor.va_cer: 'va_cer'>, checkpoint='belfort_weights.ckpt'), 'data': DataArgs(batch_size=8, color_mode=<ColorMode.L: 'L'>, num_workers=None, reading_order=<ReadingOrder.LTR: 'LTR'>), 'train': TrainArgs(delimiters=['<space>'], checkpoint_k=3, resume=False, pretrain=True, freeze_layers=[], early_stopping_patience=80, gpu_stats=False, augment_training=True), 'optimizer': OptimizerArgs(name=<Name.RMSProp: 'RMSProp'>, learning_rate=0.0005, momentum=0.0, weight_l2_penalty=0.0, nesterov=False), 'scheduler': SchedulerArgs(active=True, monitor=<Monitor.va_loss: 'va_loss'>, patience=5, factor=0.1), 'trainer': TrainerArgs(gradient_clip_val=0.0, gradient_clip_algorithm='norm', process_position=0, num_nodes=1, num_processes=1, devices=None, gpus=1, auto_select_gpus=True, tpu_cores=None, ipus=None, progress_bar_refresh_rate=1, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=1, max_epochs=200, min_epochs=None, max_steps=None, min_steps=None, max_time=None, limit_train_batches=1.0, limit_val_batches=1.0, limit_test_batches=1.0, limit_predict_batches=1.0, val_check_interval=1.0, flush_logs_every_n_steps=100, log_every_n_steps=50, accelerator=None, sync_batchnorm=False, precision=32, weights_summary='top', weights_save_path=None, num_sanity_val_steps=2, truncated_bptt_steps=None, profiler=None, benchmark=False, deterministic=False, reload_dataloaders_every_n_epochs=0, reload_dataloaders_every_epoch=False, replace_sampler_ddp=True, terminate_on_nan=False, prepare_data_per_node=True, plugins=None, amp_backend='native', amp_level='O2', distributed_backend=None, move_metrics_to_cpu=False, multiple_trainloader_mode='max_size_cycle', stochastic_weight_avg=False), 'decode': DecodeArgs(include_img_ids=True, separator=' ', join_string=' ', use_symbols=True, convert_spaces=False, input_space='<space>', output_space=' ', segmentation=None, temperature=1.0, print_line_confidence_scores=False, print_word_confidence_scores=False, use_language_model=False, language_model_path=None, language_model_weight=None, tokens_path=None, lexicon_path=None, unk_token='<unk>', blank_token='<ctc>')}
[2025-01-07 18:38:56,366 INFO laia] Installed:
[2025-01-07 18:38:56,368 WARNING laia.common.loader] The file experiment_belfort/belfort_weights.ckpt has been moved to experiment_belfort/pretrained/belfort_weights.ckpt.
[2025-01-07 18:38:56,394 INFO laia.common.loader] Loaded model model
[2025-01-07 18:38:56,394 CRITICAL laia] Uncaught exception:
Traceback (most recent call last):
  File "/home/geomatique/pylaia-env/bin/pylaia-htr-train-ctc", line 8, in <module>
    sys.exit(main())
  File "/home/geomatique/pylaia-env/lib/python3.10/site-packages/laia/scripts/htr/train_ctc.py", line 246, in main
    run(**args)
  File "/home/geomatique/pylaia-env/lib/python3.10/site-packages/laia/scripts/htr/train_ctc.py", line 78, in run
    checkpoint_path = loader.reset_parameters(
  File "/home/geomatique/pylaia-env/lib/python3.10/site-packages/laia/common/loader.py", line 177, in reset_parameters
    checkpoint = torch.load(checkpoint_path, map_location=torch.device("cpu"))
  File "/home/geomatique/pylaia-env/lib/python3.10/site-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/geomatique/pylaia-env/lib/python3.10/site-packages/torch/serialization.py", line 1002, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.

Content of config_train_model_belfort.yaml:

syms: /home/geomatique/EW/env_atr_gen/data/syms.txt
img_dirs:
  - /home/geomatique/EW/env_atr_gen/data/images/
tr_txt_table: /home/geomatique/EW/env_atr_gen/data/train.txt
va_txt_table: /home/geomatique/EW/env_atr_gen/data/val.txt
common:
  experiment_dirname: experiment_belfort
logging:
  filepath: pylaia_training_07012025.log
scheduler:
  active: true
train:
  augment_training: true
  early_stopping_patience: 80
trainer:
  auto_select_gpus: true
  gpus: 1
  max_epochs: 600

Dear all,
I’ve found the source of the problem. For information: the weights file (weights.ckpt) in the cloned model repository was corrupted when the repository was cloned. I downloaded it again and the command now works.
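In case it helps others: given the “invalid load key, 'v'” message, my guess (an assumption, I did not verify it) is that the clone had left a Git LFS pointer in place of the checkpoint; a pointer is a tiny text file starting with “version https://git-lfs...”. A quick sanity check before training:

ls -lh weights.ckpt          # an LFS pointer is only ~130 bytes; real weights are much larger
head -c 120 weights.ckpt     # a pointer starts with "version https://git-lfs..."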
Best,
