Dear Solène, dear Teklia team,
My very best wishes for 2025!
I was able to follow the instructions you suggested for fine-tuning the Belfort model. For this second experiment, I ran the following command:
pylaia-htr-train-ctc --config config_train_model_belfort.yaml --common.experiment_dirname experiment_belfort --common.checkpoint belfort_weights.ckpt --train.pretrain true --trainer.max_epochs 200
in the same execution environment as for my first experiment (in which I trained from scratch), with a few lines of the configuration file adapted as you suggested (see below). However, I got the error shown below.
Do you know what caused the error?
Thanks in advance,
Best regards,
Carmen
Error in console:
[2025-01-07 18:38:56,127 INFO laia] Arguments: {'syms': '/home/geomatique/EW/env_atr_gen/data/syms.txt', 'img_dirs': ['/home/geomatique/EW/env_atr_gen/data/images/'], 'tr_txt_table': '/home/geomatique/EW/env_atr_gen/data/train.txt', 'va_txt_table': '/home/geomatique/EW/env_atr_gen/data/val.txt', 'common': CommonArgs(seed=74565, train_path='', model_filename='model', experiment_dirname='experiment_belfort/', monitor=<Monitor.va_cer: 'va_cer'>, checkpoint='belfort_weights.ckpt'), 'data': DataArgs(batch_size=8, color_mode=<ColorMode.L: 'L'>, num_workers=None, reading_order=<ReadingOrder.LTR: 'LTR'>), 'train': TrainArgs(delimiters=['<space>'], checkpoint_k=3, resume=False, pretrain=True, freeze_layers=[], early_stopping_patience=80, gpu_stats=False, augment_training=True), 'optimizer': OptimizerArgs(name=<Name.RMSProp: 'RMSProp'>, learning_rate=0.0005, momentum=0.0, weight_l2_penalty=0.0, nesterov=False), 'scheduler': SchedulerArgs(active=True, monitor=<Monitor.va_loss: 'va_loss'>, patience=5, factor=0.1), 'trainer': TrainerArgs(gradient_clip_val=0.0, gradient_clip_algorithm='norm', process_position=0, num_nodes=1, num_processes=1, devices=None, gpus=1, auto_select_gpus=True, tpu_cores=None, ipus=None, progress_bar_refresh_rate=1, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=1, max_epochs=200, min_epochs=None, max_steps=None, min_steps=None, max_time=None, limit_train_batches=1.0, limit_val_batches=1.0, limit_test_batches=1.0, limit_predict_batches=1.0, val_check_interval=1.0, flush_logs_every_n_steps=100, log_every_n_steps=50, accelerator=None, sync_batchnorm=False, precision=32, weights_summary='top', weights_save_path=None, num_sanity_val_steps=2, truncated_bptt_steps=None, profiler=None, benchmark=False, deterministic=False, reload_dataloaders_every_n_epochs=0, reload_dataloaders_every_epoch=False, replace_sampler_ddp=True, terminate_on_nan=False, prepare_data_per_node=True, plugins=None, 
amp_backend='native', amp_level='O2', distributed_backend=None, move_metrics_to_cpu=False, multiple_trainloader_mode='max_size_cycle', stochastic_weight_avg=False), 'decode': DecodeArgs(include_img_ids=True, separator=' ', join_string=' ', use_symbols=True, convert_spaces=False, input_space='<space>', output_space=' ', segmentation=None, temperature=1.0, print_line_confidence_scores=False, print_word_confidence_scores=False, use_language_model=False, language_model_path=None, language_model_weight=None, tokens_path=None, lexicon_path=None, unk_token='<unk>', blank_token='<ctc>')}
[2025-01-07 18:38:56,366 INFO laia] Installed:
[2025-01-07 18:38:56,368 WARNING laia.common.loader] The file experiment_belfort/belfort_weights.ckpt has been moved to experiment_belfort/pretrained/belfort_weights.ckpt.
[2025-01-07 18:38:56,394 INFO laia.common.loader] Loaded model model
[2025-01-07 18:38:56,394 CRITICAL laia] Uncaught exception:
Traceback (most recent call last):
  File "/home/geomatique/pylaia-env/bin/pylaia-htr-train-ctc", line 8, in <module>
    sys.exit(main())
  File "/home/geomatique/pylaia-env/lib/python3.10/site-packages/laia/scripts/htr/train_ctc.py", line 246, in main
    run(**args)
  File "/home/geomatique/pylaia-env/lib/python3.10/site-packages/laia/scripts/htr/train_ctc.py", line 78, in run
    checkpoint_path = loader.reset_parameters(
  File "/home/geomatique/pylaia-env/lib/python3.10/site-packages/laia/common/loader.py", line 177, in reset_parameters
    checkpoint = torch.load(checkpoint_path, map_location=torch.device("cpu"))
  File "/home/geomatique/pylaia-env/lib/python3.10/site-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/geomatique/pylaia-env/lib/python3.10/site-packages/torch/serialization.py", line 1002, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
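To help narrow this down, a quick look at the first bytes of the checkpoint file may be useful. The message `invalid load key, 'v'` means that pickle found the byte `v` at the very start of the file instead of valid pickle data. One common cause, which is only an assumption here and not confirmed for this file, is that the download produced a Git LFS pointer (a small text file beginning with `version https://git-lfs...`) rather than the binary weights. The helper `describe_checkpoint` below is hypothetical and not part of PyLaia; the path is the one reported in the log above.

```python
from pathlib import Path

# First line of a Git LFS pointer file (text placeholder for a large binary).
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def describe_checkpoint(path):
    """Hypothetical helper: report size, first bytes, and whether the
    file starts with the Git LFS pointer header."""
    p = Path(path)
    with open(p, "rb") as f:
        head = f.read(64)
    return {
        "size_bytes": p.stat().st_size,
        "head": head,
        "lfs_pointer": head.startswith(LFS_PREFIX),
    }


if __name__ == "__main__":
    # Location taken from the loader warning in the log above.
    ckpt = Path("experiment_belfort/pretrained/belfort_weights.ckpt")
    if ckpt.exists():
        print(describe_checkpoint(ckpt))
    else:
        print(f"{ckpt} not found")
```

If `lfs_pointer` is true, or the file is only a few hundred bytes, the weights were probably not downloaded in full. If the file starts with other bytes, the cause lies elsewhere and this check does not settle it.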
Content of config_train_model_belfort.yaml:
syms: /home/geomatique/EW/env_atr_gen/data/syms.txt
img_dirs:
  - /home/geomatique/EW/env_atr_gen/data/images/
tr_txt_table: /home/geomatique/EW/env_atr_gen/data/train.txt
va_txt_table: /home/geomatique/EW/env_atr_gen/data/val.txt
common:
  experiment_dirname: experiment_belfort
logging:
  filepath: pylaia_training_07012025.log
scheduler:
  active: true
train:
  augment_training: true
  early_stopping_patience: 80
trainer:
  auto_select_gpus: true
  gpus: 1
  max_epochs: 600