Pix2StructCzechInvoice (V1 – Synthetic + Random Layout)

This model is a fine-tuned version of TomasFAV/Pix2StructCzechInvoice for structured information extraction from Czech invoices.

It achieves the following results on the evaluation set (the epoch 9 checkpoint at step 675, which has the highest F1 in the training log):

  • Loss: 0.4679
  • F1: 0.6432

Model description

Pix2StructCzechInvoice (V1) extends the baseline generative model by introducing layout variability into the training data.

Unlike token classification models, this model:

  • processes full document images
  • generates structured outputs as text sequences

It is trained to extract key invoice fields:

  • supplier
  • customer
  • invoice number
  • bank details
  • totals
  • dates
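A minimal inference sketch with the Pix2Struct classes from Transformers follows. It assumes the checkpoint is published as TomasFAV/Pix2StructCzechInvoiceV01 and ships its processor files; the function name, the optional `prompt` argument, and the `max_new_tokens` value are illustrative, and this card does not say whether the model expects a text prompt alongside the image.

```python
def generate_invoice_text(image_path,
                          model_id="TomasFAV/Pix2StructCzechInvoiceV01",
                          prompt=None,
                          max_new_tokens=512):
    """Run image-to-text generation on one invoice image and return the raw string.

    Assumptions (not confirmed by the model card): the repository ID, the
    presence of processor files in the repository, and the generation length.
    """
    # Imported lazily so the heavy dependencies load only when the function is called.
    from PIL import Image
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    processor = Pix2StructProcessor.from_pretrained(model_id)
    model = Pix2StructForConditionalGeneration.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    if prompt is None:
        # Plain image-to-text mode.
        inputs = processor(images=image, return_tensors="pt")
    else:
        # VQA-style checkpoints render a header text onto the image.
        inputs = processor(images=image, text=prompt, return_tensors="pt")

    generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(generated_ids[0], skip_special_tokens=True)
```

Calling `generate_invoice_text("invoice.png")` would return the generated text sequence as a single string; turning that string into individual fields depends on the annotation format used in training, which is not documented here.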

Training data

The dataset consists of:

  • synthetically generated invoice images
  • augmented variants with randomized layouts
  • corresponding structured text outputs

Key properties:

  • variable layout structure
  • visual diversity (spacing, positioning, formatting)
  • consistent annotation format
  • fully synthetic data

Randomizing layouts adds variability in the visual domain, which is crucial for generative multimodal models because they read the rendered page image directly.
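The idea of a randomized layout can be sketched as follows. This is not the generator used to build the dataset: the page size, the grid, the jitter range, and every name below are hypothetical, chosen only to show how field blocks can land in different positions from one sample to the next while the annotation stays the same.

```python
import random

# Hypothetical block names, mirroring the fields listed in this card.
FIELD_BLOCKS = ["supplier", "customer", "invoice_number",
                "bank_details", "totals", "dates"]

def random_layout(seed, page_w=1240, page_h=1754, cols=2, rows=3, margin=40):
    """Assign each field block a top-left pixel position on the page.

    Each block gets its own grid cell (cells are shuffled per sample), plus a
    small random offset inside the cell for spacing variation.
    """
    rng = random.Random(seed)
    cells = [(c, r) for r in range(rows) for c in range(cols)]
    rng.shuffle(cells)

    cell_w = (page_w - 2 * margin) // cols
    cell_h = (page_h - 2 * margin) // rows

    layout = {}
    for name, (c, r) in zip(FIELD_BLOCKS, cells):
        jitter_x = rng.randint(0, cell_w // 4)
        jitter_y = rng.randint(0, cell_h // 4)
        layout[name] = (margin + c * cell_w + jitter_x,
                        margin + r * cell_h + jitter_y)
    return layout
```

The same seed always yields the same layout, so a rendered image and its structured text target can be regenerated together.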


Role in the pipeline

This model corresponds to:

V1 – Synthetic templates + randomized layouts

It is used to:

  • evaluate the effect of layout variability on generative models
  • compare against:
    • V0 (fixed templates)
    • later hybrid and real-data stages (V2, V3)
  • analyze robustness of end-to-end extraction

Intended uses

  • End-to-end invoice extraction from images
  • Document VQA-style tasks
  • Research in generative document understanding
  • Comparison with structured prediction models

Limitations

  • Still trained only on synthetic data
  • Sensitive to output formatting inconsistencies
  • Training instability: validation F1 falls from 0.64 (epoch 5) to 0.34 (epoch 7), recovers to 0.64 (epoch 9), then drops to 0.49 (epoch 10)
  • Evaluation depends on string matching quality
  • Less interpretable than token classification models
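The dependence on string matching can be illustrated with a small field-level F1 sketch. This card does not state how its F1 is computed, so the normalization (whitespace collapsing and case folding) and the exact-match rule below are assumptions, and the field names and values are made up.

```python
def normalize(value):
    """Collapse runs of whitespace and ignore case before comparing."""
    return " ".join(value.split()).casefold()

def field_f1(predictions, references):
    """Micro-averaged F1 over fields, using normalized exact string match."""
    tp = fp = fn = 0
    for pred, ref in zip(predictions, references):
        for key in set(pred) | set(ref):
            p = normalize(pred[key]) if key in pred else None
            r = normalize(ref[key]) if key in ref else None
            if p is not None and p == r:
                tp += 1
            else:
                if p is not None:
                    fp += 1  # predicted a value that is wrong or not expected
                if r is not None:
                    fn += 1  # expected value was missed or mismatched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up example: the total differs only by a thousands separator.
reference = [{"invoice_number": "2024-0017", "total": "1 210,00 Kč"}]
prediction = [{"invoice_number": "2024-0017", "total": "1210,00 Kč"}]
score = field_f1(prediction, reference)
```

Here `score` is 0.5: the amount is semantically correct, but the formatting difference counts as both a false positive and a false negative under exact matching.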

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 1
  • seed: 42
  • optimizer: fused PyTorch AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine_with_restarts
  • lr_scheduler_warmup_steps: 0.1 (a fractional value, presumably a fraction of the total training steps rather than a step count)
  • num_epochs: 10
  • mixed_precision_training: Native AMP
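The shape of the learning-rate schedule implied by these settings can be sketched in plain Python. Several values are assumptions rather than facts from this card: the warmup value 0.1 is read as a fraction of the 750 total steps (75 steps), and the number of cosine cycles is assumed to be 1. The function name is illustrative.

```python
import math

def lr_at_step(step, base_lr=1e-4, total_steps=750, warmup_steps=75, num_cycles=1):
    """Linear warmup followed by cosine decay with hard restarts.

    warmup_steps=75 assumes the logged value 0.1 means 10% of 750 steps;
    num_cycles=1 is an assumed setting.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Each cycle restarts the cosine from its peak.
    cycle_pos = (num_cycles * progress) % 1.0
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * cycle_pos)))
```

Under these assumptions the rate peaks at 1e-4 at step 75 (the end of epoch 1) and decays to zero by step 750; with a single cycle no restart actually occurs.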

Training results

Training Loss   Epoch   Step   Validation Loss   F1
0.1978          1.0     75     0.3757            0.5804
0.1031          2.0     150    0.3578            0.6399
0.0725          3.0     225    0.3504            0.6318
0.0512          4.0     300    0.3929            0.6396
0.0500          5.0     375    0.4072            0.6394
0.0462          6.0     450    0.4655            0.4377
0.0502          7.0     525    0.6320            0.3384
0.0528          8.0     600    0.4835            0.5018
0.0393          9.0     675    0.4679            0.6432
0.0392          10.0    750    0.5330            0.4931
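Because F1 fluctuates across epochs, which checkpoint is kept matters. The sketch below encodes the log above and picks checkpoints by two criteria; the record type and variable names are illustrative and are not part of the training code.

```python
from typing import NamedTuple

class EpochLog(NamedTuple):
    epoch: int
    step: int
    train_loss: float
    val_loss: float
    f1: float

# Values copied from the training results above.
LOG = [
    EpochLog(1, 75, 0.1978, 0.3757, 0.5804),
    EpochLog(2, 150, 0.1031, 0.3578, 0.6399),
    EpochLog(3, 225, 0.0725, 0.3504, 0.6318),
    EpochLog(4, 300, 0.0512, 0.3929, 0.6396),
    EpochLog(5, 375, 0.0500, 0.4072, 0.6394),
    EpochLog(6, 450, 0.0462, 0.4655, 0.4377),
    EpochLog(7, 525, 0.0502, 0.6320, 0.3384),
    EpochLog(8, 600, 0.0528, 0.4835, 0.5018),
    EpochLog(9, 675, 0.0393, 0.4679, 0.6432),
    EpochLog(10, 750, 0.0392, 0.5330, 0.4931),
]

best_by_f1 = max(LOG, key=lambda row: row.f1)
best_by_val_loss = min(LOG, key=lambda row: row.val_loss)
```

Selecting by F1 gives epoch 9 (F1 0.6432, validation loss 0.4679), which matches the headline results, while selecting by validation loss gives epoch 3 (validation loss 0.3504, F1 0.6318). The final epoch is the best by neither criterion.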

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2

Model size: 0.3B params (Safetensors, tensor type F32)