Multimodal LLMs for Historical Dataset Construction from Archival Image Scans: German Patents (1877–1918)

Niclas Griesshaber & Jochen Streb
AI vs Perfect Transcriptions: Visual Comparison
1. Character Error Rate
2. Patent Entry Extraction based on Archival Image Scans
3. Variable Extraction based on extracted Patent Entries
Output: Full Dataset