Despite its potential for researchers and consumer platforms, end-to-end Handwritten Text Recognition (HTR) remains an underexplored field. Unlike traditional OCR, end-to-end HTR recognizes entire documents, capturing both the text and structural elements such as layout and reading order. Recent models, such as DAN, have advanced whole-document recognition. This project will apply HTR models to early modern Spanish manuscripts, using transformer-based architectures to improve recognition accuracy. By combining state-of-the-art machine learning with historical document analysis, we aim to make handwritten and printed texts from this period more accessible while contributing to the broader field of AI-driven humanities research.
Total project length: 175 hours
Requirements: Python and some previous experience in machine learning.
Difficulty level: Advanced
Please DO NOT contact mentors directly by email. Instead, email human-ai@cern.ch with the project title and include your CV and test results. The mentors will then get in touch with you.