End-to-end handwritten text recognition for early modern Spanish documents

Description

Despite its potential for researchers and consumer platforms, end-to-end Handwritten Text Recognition (HTR) remains an underexplored field. Unlike traditional OCR, end-to-end HTR processes entire documents, capturing both the text and structural elements such as layout and reading order. Recent models such as DAN have advanced whole-document recognition. This project will apply end-to-end HTR models to early modern Spanish manuscripts, leveraging transformer-based architectures to improve recognition accuracy. By integrating state-of-the-art machine learning with historical document analysis, we aim to make handwritten and printed texts from this period more accessible while contributing to the broader field of AI-driven humanities research.
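
To make "text plus layout and reading order" concrete, the sketch below shows one possible way a whole-page training target could be serialized: each region is wrapped in a layout tag and the regions are emitted in reading order. This is only an illustration under stated assumptions, not the project's format or the output format of any particular model. The `Region` class, the `serialize_page` helper, the tag names, and the sample transcription are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    kind: str   # hypothetical layout class, e.g. "heading", "body", "marginalia"
    order: int  # position of the region in the page's reading order
    text: str   # transcription of the region


def serialize_page(regions):
    """Flatten a page into one string: tagged regions, sorted by reading order."""
    parts = []
    for region in sorted(regions, key=lambda r: r.order):
        parts.append(f"<{region.kind}>{region.text}</{region.kind}>")
    return "".join(parts)


# Invented sample page, listed out of order on purpose.
page = [
    Region("body", 2, "En la villa de Madrid"),
    Region("heading", 1, "Carta de poder"),
    Region("marginalia", 3, "Poder"),
]
target = serialize_page(page)
print(target)
```

A model trained on targets of this shape would have to predict the tags and their order along with the characters, which is the difference from line-level recognition that the description points to.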

Duration

Total project length: 175 hours

Task ideas

Expected results

Requirements

Python and some previous experience in machine learning.

Difficulty level

Advanced

Mentors

Please DO NOT contact mentors directly by email. Instead, email human-ai@cern.ch with the project title, and include your CV and test results. The mentors will then get in touch with you.

Corresponding Project

Participating Organizations