Text recognition with transformer models and LLM or Vision-Language Model integration

Description

Transcription of text from centuries-old works is a research area underserved by current tools such as Adobe Acrobat's OCR. While these tools can recognize text in clearly printed modern sources, they cannot reliably extract text from early forms of print, let alone manuscripts. This project will focus on applying hybrid end-to-end models based on transformers (e.g. ViT-RNN, CNN-TF, or ViT-TF) and integrating LLMs, VLMs, or both to recognize text in seventeenth-century Spanish printed sources. It aims to expand the dataset from previous iterations so that the model can be fine-tuned to handle both printed and handwritten documents. The project will also integrate LLMs such as Gemini 3 as a late-stage step of the process to increase transcription accuracy; VLMs are another possible path for contributors. The goal is to improve fine-tuning and transcription accuracy on larger datasets that incorporate diverse typographical styles, both printed and handwritten.
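The description names hybrid end-to-end recognizers (ViT-RNN, CNN-TF, ViT-TF) but does not fix how their outputs become text. Recurrent variants such as ViT-RNN are commonly trained with a CTC (Connectionist Temporal Classification) loss; assuming that setup, the sketch below shows greedy CTC decoding, which turns per-frame class scores into a transcription. The function name, the blank index 0, and the alphabet layout are illustrative assumptions, not the project's code. Transformer decoders (the -TF variants) would instead generate characters autoregressively.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Collapse per-frame argmax predictions into a transcription.

    frame_scores: one row of class scores per horizontal frame of a line image.
    Column 0 is the blank; column i (i >= 1) maps to alphabet[i - 1].
    """
    # Pick the highest-scoring class for every frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    chars: List[str] = []
    previous: Optional[int] = None
    for idx in best:
        # Emit a character only when it is not blank and differs from the
        # previous frame; a blank between two equal labels allows double letters.
        if idx != previous and idx != BLANK:
            chars.append(alphabet[idx - 1])
        previous = idx
    return "".join(chars)
```

For example, with the alphabet "almo" and per-frame winners l, blank, l, a, m, o, the decoder returns "llamo"; without the intervening blank the two l's would collapse into one.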

Duration

Total project length: 175 hours

Task ideas

Creation of a hybrid end-to-end model based on transformers (e.g. ViT-RNN, CNN-TF, or ViT-TF) capable of performing text recognition.
Implement language modeling and contextual understanding for post-processing, allowing contextual corrections based on 17th-century grammar to further enhance OCR accuracy.
Integrate LLMs or VLMs as a core step of the transcription pipeline to improve accuracy.
Develop and deploy a web or mobile-based annotation tool for historians, researchers, and institutions to validate and refine OCR outputs.
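Contextual post-correction can take many forms, and the project does not prescribe one. As a minimal sketch, assuming a tokenized corpus of seventeenth-century Spanish text is available, the hypothetical `ContextCorrector` below replaces out-of-vocabulary OCR tokens with known words one edit away, ranking candidates by bigram count with the previous word and then by unigram frequency. The alphabet includes the long s (ſ) common in early modern print. A production system would more likely score candidates with a neural language model than with bigram counts.

```python
from collections import Counter
from typing import List, Set, Tuple

# Lowercase letters plus accented vowels, ñ, ç, and the long s (ſ) of early print.
ALPHABET = "abcdefghijklmnopqrstuvwxyzáéíóúñçſ"


def edits1(word: str) -> Set[str]:
    """All strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts


class ContextCorrector:
    """Hypothetical bigram-based corrector for OCR tokens (illustrative only)."""

    def __init__(self, corpus_tokens: List[str]) -> None:
        self.unigrams = Counter(corpus_tokens)
        self.bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))

    def candidates(self, word: str) -> List[str]:
        # Known words are kept as-is; unknown ones map to known words one edit away.
        if word in self.unigrams:
            return [word]
        # Sorted so that ties are broken deterministically.
        known = sorted(w for w in edits1(word) if w in self.unigrams)
        return known or [word]

    def score(self, prev: str, cand: str) -> Tuple[int, int]:
        # Prefer candidates seen after the previous word, then more frequent words.
        return (self.bigrams[(prev, cand)], self.unigrams[cand])

    def correct(self, tokens: List[str]) -> List[str]:
        out: List[str] = []
        prev = ""
        for tok in tokens:
            best = max(self.candidates(tok), key=lambda c: self.score(prev, c))
            out.append(best)
            prev = best
        return out


if __name__ == "__main__":
    # Toy corpus for illustration only; real use needs a large period corpus.
    toy = "la caſa de dios en la caſa de ſu padre ſe caſo con ella".split()
    corrector = ContextCorrector(toy)
    print(corrector.correct("ſe caſx con ella".split()))  # context favours "caſo"
    print(corrector.correct("la caſx de dios".split()))   # context favours "caſa"
```

The misread token "caſx" is one edit from both "caſa" and "caſo"; the previous word decides between them, which is the kind of contextual signal the post-processing task calls for.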
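The task list leaves the exact form of LLM integration open. One common pattern, sketched here under stated assumptions, is to send each OCR line to an LLM with a prompt that asks for error correction without modernization, and to reject answers that drift too far from the OCR input, since LLMs tend to normalize archaic spelling. `generate` is a hypothetical stand-in for whatever client call wraps the chosen model (e.g. Gemini); no vendor API is shown, and the 0.8 similarity threshold is an arbitrary illustrative value.

```python
from difflib import SequenceMatcher
from typing import Callable

# Illustrative prompt; wording would need tuning against real model behaviour.
PROMPT_TEMPLATE = (
    "You are correcting OCR output from a seventeenth-century Spanish printed source.\n"
    "Fix only recognition errors. Preserve original spelling, abbreviations, "
    "long s (ſ), and punctuation; do not modernize the text.\n"
    "Return only the corrected text.\n\n"
    "OCR output:\n{ocr_text}"
)


def build_prompt(ocr_text: str) -> str:
    return PROMPT_TEMPLATE.format(ocr_text=ocr_text)


def llm_postcorrect(
    ocr_text: str,
    generate: Callable[[str], str],  # hypothetical wrapper around an LLM client
    min_similarity: float = 0.8,     # assumed threshold, not a tuned value
) -> str:
    """Return the LLM correction, or the raw OCR text if the answer drifts too far."""
    candidate = generate(build_prompt(ocr_text)).strip()
    # SequenceMatcher.ratio() is 2*M/T, where M is matched characters and T is total length.
    similarity = SequenceMatcher(None, ocr_text, candidate).ratio()
    return candidate if similarity >= min_similarity else ocr_text
```

Rejecting low-similarity outputs trades some recall for safety: a heavily rewritten line falls back to the raw OCR text, which can then be flagged for review in the annotation tool.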

Expected results

Machine learning models will be trained to perform text recognition of non-standard printed text
The models should be able to extract text with at least 90% accuracy
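The 90% target does not state which metric it refers to. A common choice in OCR and handwritten text recognition is character error rate (CER), the edit distance between the reference and the output divided by the reference length. Assuming "accuracy" means 1 - CER, the sketch below checks the target; word error rate (WER) is included for comparison. Function names are illustrative.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


def meets_target(reference: str, hypothesis: str, target: float = 0.90) -> bool:
    # Assumption: accuracy is interpreted as 1 - CER.
    return 1.0 - cer(reference, hypothesis) >= target
```

For example, a 15-character reference line with one misread character ("cafa" for "caſa") gives a CER of 1/15, about 0.067, i.e. roughly 93% character accuracy, but a WER of 0.25, so the choice of metric matters when reporting against the 90% goal.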

Requirements

Python and some previous experience in Machine Learning.

Difficulty level

Advanced

Tests

Please use this link to access the test for this project.

Mentors

Sergei Gleyzer (University of Alabama)
Xabier Granja (University of Alabama)
Nicholas Jones (Yale University)
Harrison Meadows (University of Tennessee Knoxville)
Emanuele Usai (University of Alabama)

Please DO NOT contact mentors directly by email. Instead, please email human-ai@cern.ch with Project Title and include your CV and test results. The mentors will then get in touch with you.

Corresponding Project

RenAIssance