RenAIssance encompasses the use of optical character recognition (OCR) to digitize text sources that have not yet been targeted by existing tools. Its purpose is to explore machine learning techniques to enable OCR on a variety of materials that have never been digitized before. Additionally, RenAIssance aims to develop a comprehensive end-to-end tool for text recognition, streamlining the entire pipeline from image preprocessing to text extraction. It also seeks to establish a standardized benchmarking framework by collecting a common validation dataset, allowing for accurate ranking and evaluation of all previously developed models.
End-to-end handwritten text recognition for early modern spanish documents
Synthetic renaissance text generation with generative models