Automating Text Recognition and Transliteration of Historical Documents with Convolutional-Recurrent Architectures

Description

Transliterating text from centuries-old works is a research area underserved by current tools such as Adobe Acrobat’s OCR. These tools can recognize text in clearly printed modern sources, but they cannot reliably extract text from early forms of print, much less from manuscripts. This project will apply hybrid end-to-end models based on convolutional-recurrent (CNN-RNN) architectures to recognize text in Spanish printed sources from the seventeenth century.
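To make the recognition step more concrete, here is a minimal sketch of how the output of a CNN-RNN recognizer is often turned into text. It assumes the network emits one score vector per horizontal image slice (frame) and is trained with a CTC-style objective; the project description does not specify this, so treat it as one common choice and not as the project's method. The alphabet and the names `ALPHABET`, `one_hot`, and `greedy_ctc_decode` are hypothetical and exist only for illustration.

```python
from typing import List, Sequence

# Hypothetical alphabet for illustration; index 0 is reserved for the CTC "blank".
ALPHABET: List[str] = ["<blank>"] + list("abcdefghijklmnopqrstuvwxyzñáéíóú ")


def one_hot(index: int, size: int) -> List[float]:
    """Build a toy score vector that puts all its weight on one class."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def greedy_ctc_decode(
    frame_scores: Sequence[Sequence[float]],
    alphabet: List[str] = ALPHABET,
    blank: int = 0,
) -> str:
    """Greedy (best-path) CTC decoding.

    1. Pick the highest-scoring class in every frame.
    2. Collapse consecutive repeats of the same class.
    3. Drop blank symbols.
    """
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    chars: List[str] = []
    prev = None
    for idx in best:
        # Emit a character only when the class changes and is not the blank.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


# Toy example: the frames "l l <blank> l a" decode to "lla".
# The blank between the two runs of "l" is what keeps the double letter.
index_of = {ch: i for i, ch in enumerate(ALPHABET)}
frames = [one_hot(index_of[c], len(ALPHABET)) for c in ["l", "l", "<blank>", "l", "a"]]
decoded = greedy_ctc_decode(frames)
```

In a real pipeline the frame scores would come from the recurrent layers of the trained model instead of hand-built one-hot vectors, and a beam search or language model could replace the greedy step.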

Duration

Total project length: 175 hours

Task ideas

Expected results

Requirements

Python and some previous experience in Machine Learning.

Difficulty level

Medium

Test

Please use this link to access the test for this project.

Mentors

Please DO NOT contact mentors directly by email. Instead, please email human-ai@cern.ch with the project title, and include your CV and test results. The mentors will then get in touch with you.

Corresponding Project

Participating Organizations