Author Name Disambiguation in Academic Databases


Author name ambiguity degrades the retrieval of academic documents from scientific databases: different researchers can share the same name, and one researcher can appear under several name variants. Several feature extraction and feature engineering approaches have been proposed that combine textual embeddings with graph representations of the papers to compute their similarity, which is then used to cluster the papers by author.


Total project length: 175 hours

Task ideas

Extract or construct features from the unstructured and semi-structured metadata of the papers in order to obtain useful latent representations. This task may require combining several network architectures.

Cluster the set of papers into groups, with one group per author entity in the database.
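The two tasks above can be illustrated with a minimal sketch. It is not the project's method: it replaces learned embeddings with a simple bag of tokens, and it clusters by linking any two papers whose cosine similarity passes a threshold (single-linkage grouping). The metadata fields (`title`, `coauthors`, `venue`), the sample records, and the threshold value are assumptions made only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def paper_features(paper):
    """Build a bag of tokens from hypothetical metadata fields."""
    tokens = [word.lower() for word in paper["title"].split()]
    # Prefix non-title fields so they do not collide with title words.
    tokens += ["coauthor:" + name.lower() for name in paper["coauthors"]]
    tokens.append("venue:" + paper["venue"].lower())
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_papers(papers, threshold=0.3):
    """Group papers whose similarity reaches the (assumed) threshold.

    Papers are merged with a union-find structure, so two papers end up
    together if a chain of sufficiently similar papers connects them.
    """
    feats = [paper_features(p) for p in papers]
    parent = list(range(len(papers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(papers)), 2):
        if cosine(feats[i], feats[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(papers)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up records that all carry the same ambiguous author name.
papers = [
    {"title": "Graph embeddings for citation networks",
     "coauthors": ["A. Gomez", "L. Ruiz"], "venue": "KDD"},
    {"title": "Citation networks and graph clustering",
     "coauthors": ["L. Ruiz"], "venue": "KDD"},
    {"title": "Soil bacteria in tropical forests",
     "coauthors": ["M. Diaz"], "venue": "Ecology Letters"},
]

clusters = cluster_papers(papers)
print(clusters)
```

Each inner list holds the indices of papers attributed to one author entity. In a real pipeline the token counts would likely be replaced by text or graph embeddings, and the threshold would need tuning on labeled data.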

Expected results

Train and test the proposed AI methodologies for extracting or constructing paper features and for identifying the correct author entity of each paper across the Impactu scientific metadata database.
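Testing a disambiguation method requires a score that compares predicted clusters against known author entities. The project description does not name a metric, so the sketch below assumes pairwise precision, recall, and F1, which is one common choice for clustering tasks of this kind. The label lists are made-up examples.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of index pairs that share a cluster label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_prf(true_labels, pred_labels):
    """Pairwise precision, recall, and F1 of a predicted clustering."""
    true_pairs = same_cluster_pairs(true_labels)
    pred_pairs = same_cluster_pairs(pred_labels)
    tp = len(true_pairs & pred_pairs)  # pairs correctly placed together
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ground truth: papers 0-2 belong to one author, paper 3 to another.
true_labels = ["author_1", "author_1", "author_1", "author_2"]
# Hypothetical prediction that wrongly splits off paper 2 and joins it with paper 3.
pred_labels = [0, 0, 1, 1]

precision, recall, f1 = pairwise_prf(true_labels, pred_labels)
print(precision, recall, f1)
```

Pairwise scores depend only on which papers are grouped together, so the predicted cluster identifiers do not need to match the ground-truth author identifiers.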


Project difficulty level



Please DO NOT contact mentors directly by email. Instead, please email with Project Title and include your CV. The mentors will then get in touch with you.

Corresponding Project

Participating Organizations