Author Name Disambiguation in Academic Databases

Description

Retrieving academic documents from scientific databases is hampered by ambiguous author names: different authors may share the same name, and the same author may appear under several name variants. Several feature extraction and engineering approaches have been proposed that combine textual embeddings with graph representations of the papers to compute their similarity, which is then used to cluster the papers by author.

Duration

Total project length: 175 hours

Task ideas

Extract or construct features from the unstructured and semi-structured metadata of the papers in order to obtain useful latent representations. This task may involve using several network architectures at the same time.
Cluster the set of papers into groups, with one group per author entity in the database.
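
As a rough illustration of the two tasks above, the sketch below builds one textual feature and one graph-like feature per paper, blends them into a similarity score, and groups papers whose similarity passes a threshold. Everything in it is an assumption made for illustration: the toy records, the field names (`title`, `coauthors`), the equal weighting and the 0.3 threshold are hypothetical, and the bag-of-words vector and co-author overlap are simple stand-ins for the learned text embeddings and graph representations the project would actually explore.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy records that all carry the same ambiguous author name.
PAPERS = [
    {"id": "p1", "title": "Graph neural networks for citation analysis",
     "coauthors": {"A. Gomez", "L. Ruiz"}},
    {"id": "p2", "title": "Citation analysis with graph embeddings",
     "coauthors": {"A. Gomez"}},
    {"id": "p3", "title": "Soil microbiome diversity in tropical forests",
     "coauthors": {"M. Torres"}},
    {"id": "p4", "title": "Tropical forests and soil carbon dynamics",
     "coauthors": {"M. Torres", "C. Diaz"}},
]


def text_vector(title):
    """Bag-of-words counts; a stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z]+", title.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Co-author overlap; a stand-in for a graph-based representation."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(p, q, text_weight=0.5):
    """Blend textual and co-author similarity (the weight is an assumption)."""
    text_sim = cosine(text_vector(p["title"]), text_vector(q["title"]))
    graph_sim = jaccard(p["coauthors"], q["coauthors"])
    return text_weight * text_sim + (1 - text_weight) * graph_sim


def cluster(papers, threshold=0.3):
    """Single-link grouping: merge any two papers whose similarity >= threshold."""
    parent = {p["id"]: p["id"] for p in papers}

    def find(x):
        # Follow parent links to the group root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(papers, 2):
        if similarity(p, q) >= threshold:
            parent[find(p["id"])] = find(q["id"])

    groups = {}
    for p in papers:
        groups.setdefault(find(p["id"]), []).append(p["id"])
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    print(cluster(PAPERS))
```

On these toy records the two citation-analysis papers end up in one group and the two soil papers in another. Single-link grouping with a fixed threshold is only one possible choice; other clustering methods could be swapped in without changing the feature step.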

Expected results

Train and test the proposed AI methodologies for extracting or constructing paper features and for identifying the correct author entity of each paper across the Impactu scientific metadata database.
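
One way to test a candidate method is to compare its clusters with a labeled subset of papers using pairwise precision, recall and F1, a metric commonly used for disambiguation tasks. The sketch below is an assumption about how such an evaluation could look, not the project's prescribed protocol; the function names and the toy labels are hypothetical.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of unordered paper pairs that share a cluster label."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(labels), 2)
        if labels[a] == labels[b]
    }


def pairwise_scores(predicted, truth):
    """Pairwise precision, recall and F1 of predicted clusters against the truth."""
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    correct = len(pred_pairs & true_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical labels: paper id -> author entity (truth) or cluster id (predicted).
    truth = {"p1": "author_a", "p2": "author_a", "p3": "author_b", "p4": "author_b"}
    predicted = {"p1": 0, "p2": 0, "p3": 0, "p4": 1}
    print(pairwise_scores(predicted, truth))
```

In this toy case the prediction wrongly merges p3 with the first author and separates it from p4, so only one of its three predicted pairs is correct and only one of the two true pairs is recovered.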

Requirements

Python
PyTorch/TensorFlow
Basic knowledge of graph networks and LLMs.

Project difficulty level

Intermediate

Mentors

Darío Peña (Universidad Externado de Colombia)
Omar Zapata (Universidad de Antioquia)
Gabriel Vélez (Universidad de Antioquia)

Please DO NOT contact mentors directly by email. Instead, email human-ai@cern.ch with the project title and include your CV. The mentors will then get in touch with you.

Corresponding Project

Colav

Participating Organizations

Colav