Paper Title
Author Disambiguation Based on Hierarchical Agglomerative Clustering in Heterogeneous Scholarly Data
Abstract
In many fields, different types of scholarly data are used to provide information to users. However, it is time-consuming
and cumbersome to extract information from data presented in different formats or to distinguish between data
produced by different authors who share the same name. To solve this issue, we identify author entities in different academic data
(e.g., papers, patents, and reports), and offer users refined data by connecting author entities that exist in different types of
data. Entity identification aims to map authors who share a name to the actual people they represent; it reduces the time and effort
required to search for academic information and provides more accurate information. The matching involves merging existing
information from different formats into a single format. In this paper, to identify author entities, we extract bibliographic
information related to the authors of academic papers and journals, as well as the similarities between authors, and cluster the
authors using the hierarchical agglomerative clustering method. To validate the proposed method, we identify entities using author
data obtained from papers, patents, and reports published in Korea between 1948 and 2016. On this data set, our system achieved a
precision of 91.29%.
Index Terms - Author Disambiguation, Co-Author Network, Hierarchical Agglomerative Clustering
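
The clustering step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the linkage criterion, or the stopping threshold, so everything below is an assumption made for illustration: records that share an author name are compared by the Jaccard similarity of their co-author sets, clusters are merged by average linkage, and merging stops at a hypothetical threshold of 0.3. The function names and sample records are likewise hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_linkage(c1, c2, features):
    """Mean pairwise similarity between the records of two clusters."""
    total = sum(jaccard(features[i], features[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_records(features, threshold):
    """Agglomerative clustering: repeatedly merge the most similar pair of
    clusters until no pair reaches the (assumed) similarity threshold."""
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            sim = average_linkage(clusters[x], clusters[y], features)
            if sim > best_sim:
                best_sim, best_pair = sim, (x, y)
        if best_sim < threshold:
            break
        x, y = best_pair  # x < y, so deleting y leaves index x valid
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical records that all carry the same author name; each set holds
# the co-author names found on that record.
records = [
    {"Lee", "Park"},          # paper
    {"Lee", "Park", "Choi"},  # patent
    {"Jung", "Han"},          # report
    {"Jung", "Han", "Yoon"},  # paper
]
clusters = cluster_records(records, threshold=0.3)
print(clusters)
```

In this toy input, the first two records end up in one cluster and the last two in another, which corresponds to treating them as two distinct people who share a name. A real system would likely combine further bibliographic features and tune the threshold on labeled data.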