Paper Title
Author Disambiguation Based on Hierarchical Agglomerative Clustering in Heterogeneous Scholarly Data

In many fields, different types of scholarly data are used to provide information to users. However, it is time-consuming and cumbersome to extract information from data presented in different formats, or to distinguish between records belonging to different authors who share the same name. To address this issue, we identify author entities across different types of academic data (e.g., papers, patents, and reports) and offer users refined data by connecting the author entities that appear in them. Entity identification aims to match authors having the same name with actual people; it reduces the time and effort required to search for academic information and provides accurate information. The matching involves merging existing information from different formats into a single format. In this paper, to identify author entities, we extract bibliographic information related to the authors of academic papers and journals, and we cluster authors by their similarities using the hierarchical agglomerative clustering method. To validate the proposed method, we identify entities using author data obtained from papers, patents, and reports published in Korea between 1948 and 2016. On this data, our system achieved a precision of 91.29%.

Index Terms - Author Disambiguation, Co-Author Network, Hierarchical Agglomerative Clustering
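
As a rough illustration of the clustering step described above, the following Python sketch groups same-name records from different sources with hierarchical agglomerative clustering. It is not the system evaluated in this paper. The record fields (`coauthors`, `keywords`), the Jaccard-based similarity with equal weights, the average-linkage rule, and the stopping threshold of 0.3 are all assumptions chosen for the example, and the sample records are invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def record_similarity(r1, r2):
    # Assumed features: co-author names and topic keywords, weighted equally.
    return (0.5 * jaccard(r1["coauthors"], r2["coauthors"])
            + 0.5 * jaccard(r1["keywords"], r2["keywords"]))


def cluster_similarity(c1, c2, records):
    # Average linkage: mean similarity over all cross-cluster record pairs.
    total = sum(record_similarity(records[i], records[j])
                for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clusters(records, threshold=0.3):
    """Merge the most similar pair of clusters until none reaches threshold."""
    clusters = [[i] for i in range(len(records))]  # start with singletons
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            sim = cluster_similarity(clusters[a], clusters[b], records)
            if sim > best_sim:
                best_sim, best_pair = sim, (a, b)
        if best_sim < threshold:
            break  # remaining clusters are treated as different people
        a, b = best_pair  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Invented records that all carry the same author name, from mixed sources.
records = [
    {"source": "paper", "coauthors": {"Lee, S.", "Park, H."},
     "keywords": {"clustering", "bibliometrics"}},
    {"source": "patent", "coauthors": {"Lee, S."},
     "keywords": {"clustering", "search"}},
    {"source": "report", "coauthors": {"Choi, M."},
     "keywords": {"polymer", "catalyst"}},
    {"source": "paper", "coauthors": {"Choi, M.", "Jung, Y."},
     "keywords": {"polymer"}},
]

print(agglomerative_clusters(records))  # [[0, 1], [2, 3]]
```

Each resulting cluster stands for one presumed person, so here the paper and patent in the first cluster would be linked to one author entity and the report and paper in the second cluster to another. Raising the threshold makes the sketch more conservative about merging, which generally favors precision over recall.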