A Term-Based Driven Clustering Approach for Name Disambiguation

Authors:
Jia Zhu;Xiaofang Zhou;Gabriel Pui Fung
Affiliations:
School of ITEE, The University of Queensland, Australia;School of ITEE, The University of Queensland, Australia;School of ITEE, The University of Queensland, Australia
Venue:
APWeb/WAIM '09 Proceedings of the Joint International Conferences on Advances in Data and Web Management
Year:
2009

Abstract

Name disambiguation in databases is a non-trivial task because people's names are often not unique and usually only limited information is associated with each name in the database. For example, in DBLP many authors share the same name, and there is no unique identifier to distinguish them. To make matters worse, we may not always be able to access the full contents of the materials unless we have joined the organizations (e.g. ACM) that publish them. As such, disambiguating names with very limited information is a very challenging task. In this paper, we focus on this situation and propose a term-based driven clustering approach for solving it. Specifically, we first construct term-based taxonomies that mimic the expert knowledge of the domain by automatically linking the related terms that appear in it. Each taxonomy is then transformed into a graph, and we group the entries that belong to the same author by using either of two novel models, namely, a graph-based similarity model and a graph-based random walk model. The former model computes the similarity among terms, whereas the latter model investigates how likely it is that one set of terms would be transformed into another set of terms. Extensive experiments are conducted using the entries in DBLP. The favorable results indicate that our proposed approach is highly effective.
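
The abstract does not give the details of either model, so the following is only a minimal sketch of the general idea behind a random walk over a term graph, not the authors' method. It assumes (hypothetically) that terms are linked when they co-occur in the same entry and that edge weights are co-occurrence counts; the function names `build_term_graph` and `walk_probability` are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def build_term_graph(entries):
    """Link terms that co-occur in an entry (assumed linking rule).

    Edge weight is the number of entries in which both terms appear.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for terms in entries:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def walk_probability(graph, source_terms, target_terms, steps=2):
    """Probability that a walk started uniformly on source_terms
    sits on some term of target_terms after exactly `steps` steps."""
    source = [t for t in set(source_terms) if t in graph]
    if not source:
        return 0.0
    targets = set(target_terms)
    dist = {t: 1.0 / len(source) for t in source}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in dist.items():
            total = sum(graph[node].values())
            # Move to each neighbour in proportion to its edge weight.
            for nbr, weight in graph[node].items():
                nxt[nbr] += mass * weight / total
        dist = nxt
    return sum(p for t, p in dist.items() if t in targets)


if __name__ == "__main__":
    entries = [["clustering", "graph"], ["graph", "random walk"]]
    g = build_term_graph(entries)
    print(walk_probability(g, ["clustering"], ["random walk"], steps=2))
```

Under these assumptions, a higher probability suggests that two entries' term sets are closely connected in the graph, which is the kind of signal a clustering step could use to decide whether the entries belong to the same author.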