Name disambiguation in databases is a non-trivial task because people's names are often not unique and usually only limited information is associated with each name in the database. For example, in DBLP many authors share the same name, and there is no unique identifier to distinguish them. To make matters worse, the full contents of the publications are not always accessible unless one has joined the organizations (e.g. ACM) that publish them. Disambiguating names with such limited information is therefore very challenging. In this paper, we focus on this situation and propose a term-based clustering approach to address it. Specifically, we first construct term-based taxonomies that mimic expert knowledge of the domain by automatically linking the related terms that appear in it. Each taxonomy is then transformed into a graph, and we group the entries that belong to the same author by using either of two novel models: a graph-based similarity model and a graph-based random walk model. The former computes the similarity among terms, whereas the latter estimates how likely one set of terms is to be transformed into another set of terms. Extensive experiments are conducted on entries in DBLP. The favorable results indicate that the proposed approach is highly effective.
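To make the random walk idea concrete, the following is a minimal sketch, not the paper's actual model. It assumes the taxonomy has already been turned into an undirected term graph, that the walker moves to a uniformly chosen neighbor at each step, and that the "likelihood of transforming" one term set into another can be approximated by the probability mass that lands on the target terms after a fixed number of steps. The function names, the step count, and the toy edges are all hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected term graph as a dict of adjacency sets."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def walk_distribution(graph, start_terms, steps):
    """Probability of being at each term after `steps` uniform random-walk
    steps, starting uniformly at random from one of `start_terms`."""
    start_terms = set(start_terms)
    dist = {t: 1.0 / len(start_terms) for t in start_terms}
    for _ in range(steps):
        nxt = defaultdict(float)
        for term, p in dist.items():
            neighbors = graph.get(term, ())
            if not neighbors:
                # Assumption: a term with no links keeps its probability mass.
                nxt[term] += p
                continue
            share = p / len(neighbors)
            for n in neighbors:
                nxt[n] += share
        dist = dict(nxt)
    return dist


def reach_probability(graph, source_terms, target_terms, steps=2):
    """Probability mass landing on `target_terms` after `steps` steps."""
    targets = set(target_terms)
    dist = walk_distribution(graph, source_terms, steps)
    return sum(p for t, p in dist.items() if t in targets)


# Hypothetical toy taxonomy edges, for illustration only.
toy_graph = build_graph([
    ("database", "query"),
    ("database", "index"),
    ("query", "sql"),
    ("learning", "classifier"),
])
```

Under these assumptions, two entries whose term sets reach each other with high probability would be candidates for the same author, while term sets in disconnected parts of the graph (here, "sql" and "classifier") score zero.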