On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
Machine Learning - Special issue on learning with probabilistic representations
Neural Networks: A Comprehensive Foundation
Neural Networks: A Comprehensive Foundation
Two supervised learning approaches for name disambiguation in author citations
Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Automatic Identification of Home Pages on the Web
HICSS '05 Proceedings of the Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05) - Track 4 - Volume 04
Name disambiguation in author citations using a K-way spectral clustering method
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Effective and scalable solutions for mixed and split citation problems in digital libraries
Proceedings of the 2nd international workshop on Information quality in information systems
Proceedings of the 15th international conference on World Wide Web
Domain-independent data cleaning via analysis of entity-relationship graph
ACM Transactions on Database Systems (TODS)
Search engine driven author disambiguation
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Efficient topic-based unsupervised name disambiguation
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Author Name Disambiguation for Citations Using Topic and Web Correlation
ECDL '08 Proceedings of the 12th European conference on Research and Advanced Technology for Digital Libraries
Classifier Ensemble Generation for the Majority Vote Rule
CIARP '08 Proceedings of the 13th Iberoamerican congress on Pattern Recognition: Progress in Pattern Recognition, Image Analysis and Applications
On co-authorship for author disambiguation
Information Processing and Management: an International Journal
A Term-Based Driven Clustering Approach for Name Disambiguation
APWeb/WAIM '09 Proceedings of the Joint International Conferences on Advances in Data and Web Management
Using web information for author name disambiguation
Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
Graph clustering based on structural/attribute similarities
Proceedings of the VLDB Endowment
Efficient web pages identification for entity resolution
Proceedings of the 19th international conference on World wide web
Sampling dirty data for matching attributes
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Enhance web pages genre identification using neighboring pages
WISE'11 Proceedings of the 12th international conference on Web information system engineering
Efficient name disambiguation for large-scale databases
PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases
Hi-index | 0.00 |
In many databases, science bibliography database for example, name attribute is the most commonly chosen identifier to identify entities. However, names are often ambiguous and not always unique which cause problems in many fields. Name disambiguation is a non-trivial task in data management that aims to properly distinguish different entities which share the same name, particularly for large databases like digital libraries, as only limited information can be used to identify authors' name. In digital libraries, ambiguous author names occur due to the existence of multiple authors with the same name or different name variations for the same person. Also known as name disambiguation, most of the previous works to solve this issue often employ hierarchical clustering approaches based on information inside the citation records, e.g. co-authors and publication titles. In this paper, we focus on proposing a robust hybrid name disambiguation framework that is not only applicable for digital libraries but also can be easily extended to other application based on different data sources. We propose a web pages genre identification component to identify the genre of a web page, e.g. whether the page is a personal homepage. In addition, we propose a re-clustering model based on multidimensional scaling that can further improve the performance of name disambiguation. We evaluated our approach on known corpora, and the favorable experiment results indicated that our proposed framework is feasible.