In this paper, we consider two important problems that commonly occur in bibliographic digital libraries and seriously degrade their data quality. The first is the Mixed Citation (MC) problem, in which citations of different scholars who share the same name are mixed together. The second is the Split Citation (SC) problem, in which citations of the same author appear under different name variants. Because citation collections in such digital libraries tend to be large, we focus on a solution that is both effective and scalable. After formally defining the problems and the accompanying challenges, we present a solution based on a state-of-the-art sampling-based approximate join algorithm. We verify our claim through preliminary experimental results.
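The abstract does not spell out the join algorithm. As a rough illustration of how a sampling-based approximate join can surface name variants (the SC case), here is a minimal sketch that makes several assumptions. It represents names as tf-idf-weighted character 2-grams. It estimates cosine similarity by drawing candidates for each 2-gram with probability proportional to their weight on that 2-gram, which gives an unbiased estimate of the exact dot product. This is not the paper's implementation, and the function names `qgrams`, `tfidf_vectors`, and `sampled_similarities` are hypothetical.

```python
import math
import random
import re
from collections import Counter, defaultdict


def qgrams(name: str, q: int = 2) -> list[str]:
    """Split a name into overlapping character q-grams, padded with '#'."""
    s = re.sub(r"[^a-z ]", "", name.lower())  # assumption: keep letters and spaces only
    s = "#" * (q - 1) + s + "#" * (q - 1)
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def tfidf_vectors(names: list[str]) -> list[dict[str, float]]:
    """Unit-length tf-idf vectors over q-grams; idf is computed from `names`."""
    grams = [Counter(qgrams(n)) for n in names]
    df = Counter(g for c in grams for g in c)
    total = len(names)
    vecs = []
    for c in grams:
        # log(1 + N/df) keeps every weight positive, so no vector has zero norm.
        v = {g: tf * math.log(1 + total / df[g]) for g, tf in c.items()}
        norm = math.sqrt(sum(x * x for x in v.values()))
        vecs.append({g: x / norm for g, x in v.items()})
    return vecs


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Exact cosine similarity of two unit-length sparse vectors."""
    return sum(w * v.get(g, 0.0) for g, w in u.items())


def sampled_similarities(query, candidates, samples=1000, rng=None):
    """Estimate cosine(query, c) for every candidate c by weighted sampling.

    For each q-gram g of the query, draw `samples` candidates with probability
    w(c, g) / T(g), where T(g) is the total weight of g over all candidates.
    Each draw of c adds w(query, g) * T(g) / samples, so the expected total
    for c equals the exact dot product sum_g w(query, g) * w(c, g).
    """
    rng = rng or random.Random(0)
    postings = defaultdict(list)  # q-gram -> [(candidate index, weight)]
    for i, vec in enumerate(candidates):
        for g, w in vec.items():
            postings[g].append((i, w))
    est = defaultdict(float)
    for g, wq in query.items():
        if g not in postings:
            continue
        ids, weights = zip(*postings[g])
        t = sum(weights)
        for i in rng.choices(ids, weights=weights, k=samples):
            est[i] += wq * t / samples
    return dict(est)


if __name__ == "__main__":
    names = ["Jeffrey D. Ullman", "J. D. Ullman", "Jeffrey Ullman",
             "Jennifer Widom", "J. Widom"]
    vecs = tfidf_vectors(names)
    query, cands = vecs[0], vecs[1:]
    est = sampled_similarities(query, cands, samples=2000, rng=random.Random(7))
    for i, score in sorted(est.items(), key=lambda kv: -kv[1]):
        print(f"{names[i + 1]:<16} est={score:.3f} exact={cosine(query, cands[i]):.3f}")
```

The sample names are invented for illustration. In a full pipeline, pairs whose estimated score exceeds some threshold (an assumed parameter) would be kept as candidate name variants and could then be checked exactly. Sampling trades a small estimation error for not having to compute every pairwise similarity. This sketch covers only the string-matching side of the SC problem. Separating homonyms (MC) needs evidence beyond name strings.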