When a variety of names are used for the same real-world entity, the problem of detecting all such variants is known as the (record) linkage or entity resolution problem. In this paper, we propose a novel approach to this problem that uses the Web as a collective knowledge source in addition to the contents of the entities. Our hypothesis is that if an entity e1 is a duplicate of another entity e2, and e1 frequently appears together with information I on the Web, then e2 may also appear frequently with I on the Web. Using search engines, we analyze the frequency, URLs, or contents of the returned web pages to capture the information I of an entity. Extensive experiments verify that our hypothesis holds in many real settings and that using the Web as an additional source for the linkage problem is promising. Our proposal shows a 51% (on average) and 193% (at best) improvement in precision/recall compared to a baseline approach.
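To make the hypothesis concrete, the following is a minimal sketch of one way the URL-based signal could be compared for two name variants. It is an illustration under stated assumptions, not the method evaluated in the paper: the names, the URLs in `MOCK_RESULTS`, the helper functions `url_jaccard` and `likely_same_entity`, the choice of Jaccard overlap, and the 0.3 threshold are all hypothetical, and no real search engine is queried.

```python
from typing import Dict, Set

# Hypothetical search results: name variant -> URLs a search engine returned.
# These values are invented for illustration; they are not data from the paper.
MOCK_RESULTS: Dict[str, Set[str]] = {
    "J. Doe": {"a.example/db", "b.example/compilers", "c.example/lab"},
    "Jane Doe": {"a.example/db", "c.example/lab", "d.example/automata"},
    "R. Roe": {"e.example/cooking", "f.example/travel"},
}


def url_jaccard(name_a: str, name_b: str, results: Dict[str, Set[str]]) -> float:
    """Jaccard overlap of the URL sets returned for two name variants."""
    urls_a = results.get(name_a, set())
    urls_b = results.get(name_b, set())
    union = urls_a | urls_b
    if not union:
        # Neither name returned any page, so there is no evidence of overlap.
        return 0.0
    return len(urls_a & urls_b) / len(union)


def likely_same_entity(
    name_a: str,
    name_b: str,
    results: Dict[str, Set[str]],
    threshold: float = 0.3,  # assumed cut-off, chosen only for this example
) -> bool:
    """Flag two names as candidate duplicates when their URL sets overlap enough."""
    return url_jaccard(name_a, name_b, results) >= threshold


if __name__ == "__main__":
    for pair in [("J. Doe", "Jane Doe"), ("J. Doe", "R. Roe")]:
        score = url_jaccard(pair[0], pair[1], MOCK_RESULTS)
        print(pair, round(score, 2), likely_same_entity(pair[0], pair[1], MOCK_RESULTS))
```

In this toy data, "J. Doe" and "Jane Doe" share two of four distinct URLs, so they are flagged as candidate duplicates, while "J. Doe" and "R. Roe" share none. The frequency-based and content-based signals mentioned in the abstract would need different representations (for example, hit counts or term vectors) and are not covered by this sketch.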