Semantic integration focuses on discovering, representing, and manipulating correspondences between entities in disparate data sources. The topic has been widely studied in the context of structured data, where the problems considered include ontology and schema matching, matching relational tuples, and reconciling inconsistent data values. In recent years, however, semantic integration over text has also received increasing attention. This article studies a key challenge in semantic integration over text: identifying whether different mentions of real-world entities, such as "JFK" and "John Kennedy," within and across natural language text documents, actually represent the same concept.

We present a machine-learning study of this problem and consider two approaches. The first is a discriminative approach: a pairwise local classifier is trained in a supervised way to determine whether two given mentions represent the same real-world entity. This is followed, potentially, by a global clustering algorithm that uses the classifier as its similarity metric. Our second approach is a global generative model, at the heart of which is a view on how documents are generated and how names (of different entity types) are "sprinkled" into them. In its most general form, our model assumes (1) a joint distribution over entities (for example, a document that mentions "President Kennedy" is more likely to mention "Oswald" or "White House" than "Roger Clemens"), and (2) an "author" model that assumes that at least one mention of an entity in a document is easily identifiable and then generates other mentions via (3) an "appearance" model that governs how mentions are transformed from the "representative" mention. We show that both approaches perform very accurately, in the range of 90-95 percent F1 measure for different entity types, much better than previous approaches to some aspects of this problem.
Finally, we discuss how our solution for mention matching in text could be applied to matching relational tuples, as well as to linking entities across databases and text.
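
The discriminative pipeline described above (a pairwise decision on two mentions, followed by a global clustering step that uses that decision as its similarity metric) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `pairwise_score` is a hand-written token-overlap score standing in for the trained classifier, `cluster_mentions` uses simple single-link grouping via union-find, and the `threshold` value is arbitrary. The abstract does not specify the features, the learner, or the clustering algorithm actually used.

```python
from itertools import combinations


def pairwise_score(a: str, b: str) -> float:
    """Hypothetical stand-in for a trained pairwise classifier.

    Returns the Jaccard overlap of the lowercased tokens of two mentions.
    A learned classifier would replace this function.
    """
    tokens_a = set(a.lower().replace(".", "").split())
    tokens_b = set(b.lower().replace(".", "").split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def cluster_mentions(mentions, score=pairwise_score, threshold=0.3):
    """Group mentions by linking every pair whose score reaches the threshold.

    This is single-link clustering implemented with union-find; it is one
    possible choice of global clustering step, not the one from the article.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if score(mentions[i], mentions[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, mention in enumerate(mentions):
        clusters.setdefault(find(i), []).append(mention)
    return list(clusters.values())


if __name__ == "__main__":
    names = ["John Kennedy", "President Kennedy", "John F. Kennedy", "Roger Clemens"]
    for group in cluster_mentions(names):
        print(group)
```

With this toy score, the three Kennedy mentions fall into one group and "Roger Clemens" stays alone. A mention such as "JFK" shares no tokens with "John Kennedy", so the toy score would miss it; cases like that are where a learned pairwise classifier would be needed.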