Semantic integration in text: from ambiguous names to identifiable entities

Authors:
Xin Li; Paul Morie; Dan Roth
Venue:
AI Magazine - Special issue on semantic integration
Year:
2005



Abstract

Semantic integration focuses on discovering, representing, and manipulating correspondences between entities in disparate data sources. The topic has been widely studied in the context of structured data, where the problems considered include ontology and schema matching, matching relational tuples, and reconciling inconsistent data values. In recent years, however, semantic integration over text has also received increasing attention. This article studies a key challenge in semantic integration over text: identifying whether different mentions of real-world entities, such as "JFK" and "John Kennedy," within and across natural language text documents, actually represent the same concept.
We present a machine-learning study of this problem. Our first approach is discriminative: a pairwise local classifier is trained in a supervised way to determine whether two given mentions represent the same real-world entity. This is followed, potentially, by a global clustering algorithm that uses the classifier as its similarity metric. Our second approach is a global generative model, at the heart of which is a view on how documents are generated and how names (of different entity types) are "sprinkled" into them. In its most general form, our model assumes (1) a joint distribution over entities (for example, a document that mentions "President Kennedy" is more likely to mention "Oswald" or "White House" than "Roger Clemens"), and (2) an "author" model that assumes that at least one mention of an entity in a document is easily identifiable and then generates other mentions via (3) an "appearance" model that governs how mentions are transformed from the "representative" mention. We show that both approaches perform very accurately, in the range of 90-95 percent F1 measure for different entity types, much better than previous approaches to some aspects of this problem.
Finally, we discuss how our solution for mention matching in text could potentially be applied to matching relational tuples, as well as to linking entities across databases and text.
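
The discriminative pipeline described in the abstract (a pairwise decision on two mentions, then a clustering step that uses that decision as its similarity metric) can be illustrated with a minimal sketch. This is not the authors' implementation. The function `pair_score` is a hypothetical hand-written stand-in for the trained classifier, the single-link grouping in `cluster` is one simple choice of clustering algorithm, and the 0.5 threshold and the example mentions are assumptions made only for illustration.

```python
from itertools import combinations


def initials(name):
    """Upper-case initials of a name, e.g. 'John F. Kennedy' -> 'JFK'."""
    return "".join(tok[0] for tok in name.replace(".", "").split()).upper()


def pair_score(a, b):
    """Hypothetical stand-in for a trained pairwise classifier.

    Returns a score in [0, 1]; a real system would learn this function
    from labeled mention pairs instead of using fixed rules.
    """
    clean_a, clean_b = a.replace(".", ""), b.replace(".", "")
    # Acronym evidence: one mention spells out the initials of the other.
    if clean_a.upper() == initials(b) or clean_b.upper() == initials(a):
        return 0.9
    tokens_a = set(clean_a.lower().split())
    tokens_b = set(clean_b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    # Token overlap (Jaccard) as a crude similarity signal.
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def cluster(mentions, score, threshold=0.5):
    """Single-link clustering: merge any two mentions scoring >= threshold."""
    parent = list(range(len(mentions)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if score(mentions[i], mentions[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i, mention in enumerate(mentions):
        groups.setdefault(find(i), []).append(mention)
    return list(groups.values())


mentions = ["John Kennedy", "JFK", "John F. Kennedy", "Roger Clemens", "Clemens"]
clusters = cluster(mentions, pair_score)
for group in clusters:
    print(group)
```

Note that "JFK" and "John Kennedy" score zero against each other in this sketch and end up together only because both match "John F. Kennedy". That transitive effect is one reason a global clustering step can recover matches that a purely local pairwise decision would miss.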