Identifying co-referential names across large corpora

Authors:
Levon Lloyd;Andrew Mehler;Steven Skiena
Affiliations:
Department of Computer Science, State University of New York at Stony Brook, Stony Brook, NY;Department of Computer Science, State University of New York at Stony Brook, Stony Brook, NY;Department of Computer Science, State University of New York at Stony Brook, Stony Brook, NY
Venue:
CPM'06 Proceedings of the 17th Annual Conference on Combinatorial Pattern Matching
Year:
2006

Abstract

A single logical entity can be referred to by several different names across a large text corpus. We present an algorithm for finding all such co-reference sets in a large corpus. Our algorithm involves three steps: morphological similarity detection, contextual similarity analysis, and clustering. Finally, we present experimental results on a large corpus of real news text to analyze the performance of our techniques.
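
The three steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measures or clustering method the authors use, so everything below is an assumption made for illustration: `difflib.SequenceMatcher` stands in for morphological similarity, cosine similarity over bag-of-words counts stands in for contextual similarity, and single-link merging with a union-find structure stands in for clustering. The function names, thresholds, and sample data are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations
from math import sqrt


def morphological_similarity(a: str, b: str) -> float:
    """String-level similarity in [0, 1] (assumed stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def contextual_similarity(ctx_a: Counter, ctx_b: Counter) -> float:
    """Cosine similarity between two bag-of-words context vectors."""
    dot = sum(ctx_a[w] * ctx_b[w] for w in ctx_a.keys() & ctx_b.keys())
    norm_a = sqrt(sum(v * v for v in ctx_a.values()))
    norm_b = sqrt(sum(v * v for v in ctx_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coreference_sets(contexts, morph_min=0.6, ctx_min=0.5):
    """Group names that pass both similarity tests (hypothetical thresholds).

    `contexts` maps each name to a Counter of words seen near it.
    """
    parent = {name: name for name in contexts}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(contexts, 2):
        # Step 1: cheap string filter to discard unrelated name pairs.
        if morphological_similarity(a, b) < morph_min:
            continue
        # Step 2: require the surviving pair to occur in similar contexts.
        if contextual_similarity(contexts[a], contexts[b]) >= ctx_min:
            # Step 3: single-link clustering by merging the two groups.
            parent[find(a)] = find(b)

    clusters = {}
    for name in contexts:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


if __name__ == "__main__":
    # Made-up sample data, not taken from the paper's corpus.
    sample = {
        "George Bush": Counter("president white house texas iraq".split()),
        "George W. Bush": Counter("president white house iraq war".split()),
        "George Burns": Counter("comedian actor film vaudeville".split()),
    }
    for group in coreference_sets(sample):
        print(sorted(group))
```

In this toy data, "George Bush" and "George Burns" are close as strings but share no context words, so the contextual test keeps them apart. Comparing every pair of names is quadratic and would not scale to a large corpus as written; a real system would need some form of candidate pruning before the pairwise tests.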