A single logical entity can be referred to by several different names over a large text corpus. We present an algorithm for finding all such co-reference sets in a large corpus. The algorithm involves three steps: morphological similarity detection, contextual similarity analysis, and clustering. Finally, we present experimental results on a large corpus of real news text to analyze the performance of our techniques.
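The three steps can be illustrated with a minimal Python sketch. It is not the method evaluated here: the character-level string ratio, the bag-of-words cosine measure, the union-find grouping, both thresholds, and the toy names and contexts are all assumptions chosen only to make the pipeline concrete.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations
from math import sqrt


def morphological_similarity(a, b):
    """String similarity in [0, 1]; an assumed stand-in for morphological matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def contextual_similarity(ctx_a, ctx_b):
    """Cosine similarity between two bag-of-words context vectors (assumed measure)."""
    dot = sum(ctx_a[w] * ctx_b[w] for w in ctx_a.keys() & ctx_b.keys())
    norm_a = sqrt(sum(v * v for v in ctx_a.values()))
    norm_b = sqrt(sum(v * v for v in ctx_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coreference_sets(contexts, morph_threshold=0.5, ctx_threshold=0.5):
    """Cluster names that pass both similarity tests, using union-find.

    `contexts` maps each name to a Counter of words seen near it.
    Both thresholds are hypothetical values, not tuned ones.
    """
    parent = {name: name for name in contexts}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(contexts, 2):
        if (morphological_similarity(a, b) >= morph_threshold
                and contextual_similarity(contexts[a], contexts[b]) >= ctx_threshold):
            parent[find(a)] = find(b)

    clusters = {}
    for name in contexts:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Toy data: two spellings of one person and a different person sharing a surname.
contexts = {
    "George Bush": Counter("president white house texas".split()),
    "George W. Bush": Counter("president white house iraq".split()),
    "Kate Bush": Counter("singer album music".split()),
}
print(coreference_sets(contexts))
```

In this toy input, "George Bush" and "Kate Bush" are similar enough as strings to pass the first test, but their contexts share no words, so only the two "George" variants end up in the same set. Comparing every pair of names is quadratic, so a corpus-scale version would need some way to limit candidate pairs before the contextual comparison.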