Who is who and what is what: experiments in cross-document co-reference

Authors:
Alex Baron;Marjorie Freedman
Affiliations:
BBN Technologies, Cambridge, MA;BBN Technologies, Cambridge, MA
Venue:
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2008

Abstract

This paper describes a language-independent, scalable system that addresses both challenges of cross-document co-reference: name variation and entity disambiguation. We report system results from the ACE 2008 evaluation in both English and Arabic. Our English system's accuracy is 8.4% better, in relative terms, than an exact-match baseline (and 14.2% better, in relative terms, over entities mentioned in more than one document). Unlike previous evaluations, ACE 2008 evaluated both name variation and entity disambiguation over naturally occurring named mentions. An information extraction engine finds document entities in text. Our cross-document approach uses the names of entities to find an initial set of document entities that could refer to the same real-world entity, and then uses an agglomerative clustering algorithm to disambiguate the potentially co-referent document entities. We describe how our architecture, designed for the 10K-document ACE task, is scalable to an even larger corpus. We analyze how different aspects of our system affect performance using ablation studies over the English evaluation set. In addition to evaluating cross-document co-reference performance, we used the results of the cross-document system to improve the accuracy of within-document extraction, and measured the impact in the ACE 2008 within-document evaluation.
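
The two-stage approach in the abstract (names select candidate document entities, then agglomerative clustering disambiguates them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the surname-based blocking key, the Jaccard similarity over context words, the average-link merge rule, the `threshold` value, and the toy data are all assumptions made for illustration, since the abstract does not state which features or linkage the system uses.

```python
from collections import defaultdict
from itertools import combinations


def normalize(name):
    """Hypothetical name key: lowercase, drop periods, keep the last token."""
    tokens = name.lower().replace(".", "").split()
    return tokens[-1] if tokens else ""


def jaccard(a, b):
    """Jaccard overlap between two sets of context words (assumed feature)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_blocks(entities):
    """Stage 1: group document entities whose names share a key."""
    blocks = defaultdict(list)
    for ent in entities:
        blocks[normalize(ent["name"])].append(ent)
    return list(blocks.values())


def agglomerate(block, threshold=0.2):
    """Stage 2: average-link agglomerative clustering inside one block."""
    clusters = [[ent] for ent in block]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
            score = sum(jaccard(a["context"], b["context"]) for a, b in pairs)
            score /= len(pairs)
            if score > best_score:
                best_score, best_pair = score, (i, j)
        if best_score < threshold:
            break  # no remaining pair of clusters is similar enough to merge
        i, j = best_pair  # i < j, so deleting index j leaves index i valid
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


def cross_document_coreference(entities, threshold=0.2):
    """Run both stages and return a flat list of entity clusters."""
    result = []
    for block in candidate_blocks(entities):
        result.extend(agglomerate(block, threshold))
    return result


# Toy document entities (invented for illustration only).
entities = [
    {"doc": "d1", "name": "John Smith", "context": {"senator", "vote", "bill"}},
    {"doc": "d2", "name": "J. Smith", "context": {"senator", "bill", "committee"}},
    {"doc": "d3", "name": "John Smith", "context": {"striker", "goal", "match"}},
    {"doc": "d4", "name": "Mary Jones", "context": {"judge", "court"}},
]
clusters = cross_document_coreference(entities)
grouped_docs = sorted(sorted(e["doc"] for e in c) for c in clusters)
```

On this toy input, the name key handles name variation by placing "John Smith" and "J. Smith" in the same candidate block, and the clustering step handles disambiguation by merging the two senator mentions while keeping the footballer separate. Blocking by name first also keeps the quadratic clustering cost confined to small candidate sets, which is one plausible reading of how such a design stays scalable.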