This paper describes a language-independent, scalable system that addresses both challenges of cross-document co-reference: name variation and entity disambiguation. We report system results from the ACE 2008 evaluation in both English and Arabic. Our English system's accuracy is 8.4% better, in relative terms, than an exact-match baseline (and 14.2% better, in relative terms, on entities mentioned in more than one document). Unlike previous evaluations, ACE 2008 evaluated both name variation and entity disambiguation over naturally occurring named mentions. An information extraction engine finds document entities in text. We describe how our architecture, designed for the 10K-document ACE task, scales to an even larger corpus. Our cross-document approach uses the names of entities to find an initial set of document entities that could refer to the same real-world entity, and then uses an agglomerative clustering algorithm to disambiguate the potentially co-referent document entities. We analyze how different aspects of our system affect performance using ablation studies over the English evaluation set. In addition to evaluating cross-document co-reference performance, we used the results of the cross-document system to improve the accuracy of within-document extraction, and measured the impact in the ACE 2008 within-document evaluation.
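
The two-stage approach described above (name-based candidate selection followed by agglomerative clustering) can be illustrated with a minimal sketch. This is not the system evaluated in the paper: the blocking key (`name_key`, the lowercased last name token), the Jaccard similarity over context words, the average-link merge rule, the `threshold` value, and the sample data are all assumptions chosen only to make the idea concrete.

```python
from collections import defaultdict
from itertools import combinations


def name_key(name):
    """Hypothetical blocking key: the lowercased last token of the name."""
    return name.lower().split()[-1]


def jaccard(a, b):
    """Jaccard similarity between two sets of context words."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_block(entities, threshold):
    """Average-link agglomerative clustering inside one name block.

    Repeatedly merges the most similar pair of clusters and stops
    when the best pair falls below the (assumed) threshold.
    """
    clusters = [[e] for e in entities]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(x, y) for x in clusters[i] for y in clusters[j]]
            score = sum(jaccard(x["context"], y["context"]) for x, y in pairs) / len(pairs)
            if score > best_score:
                best_score, best_pair = score, (i, j)
        if best_score < threshold:
            break
        i, j = best_pair  # i < j, so deleting j keeps index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def cross_doc_coref(doc_entities, threshold=0.2):
    """Stage 1: group document entities by name key. Stage 2: cluster each group."""
    blocks = defaultdict(list)
    for ent in doc_entities:
        blocks[name_key(ent["name"])].append(ent)
    result = []
    for block in blocks.values():
        result.extend(cluster_block(block, threshold))
    return result


# Illustrative document entities (invented for this sketch).
doc_entities = [
    {"doc": "d1", "name": "John Smith", "context": {"senator", "vote", "bill"}},
    {"doc": "d2", "name": "J. Smith", "context": {"senator", "bill", "committee"}},
    {"doc": "d3", "name": "John Smith", "context": {"striker", "goal", "league"}},
    {"doc": "d4", "name": "Mary Jones", "context": {"senator", "vote"}},
]

clusters = cross_doc_coref(doc_entities)
cluster_docs = sorted(sorted(e["doc"] for e in c) for c in clusters)
print(cluster_docs)
```

In this toy example the name key handles name variation ("J. Smith" and "John Smith" land in the same candidate set), while the clustering step handles disambiguation (the two "John Smith" entities with unrelated contexts stay apart). Restricting comparisons to candidates that share a name key is also what keeps the pairwise clustering cost manageable as the corpus grows.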