Previous research in cross-document entity coreference has generally been restricted to the offline scenario, where the set of documents is provided in advance. As a consequence, the dominant approach is based on greedy agglomerative clustering techniques that rely on pairwise vector comparisons and thus require O(n^2) space and time. In this paper we explore identifying coreferent entity mentions across documents in high-volume streaming text, including methods for utilizing orthographic and contextual information. We test our methods on several corpora to quantitatively measure both the efficacy and the scalability of our streaming approach. We show that our approach scales to data at least an order of magnitude larger than previously reported methods.
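
To make the contrast with pairwise agglomerative clustering concrete, the following is a minimal sketch of single-pass streaming clustering. It is not the method of the paper: the function names, the lowercase exact-match name key (standing in for orthographic information), the bag-of-words cosine similarity (standing in for contextual information), and the threshold value are all assumptions chosen for illustration. Each mention is compared only against existing clusters that share its name key, so no all-pairs comparison over mentions is made.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def stream_cluster(mentions, threshold=0.3):
    """Assign each (name, context) mention to a cluster in one pass.

    Hypothetical simplification: candidate clusters are those whose
    lowercased name matches exactly; the mention joins the candidate
    with the highest context similarity if it reaches `threshold`,
    otherwise it starts a new cluster.
    """
    contexts = []      # cluster id -> accumulated context term counts
    by_name = {}       # normalized name -> list of cluster ids
    assignments = []   # cluster id chosen for each mention, in order
    for name, context in mentions:
        key = name.lower()
        vec = Counter(context.lower().split())
        best_id, best_sim = None, threshold
        for cid in by_name.get(key, []):
            sim = cosine(vec, contexts[cid])
            if sim >= best_sim:
                best_id, best_sim = cid, sim
        if best_id is None:
            # No sufficiently similar cluster: open a new one.
            best_id = len(contexts)
            contexts.append(Counter())
            by_name.setdefault(key, []).append(best_id)
        # Fold the mention's context into the chosen cluster.
        contexts[best_id].update(vec)
        assignments.append(best_id)
    return assignments
```

Under these assumptions the work per mention depends on the number of clusters sharing its name key rather than on the total number of mentions seen so far, which is the property a streaming setting needs.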