Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To address this problem, we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large-scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.
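
To make the impracticality claim concrete: the number of ways to group n mentions into entities is the Bell number B_n, which grows faster than exponentially. The sketch below is only an illustration of that count, not part of the method described in the abstract; the function name `bell_number` is a hypothetical helper introduced here.

```python
def bell_number(n: int) -> int:
    """Count the ways to partition n labeled mentions into non-empty groups.

    Uses the Bell triangle: each row starts with the last entry of the
    previous row, and each further entry is the sum of its left neighbor
    and the entry above that neighbor.
    """
    if n == 0:
        return 1
    row = [1]  # first row of the Bell triangle; its last entry is B_1
    for _ in range(n - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


if __name__ == "__main__":
    # Print how quickly the number of candidate groupings grows.
    for mentions in (5, 10, 20, 50):
        print(mentions, bell_number(mentions))
```

Even a few dozen mentions admit far more groupings than could be enumerated, which is why approximate inference is needed at the scale of millions of mentions.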