We present a multi-pass clustering approach to large-scale, wide-scope named-entity disambiguation (NED) on collections of web pages. Our approach uses name co-occurrence information to cluster, and hence disambiguate, entities, and is designed to handle NED on the entire web. We show that on web collections, NED becomes increasingly difficult as the corpus grows, not only because of the challenge of scaling the NED algorithm, but also because new and surprising facets of entities become visible in the data. This effect limits the benefit that data-driven approaches can gain from processing larger datasets, and suggests that efficient clustering-based disambiguation methods for the web will require extracting more specialized information from documents.
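To make the idea of clustering by name co-occurrence concrete, here is a minimal single-pass sketch. It is not the multi-pass algorithm the abstract refers to, whose details are not given here. It assumes each document mentioning an ambiguous name has already been reduced to the set of other names appearing in it, and it swaps in a simple greedy grouping by Jaccard overlap. The function names, the threshold value, and the toy data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two sets of co-occurring names."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_mentions(
    mentions: Dict[str, Set[str]], threshold: float = 0.2
) -> List[List[str]]:
    """Greedily group documents that mention the same ambiguous name.

    `mentions` maps a document id to the set of other names found in that
    document. A document joins the existing cluster whose accumulated name
    set it overlaps most, provided the overlap reaches `threshold`
    (an assumed value); otherwise it starts a new cluster.
    """
    clusters: List[Tuple[List[str], Set[str]]] = []
    for doc_id in sorted(mentions):  # sorted for a deterministic order
        context = mentions[doc_id]
        best_idx, best_sim = -1, threshold
        for i, (_, cluster_names) in enumerate(clusters):
            sim = jaccard(context, cluster_names)
            if sim >= best_sim:
                best_idx, best_sim = i, sim
        if best_idx < 0:
            clusters.append(([doc_id], set(context)))
        else:
            doc_ids, cluster_names = clusters[best_idx]
            doc_ids.append(doc_id)
            cluster_names |= context  # grow the cluster's name set in place
    return [doc_ids for doc_ids, _ in clusters]


# Toy data (hypothetical): four pages that all mention "Michael Jordan".
toy_mentions = {
    "doc1": {"Chicago Bulls", "Scottie Pippen"},
    "doc2": {"Chicago Bulls", "Phil Jackson"},
    "doc3": {"UC Berkeley", "David Blei"},
    "doc4": {"David Blei", "Andrew Ng"},
}
toy_clusters = cluster_mentions(toy_mentions)
```

On this toy input the pages split into two groups, one per underlying person. The sketch also hints at the difficulty the abstract describes: as more pages arrive, a cluster's accumulated name set keeps growing, and a new facet of the same entity may share no names with what has been seen so far, so it ends up in a separate cluster.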