An Approach to Web-Scale Named-Entity Disambiguation

Authors:
Luís Sarmento (Faculdade de Engenharia da Universidade do Porto - DEI - LIACC, Porto, Portugal 4200-465)
Alexander Kehlenbeck (Google Inc, USA)
Eugénio Oliveira (Faculdade de Engenharia da Universidade do Porto - DEI - LIACC, Porto, Portugal 4200-465)
Lyle Ungar (University of Pennsylvania - CS, Philadelphia, USA)
Venue:
MLDM '09: Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2009

Abstract

We present a multi-pass clustering approach to large-scale, wide-scope named-entity disambiguation (NED) on collections of web pages. Our approach uses name co-occurrence information to cluster and hence disambiguate entities, and is designed to handle NED on the entire web. We show that on web collections NED becomes increasingly difficult as the corpus grows, not only because the NED algorithm must scale, but also because new and surprising facets of entities become visible in the data. This effect limits the benefit that data-driven approaches can gain from processing larger data sets, and suggests that efficient clustering-based disambiguation methods for the web will need to extract more specialized information from documents.
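The core idea of clustering mentions of a name by the other names they co-occur with can be illustrated with a small sketch. It rests entirely on our own assumptions: each mention is represented as a bag of co-occurring names, similarity is cosine, a greedy assignment pass is followed by a centroid-merge pass, and the 0.3 threshold is arbitrary. The function names (`cosine`, `assign_pass`, `merge_pass`, `disambiguate`) and the example contexts are hypothetical. This is not the authors' multi-pass algorithm and makes no attempt at web-scale efficiency.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse name-count vectors."""
    dot = sum(count * b[name] for name, count in a.items() if name in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_pass(mentions: list[list[str]], threshold: float) -> list[dict]:
    """Pass 1: attach each mention to the most similar cluster centroid,
    or open a new cluster if no centroid reaches the threshold."""
    clusters: list[dict] = []  # {"centroid": Counter, "members": [mention ids]}
    for idx, context in enumerate(mentions):
        vec = Counter(context)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": vec, "members": [idx]})
        else:
            best["centroid"].update(vec)  # centroid = summed name counts
            best["members"].append(idx)
    return clusters


def merge_pass(clusters: list[dict], threshold: float) -> list[dict]:
    """Pass 2: repeatedly merge any two clusters whose centroids are similar
    enough, until no pair reaches the threshold (all-pairs scan per sweep)."""
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if cosine(clusters[i]["centroid"], clusters[j]["centroid"]) >= threshold:
                    clusters[i]["centroid"].update(clusters[j]["centroid"])
                    clusters[i]["members"].extend(clusters[j]["members"])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


def disambiguate(mentions: list[list[str]], threshold: float = 0.3) -> list[list[int]]:
    """Return clusters of mention ids; each cluster is taken as one entity."""
    clusters = merge_pass(assign_pass(mentions, threshold), threshold)
    return sorted(sorted(c["members"]) for c in clusters)


if __name__ == "__main__":
    # Hypothetical contexts: names co-occurring with four mentions of "Jaguar".
    mentions = [
        ["Land Rover", "Ford", "Coventry"],
        ["Land Rover", "Tata Motors", "Ford"],
        ["Amazon", "Brazil", "Pantanal"],
        ["Brazil", "Pantanal"],
    ]
    print(disambiguate(mentions))  # → [[0, 1], [2, 3]]
```

The merge pass is what lets mentions first placed in separate clusters be reunited once their centroids accumulate shared names. A system operating on the whole web would need a far cheaper way to find merge candidates than this all-pairs scan.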