Collective context-aware topic models for entity disambiguation

  • Authors: Prithviraj Sen
  • Affiliations: IBM Corporation, San Jose, CA, USA
  • Venue: Proceedings of the 21st international conference on World Wide Web
  • Year: 2012

Abstract

A crucial step in adding structure to unstructured data is to identify references to entities and disambiguate them. Such disambiguated references can help enhance readability and draw similarities across different pieces of running text in an automated fashion. Previous research has tackled this problem by first forming a catalog of entities from a knowledge base, such as Wikipedia, and then using this catalog to disambiguate references in unseen text. However, most previously proposed models either do not use all the text in the knowledge base, potentially missing out on discriminative features, or do not exploit word-entity proximity to learn high-quality catalogs. In this work, we propose topic models that keep track of the context of every word in the knowledge base, so that words appearing within the same context as an entity are more likely to be associated with that entity. Thus, our topic models utilize all text present in the knowledge base and help learn high-quality catalogs. Our models also learn groups of co-occurring entities, thus enabling collective disambiguation. Unlike most previous topic models, our models are non-parametric and do not require the user to specify the exact number of groups present in the knowledge base. In experiments performed on an extract of Wikipedia containing almost 60,000 references, our models outperform SVM-based baselines by as much as 18% in disambiguation accuracy, translating to an increase of almost 11,000 correctly disambiguated references.
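To make the word-entity proximity intuition concrete, here is a minimal sketch that approximates it with simple context-window counts rather than the paper's non-parametric topic models. The window size, the toy corpus, and the function names (build_catalog, disambiguate) are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

# Sketch only (not the paper's model): words near an entity reference
# are counted toward that entity's context profile, and an ambiguous
# mention is resolved by matching its surrounding words against those
# profiles. All names and data below are hypothetical.

WINDOW = 3  # context window size, an illustrative assumption

def build_catalog(annotated_docs):
    """annotated_docs: list of token lists; an entity reference is a
    (surface, entity_id) tuple, a plain word is a string."""
    catalog = defaultdict(lambda: defaultdict(int))  # entity -> word counts
    for doc in annotated_docs:
        for i, tok in enumerate(doc):
            if isinstance(tok, tuple):  # an entity reference
                _, entity = tok
                lo, hi = max(0, i - WINDOW), min(len(doc), i + WINDOW + 1)
                for j in range(lo, hi):
                    if j != i and isinstance(doc[j], str):
                        catalog[entity][doc[j]] += 1
    return catalog

def disambiguate(context_words, candidates, catalog):
    """Pick the candidate entity whose context profile best matches."""
    def score(entity):
        counts = catalog[entity]
        total = sum(counts.values()) or 1
        return sum(counts[w] / total for w in context_words)
    return max(candidates, key=score)

# Toy usage: two senses of the ambiguous mention "Java".
docs = [
    ["coffee", "from", ("Java", "Java_island"), "is", "exported"],
    ["the", ("Java", "Java_language"), "compiler", "emits", "bytecode"],
]
cat = build_catalog(docs)
print(disambiguate(["compiler", "bytecode"],
                   ["Java_island", "Java_language"], cat))
# -> Java_language
```

The paper's models go further: they replace raw counts with learned topic distributions and group co-occurring entities to disambiguate collectively. The sketch only illustrates why proximity between words and entity references matters for building a catalog.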