A crucial step in adding structure to unstructured data is to identify references to entities and disambiguate them. Disambiguated references can enhance readability and automatically surface similarities across different pieces of running text. Previous research has tackled this problem by first forming a catalog of entities from a knowledge base, such as Wikipedia, and then using this catalog to disambiguate references in unseen text. However, most previously proposed models either do not use all the text in the knowledge base, potentially missing discriminative features, or do not exploit word-entity proximity to learn high-quality catalogs. In this work, we propose topic models that keep track of the context of every word in the knowledge base, so that words appearing within the same context as an entity are more likely to be associated with that entity. Thus, our topic models use all the text present in the knowledge base and help learn high-quality catalogs. Our models also learn groups of co-occurring entities, thus enabling collective disambiguation. Unlike most previous topic models, our models are non-parametric and do not require the user to specify the exact number of groups present in the knowledge base. In experiments performed on an extract of Wikipedia containing almost 60,000 references, our models outperform SVM-based baselines by as much as 18% in disambiguation accuracy, which translates to almost 11,000 additional correctly disambiguated references.
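To make the catalog idea concrete, here is a minimal sketch of how word-entity proximity can feed disambiguation. It is not the authors' non-parametric topic model. It is a simple count-based stand-in, assuming a toy knowledge base (`KB`) where references are marked up as `[[Entity]]`, a hypothetical context size `WINDOW`, and hypothetical helpers `build_catalog` and `disambiguate`. Each word near a reference adds a weight of 1/distance to that entity's catalog entry. A new mention is then resolved by scoring each candidate with an additively smoothed log-likelihood of the mention's context words.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy knowledge base. Entity references are written as
# "[[Entity]]", an assumed Wikipedia-like markup used only for this sketch.
KB = [
    "a spotted [[Jaguar_(animal)]] stalks prey near the river".split(),
    "the [[Jaguar_(animal)]] is a wild cat that stalks prey".split(),
    "the [[Jaguar_Cars]] factory builds luxury cars".split(),
    "a new [[Jaguar_Cars]] sedan has a powerful engine".split(),
]

WINDOW = 3  # hypothetical context size: tokens considered on each side


def is_ref(token):
    return token.startswith("[[") and token.endswith("]]")


def build_catalog(kb, window=WINDOW):
    """Map each entity to weighted counts of the words near its references.

    A word at distance d from a reference contributes 1/d, so closer
    words are associated more strongly with the entity.
    """
    catalog = defaultdict(Counter)
    for doc in kb:
        for i, token in enumerate(doc):
            if not is_ref(token):
                continue
            entity = token[2:-2]
            for j in range(max(0, i - window), min(len(doc), i + window + 1)):
                if j != i and not is_ref(doc[j]):
                    catalog[entity][doc[j]] += 1.0 / abs(j - i)
    return catalog


def disambiguate(context, candidates, catalog, alpha=0.1):
    """Return the candidate whose catalog entry best explains the context.

    Scores are additively smoothed log-likelihoods; one extra vocabulary
    slot reserves probability mass for words never seen near any entity.
    """
    vocab_size = len({w for counts in catalog.values() for w in counts}) + 1

    def score(entity):
        counts = catalog.get(entity, Counter())
        total = sum(counts.values())
        return sum(
            math.log((counts[w] + alpha) / (total + alpha * vocab_size))
            for w in context
        )

    return max(candidates, key=score)


catalog = build_catalog(KB)
mention_context = "a wild cat stalks prey".split()
print(disambiguate(mention_context, ["Jaguar_(animal)", "Jaguar_Cars"], catalog))
# prints Jaguar_(animal)
```

The 1/distance weighting is only one assumed way to encode proximity. Unlike the models described above, this sketch does not learn groups of co-occurring entities, so it resolves each mention independently rather than collectively, and it has no notion of a non-parametric number of groups.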