Incorporating word correlation into tag-topic model for semantic knowledge acquisition

Authors:
Fang Li;Tingting He;Xinhui Tu;Xiaohua Hu
Affiliations:
Central China Normal University, Wuhan, China;Central China Normal University, Wuhan, China;Central China Normal University, Wuhan, China;Central China Normal University, Wuhan, China & Drexel University, Philadelphia, PA, USA
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Abstract

This paper presents a tag-topic model with a Dirichlet Forest prior (TTM-DF) for semantic knowledge acquisition from blogs. The TTM-DF model extends the tag-topic model (TTM) by replacing the Dirichlet prior over the topic-word multinomial with a Dirichlet Forest prior. Correlations between words are calculated to generate a set of Must-Links and Cannot-Links, and the structures of the Dirichlet trees are then obtained by encoding these Must-Link and Cannot-Link constraints. Words under the same subtree are expected to be more correlated than words under different subtrees. We conduct experiments on a synthetic dataset and a blog dataset. On both datasets, the TTM-DF model performs much better than the TTM model: it improves the coherence of the underlying topics and of the tag-topic distributions, and captures semantic knowledge effectively.
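
The link-generation step can be illustrated with a minimal sketch. The abstract does not state which correlation measure or thresholds the authors use, so everything specific here is an assumption: the score is normalized pointwise mutual information (NPMI) over document-level co-occurrence, and the function name `derive_links` and the parameters `must_threshold` and `cannot_threshold` are hypothetical. Encoding the resulting links into Dirichlet tree structures is not shown.

```python
import math
from itertools import combinations


def derive_links(docs, must_threshold=0.5, cannot_threshold=-0.5):
    """Split word pairs into Must-Links and Cannot-Links by correlation.

    Assumption: correlation is NPMI over document co-occurrence; the
    paper's actual measure and thresholds are not given in the abstract.
    """
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    # Document frequency of each word.
    df = {w: sum(w in d for d in doc_sets) for w in vocab}

    must_links, cannot_links = [], []
    for u, v in combinations(vocab, 2):
        co = sum(1 for d in doc_sets if u in d and v in d)
        if co == 0:
            score = -1.0  # NPMI limit for pairs that never co-occur
        elif co == n_docs:
            score = 1.0   # both words in every document; avoids log(1) = 0
        else:
            p_uv = co / n_docs
            pmi = math.log(p_uv / ((df[u] / n_docs) * (df[v] / n_docs)))
            score = pmi / -math.log(p_uv)
        if score >= must_threshold:
            must_links.append((u, v))
        elif score <= cannot_threshold:
            cannot_links.append((u, v))
    return must_links, cannot_links


# Toy corpus: two "pet" documents and two "finance" documents.
toy_docs = [
    ["cat", "dog"],
    ["cat", "dog"],
    ["stock", "market"],
    ["stock", "market"],
]
must_links, cannot_links = derive_links(toy_docs)
```

On this toy corpus, words that always co-occur end up in Must-Links and words that never co-occur end up in Cannot-Links; on real blog text the thresholds would need tuning.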