LDA-based topic modeling in labeling blog posts with Wikipedia entries

Authors:
Daisuke Yokomoto;Kensaku Makita;Hiroko Suzuki;Daichi Koike;Takehito Utsuro;Yasuhide Kawada;Tomohiro Fukuhara
Affiliations:
University of Tsukuba, Tsukuba, Japan;University of Tsukuba, Tsukuba, Japan;University of Tsukuba, Tsukuba, Japan;University of Tsukuba, Tsukuba, Japan;University of Tsukuba, Tsukuba, Japan;Navix Co., Ltd., Tokyo, Japan;National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
Venue:
APWeb'12 Proceedings of the 14th international conference on Web Technologies and Applications
Year:
2012

Citing 9

Probabilistic latent semantic indexing. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Latent Dirichlet allocation. The Journal of Machine Learning Research
Cluster-based retrieval using language models. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
LDA-based document models for ad-hoc retrieval. SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Enhancing text clustering by leveraging Wikipedia semantics. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Enhancing cluster labeling using Wikipedia. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Web Information Organization Using Keyword Distillation Based Clustering. WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Facetedpedia: dynamic generation of query-dependent faceted interfaces for Wikipedia. Proceedings of the 19th international conference on World wide web
Faceted Search. Faceted Search

Cited 1

SNS-based issue detection and related news summarization scheme. Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication

Abstract

Given a search query, most existing search engines simply return a ranked list of search results. However, those results are often a mixture of documents, each closely related to different content. To help users quickly overview how content is distributed across the results, this paper proposes a framework for labeling blog posts with Wikipedia entries through LDA (latent Dirichlet allocation) based topic modeling. More specifically, this paper applies an LDA-based document model to the task of labeling blog posts with Wikipedia entries. One of the most important advantages of this LDA-based document model is that the collected Wikipedia entries and their LDA parameters depend heavily on the distribution of keywords across all the blog posts in the search result. This dependence is what allows the LDA-based topic distribution to give a quick overview of the blog posts in the search result. In the evaluation, we also show that the LDA-based document retrieval scheme outperforms our previous approach.
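
The abstract does not give the scoring formula, so the following is only a minimal sketch of how an LDA-based document model could be used to rank candidate Wikipedia entries for one blog post. It assumes the common form P(w | post) = sum over topics z of P(w | z) P(z | post), and it assumes that each candidate entry is represented by a short list of terms. The topic-word table `phi`, the topic mixture `theta_post`, the candidate entries, and all function names are hypothetical toy values chosen for illustration; they are not taken from the paper.

```python
import math

def lda_word_prob(word, theta, phi, floor=1e-12):
    """P(word | post) = sum over topics z of P(word | z) * P(z | post)."""
    prob = sum(p_z * topic.get(word, 0.0) for p_z, topic in zip(theta, phi))
    return max(prob, floor)  # floor avoids log(0) for unseen words

def label_score(label_terms, theta, phi):
    """Log-likelihood of a candidate entry's terms under the post's LDA model."""
    return sum(math.log(lda_word_prob(w, theta, phi)) for w in label_terms)

def rank_labels(candidates, theta, phi):
    """Return (entry title, score) pairs, best-scoring entry first."""
    scored = [(title, label_score(terms, theta, phi))
              for title, terms in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy parameters: two topics, P(word | topic) for each.
phi = [
    {"election": 0.50, "vote": 0.40, "goal": 0.05, "match": 0.05},
    {"election": 0.05, "vote": 0.05, "goal": 0.50, "match": 0.40},
]
# Hypothetical topic mixture P(topic | post) for one blog post.
theta_post = [0.9, 0.1]
# Hypothetical candidate Wikipedia entries, each reduced to a few terms.
candidates = {
    "Election": ["election", "vote"],
    "Football": ["goal", "match"],
}

ranking = rank_labels(candidates, theta_post, phi)
print(ranking[0][0])  # title of the best-scoring candidate entry
```

In practice `phi` and `theta_post` would be estimated by fitting LDA to the blog posts in the search result, which is how the labels would come to depend on the keyword distribution of that particular result set.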