Temporal latent semantic analysis for collaboratively generated content: preliminary results

Authors:
Yu Wang;Eugene Agichtein
Affiliations:
Emory University, Atlanta, GA, USA;Emory University, Atlanta, GA, USA
Venue:
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Year:
2011

Citing 2
Cited 0

Using the past to score the present: extending term weighting models through revision history analysis

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
A word at a time: computing word relatedness using temporal semantic analysis

Proceedings of the 20th international conference on World wide web

Quantified Score

Hi-index	0.00

Visualization

Abstract

Latent semantic analysis (LSA) has been intensively studied because of its wide application to Information Retrieval and Natural Language Processing. Yet, traditional models such as LSA only examine one (current) version of the document. However, due to the recent proliferation of collaboratively generated content such as threads in online forums, Collaborative Question Answering archives, Wikipedia, and other versioned content, the document generation process is now directly observable. In this study, we explore how this additional temporal information about the document evolution could be used to enhance the identification of latent document topics. Specifically, we propose a novel hidden-topic modeling algorithm, temporal Latent Semantic Analysis (tLSA), which elegantly extends LSA to modeling document revision history using tensor decomposition. Our experiments show that tLSA outperforms LSA on word relatedness estimation using benchmark data, and explore applications of tLSA for other tasks.