Temporal latent semantic analysis for collaboratively generated content: preliminary results

  • Authors:
  • Yu Wang;Eugene Agichtein

  • Affiliations:
  • Emory University, Atlanta, GA, USA;Emory University, Atlanta, GA, USA

  • Venue:
  • Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Latent semantic analysis (LSA) has been intensively studied because of its wide application to Information Retrieval and Natural Language Processing. Yet, traditional models such as LSA only examine one (current) version of the document. However, due to the recent proliferation of collaboratively generated content such as threads in online forums, Collaborative Question Answering archives, Wikipedia, and other versioned content, the document generation process is now directly observable. In this study, we explore how this additional temporal information about the document evolution could be used to enhance the identification of latent document topics. Specifically, we propose a novel hidden-topic modeling algorithm, temporal Latent Semantic Analysis (tLSA), which elegantly extends LSA to modeling document revision history using tensor decomposition. Our experiments show that tLSA outperforms LSA on word relatedness estimation using benchmark data, and explore applications of tLSA for other tasks.