A Generalized Topic Modeling Approach for Maven Search

  • Authors:
  • Ali Daud; Juanzi Li; Lizhu Zhou; Faqir Muhammad

  • Affiliations:
  • Department of Computer Science & Technology, Tsinghua University, Beijing, China 100084; Department of Computer Science & Technology, Tsinghua University, Beijing, China 100084; Department of Computer Science & Technology, Tsinghua University, Beijing, China 100084; Department of Mathematics & Statistics, Allama Iqbal Open University, Islamabad, Pakistan

  • Venue:
  • APWeb/WAIM '09 Proceedings of the Joint International Conferences on Advances in Data and Web Management
  • Year:
  • 2009

Abstract

This paper addresses the problem of semantics-based maven search in the research community, that is, identifying a person with some given expertise. Traditional approaches ignored either semantic knowledge or temporal information. As a result, some of the right mavens could not be identified effectively, either because the query keywords did not occur in their documents or because time effects were not exploited. In this paper, we propose a novel semantics and temporal information based maven search (STMS) approach that simultaneously discovers latent topics (semantically related soft clusters of words) among authors, venues (conferences or journals), and time. In the proposed approach, each author in a venue is represented as a probability distribution over topics, and each topic is represented as a probability distribution over words and over the years of the venues associated with that topic. Through the discovered latent topics, we can search for mavens by implicitly modeling word-author, author-author, and author-venue correlations with continuous time effects. We also explain the inference procedure for the topics and authors of new venues, show how correlations between authors can be discovered, and show how topic sparseness harms retrieval performance. Experimental results on a corpus downloaded from DBLP show that the proposed approach significantly outperforms the baseline approach, owing to its ability to produce less sparse topics.
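The abstract states that authors are represented as distributions over topics and topics as distributions over words. A common way to use such distributions for expert ranking is to score each author by the query likelihood P(q|a) = ∏_w Σ_z P(w|z) P(z|a) and to compare authors by the similarity of their topic distributions. The sketch below illustrates that idea under stated assumptions. It assumes the model has already been trained. The scoring rule, the cosine similarity measure, the author names, topic labels, and all numbers are hypothetical and are not taken from the paper. The temporal (topic-year) component of STMS is omitted, and per-venue author distributions are assumed to have been aggregated into one distribution per author.

```python
import math

# Hypothetical, already-learned parameters (toy values for illustration only).
# PHI[z][w] = P(word w | topic z)
PHI = {
    "z_retrieval": {"search": 0.5, "query": 0.3, "mining": 0.1, "graph": 0.1},
    "z_mining": {"search": 0.1, "query": 0.1, "mining": 0.5, "graph": 0.3},
}
# THETA[a][z] = P(topic z | author a), assumed aggregated over the author's venues
THETA = {
    "author_A": {"z_retrieval": 0.8, "z_mining": 0.2},
    "author_B": {"z_retrieval": 0.2, "z_mining": 0.8},
    "author_C": {"z_retrieval": 0.5, "z_mining": 0.5},
}


def query_log_likelihood(query, author, phi=PHI, theta=THETA, eps=1e-12):
    """log P(q | a) = sum over query words w of log( sum_z P(w|z) * P(z|a) )."""
    total = 0.0
    for word in query:
        p = sum(phi[z].get(word, 0.0) * theta[author][z] for z in theta[author])
        total += math.log(p + eps)  # eps avoids log(0) for words unseen in every topic
    return total


def rank_mavens(query, phi=PHI, theta=THETA):
    """Return author names sorted from most to least likely expert for the query."""
    return sorted(theta, key=lambda a: query_log_likelihood(query, a, phi, theta), reverse=True)


def author_similarity(a, b, theta=THETA):
    """Cosine similarity of two authors' topic distributions (an assumed correlation measure)."""
    topics = set(theta[a]) | set(theta[b])
    dot = sum(theta[a].get(z, 0.0) * theta[b].get(z, 0.0) for z in topics)
    norm_a = math.sqrt(sum(v * v for v in theta[a].values()))
    norm_b = math.sqrt(sum(v * v for v in theta[b].values()))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(rank_mavens(["search", "query"]))
    print(round(author_similarity("author_A", "author_B"), 3))
```

With these toy values, the query "search query" ranks author_A first, then author_C, then author_B, and the cosine similarity between author_A and author_B is about 0.471. Because scoring goes through topics rather than exact keyword matches, an author can still rank well for a query word that never appears verbatim in that author's papers. This is the kind of semantic matching the abstract motivates.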