Employing document dependency in blog search

  • Authors:
  • Mostafa Keikha; Fabio Crestani; Mark James Carman

  • Affiliations:
  • Faculty of Informatics, University of Lugano, Switzerland; Faculty of Informatics, University of Lugano, Switzerland; Faculty of IT, Monash University, Australia

  • Venue:
  • Journal of the American Society for Information Science and Technology

  • Year:
  • 2012

Abstract

The goal in blog search is to rank blogs according to their recurrent relevance to the topic of the query. State-of-the-art approaches treat this as an expert search or resource selection problem. We investigate how content-based similarity between posts affects the performance of the retrieval system. We test two approaches for smoothing (regularizing) the relevance scores of posts based on their dependencies. In the first approach, we smooth the term distributions describing posts by performing a random walk over a document-term graph in which similar posts are highly connected. In the second, we smooth the post scores directly, using a regularization framework that aims to minimize the discrepancy between the scores of similar documents. We then extend both approaches to take the time interval between posts into account when smoothing the scores. The idea is that if two posts are temporally close, they are good sources for smoothing each other's relevance scores. We compare these methods with state-of-the-art blog search approaches that employ language modeling-based resource selection algorithms and fusion-based methods for aggregating post relevance scores. We show performance gains over baseline techniques that do not take advantage of the relations between posts when smoothing relevance estimates. © 2012 Wiley Periodicals, Inc.
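The second approach, score regularization over similar posts, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the objective, so the quadratic objective below, the exponential time decay, and the names `smooth_scores`, `temporal_weight`, `mu`, and `tau` are all assumptions chosen for illustration. The sketch keeps each smoothed score close to its initial value while penalizing differences between the scores of strongly connected posts.

```python
import math


def temporal_weight(sim, t_i, t_j, tau=7.0):
    """Hypothetical temporal extension: scale a content similarity by an
    exponential decay in the time gap between two posts (same unit as tau)."""
    return sim * math.exp(-abs(t_i - t_j) / tau)


def smooth_scores(initial, weights, mu=1.0, iters=200):
    """Smooth post scores by minimizing the assumed objective

        sum_i (f_i - y_i)^2 + mu * sum_{i<j} w_ij * (f_i - f_j)^2

    where y holds the initial scores and w is a symmetric weight matrix.
    Setting the gradient to zero gives the fixed-point update
        f_i = (y_i + mu * sum_j w_ij * f_j) / (1 + mu * sum_j w_ij),
    which is applied here with Gauss-Seidel sweeps.
    """
    n = len(initial)
    f = list(initial)
    for _ in range(iters):
        for i in range(n):
            num = initial[i]
            den = 1.0
            for j in range(n):
                if j != i:
                    num += mu * weights[i][j] * f[j]
                    den += mu * weights[i][j]
            f[i] = num / den
    return f


if __name__ == "__main__":
    # Three hypothetical posts: the first two are similar, the third is isolated.
    scores = [1.0, 0.0, 0.5]
    weights = [
        [0.0, 1.0, 0.0],
        [1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0],
    ]
    print(smooth_scores(scores, weights, mu=1.0))
```

In this toy setting the two connected posts are pulled toward each other, while the isolated post keeps its initial score. To mimic the temporal extension, the entries of `weights` could be built with `temporal_weight`, so that posts published far apart influence each other less.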