Diversity in blog feed retrieval

  • Authors:
  • Mostafa Keikha;Fabio Crestani;W. Bruce Croft

  • Affiliations:
  • University of Lugano, Lugano, Switzerland;University of Lugano, Lugano, Switzerland;Center for Intelligent Information Retrieval, University of Massachusetts, Amherst, MA, USA

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Blog distillation (blog feed retrieval) is a task in blog retrieval where the goal is to rank blogs according to their recurrent relevance to a query topic. One of the main properties of blog feed retrieval is that the unit of retrieval is a collection of documents as opposed to a single document as in other IR tasks. This collection retrieval nature of blog distillation introduces new challenges and requires new investigations specific to this problem. Researchers have addressed this problem by considering a wide range of evidence and information resources. However, previous work has not studied the effect of on-topic diversity of blog posts in blog relevance. By on-topic diversity of blog posts we mean that those posts that are about the query topic need to have high diversity and cover different sub-topics of the query. In this study, we investigate three types of on-topic diversity and their effect on retrieval performance: topical diversity, temporal diversity and hybrid diversity. Our experiments over different blog collections and different baseline methods show that on-topic diversity can improve the performance of the retrieval system. Among the three types of diversity, hybrid diversity, that considers both topical and temporal diversities, achieves the best performance.