Blog feed search with a post index

Authors:
Wouter Weerkamp;Krisztian Balog;Maarten de Rijke
Affiliations:
ISLA, University of Amsterdam, Amsterdam, The Netherlands;Department of Computer and Information Science, NTNU, Trondheim, Norway;ISLA, University of Amsterdam, Amsterdam, The Netherlands
Venue:
Information Retrieval
Year:
2011

Abstract

User-generated content forms an important domain for mining knowledge. In this paper, we address the task of blog feed search: to find blogs that are principally devoted to a given topic, as opposed to blogs that merely happen to mention the topic in passing. The large number of blogs makes the blogosphere a challenging domain, both in terms of effectiveness and in terms of storage and retrieval efficiency. We examine the effectiveness of an approach to blog feed search that is based on individual posts as indexing units (instead of full blogs). Working in the setting of a probabilistic language modeling approach to information retrieval, we model the blog feed search task by aggregating over a blogger's posts to collect evidence of relevance to the topic and persistence of interest in the topic. This approach achieves state-of-the-art performance in terms of effectiveness. We then introduce a two-stage model in which a pre-selection of candidate blogs is followed by a ranking step. The model integrates aggressive pruning techniques as well as very lean representations of the contents of blog posts, resulting in substantial gains in efficiency while maintaining effectiveness at a very competitive level.
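The post-index idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes a query-likelihood score per post with Jelinek-Mercer smoothing, a plain sum of post scores as the aggregation over a blog's posts, and a top-N post cutoff as the pre-selection step. The function names (`post_likelihood`, `rank_blogs`) and the parameters `lam` and `top_n_posts` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def post_likelihood(query, post_tokens, coll_counts, coll_len, lam=0.5):
    """Query likelihood P(q | post), smoothed with the collection model.

    Assumption: Jelinek-Mercer smoothing with weight `lam` on the collection.
    """
    tf = Counter(post_tokens)
    n = len(post_tokens)
    log_p = 0.0
    for term in query:
        p_post = tf[term] / n if n else 0.0
        p_coll = coll_counts[term] / coll_len if coll_len else 0.0
        p = (1 - lam) * p_post + lam * p_coll
        if p == 0.0:
            return 0.0  # query term never occurs in the collection
        log_p += math.log(p)
    return math.exp(log_p)


def rank_blogs(query, posts, top_n_posts=None):
    """Rank blogs by evidence aggregated over their posts.

    posts: list of (blog_id, tokens) pairs, one entry per post.
    top_n_posts: if set, only blogs owning one of the N best-scoring posts
    are kept as candidates (an assumed, simple form of pre-selection).
    """
    coll_counts = Counter(t for _, toks in posts for t in toks)
    coll_len = sum(coll_counts.values())
    scored = [
        (blog, post_likelihood(query, toks, coll_counts, coll_len))
        for blog, toks in posts
    ]

    # Stage 1: pre-select candidate blogs from the best-scoring posts.
    if top_n_posts is None:
        candidates = {blog for blog, _ in scored}
    else:
        best = sorted(scored, key=lambda pair: pair[1], reverse=True)
        candidates = {blog for blog, _ in best[:top_n_posts]}

    # Stage 2: sum post scores per candidate blog, so a blog with many
    # on-topic posts outranks one that mentions the topic once.
    totals = defaultdict(float)
    for blog, score in scored:
        if blog in candidates:
            totals[blog] += score
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, a blog with several posts matching the query accumulates more evidence than a blog with a single passing mention, which mirrors the "persistence of interest" notion in the abstract. Setting `top_n_posts` trades some recall for a smaller candidate set in the ranking step.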