Topical noise in blogs arises when bloggers digress from the central topical thrust of their blogs. We introduce a method that explicitly incorporates a model of topical noise into a language modeling approach to the task of blog distillation. Topical noise is integrated into the model through a coherence score, which reflects the tightness of a blog's topical structure. Tests on the TRECBlog06 corpus show that naively integrating the coherence score as a blog prior fails to improve performance. Instead, we develop a set of more sophisticated models in which the coherence score is weighted by a function of the blog retrieval score. The proposed models improve the effectiveness of our language modeling approach to the blog distillation task.
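The contrast between the two integration strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the log-linear combination, the min-max weighting function, and all names (`naive_prior_score`, `weighted_score`, `minmax_weights`) are assumptions made purely for illustration. The sketch assumes each blog already has a query log-likelihood from some language model and a coherence score in (0, 1].

```python
import math


def naive_prior_score(log_query_likelihood, coherence, eps=1e-9):
    """Coherence used directly as a blog prior (assumed form):
    log P(q | blog) + log P(blog), with P(blog) proportional to coherence."""
    return log_query_likelihood + math.log(max(coherence, eps))


def minmax_weights(log_likelihoods):
    """Hypothetical weighting function: min-max normalise the retrieval
    scores to [0, 1], so higher-scoring blogs get a larger weight."""
    lo, hi = min(log_likelihoods), max(log_likelihoods)
    if hi == lo:
        # All blogs score the same: give every blog full weight.
        return [1.0] * len(log_likelihoods)
    return [(s - lo) / (hi - lo) for s in log_likelihoods]


def weighted_score(log_query_likelihood, coherence, weight, eps=1e-9):
    """Coherence term scaled by a weight derived from the retrieval score
    (assumed log-linear form)."""
    return log_query_likelihood + weight * math.log(max(coherence, eps))


if __name__ == "__main__":
    # Toy data, invented for illustration: (log P(q | blog), coherence).
    blogs = {"a": (-10.0, 0.9), "b": (-9.5, 0.2), "c": (-14.0, 0.95)}
    names = list(blogs)
    weights = minmax_weights([blogs[n][0] for n in names])
    for name, w in zip(names, weights):
        ll, coh = blogs[name]
        print(name, naive_prior_score(ll, coh), weighted_score(ll, coh, w))
```

Under these assumptions, the coherence term is applied in full to blogs that already match the query well, and it is damped towards zero for blogs with low retrieval scores. Any other monotone function of the retrieval score could be substituted for `minmax_weights`.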