Automated clustering of threads within and across web forums can greatly help both users and forum administrators seek, manage, and integrate the huge volume of content being generated. While clustering has been studied for other types of data, little work has been done on clustering forum threads; the informal nature and special structure of forum data make it worth studying how to cluster forum threads effectively. In this paper, we apply three state-of-the-art clustering methods (hierarchical agglomerative clustering, k-Means, and probabilistic latent semantic analysis) to forum threads and study how to leverage thread structure to improve clustering accuracy. We propose three different methods for assigning weights to the posts in a forum thread in order to represent the thread more accurately. We evaluate all the methods on data collected from three different Linux forums, for both within-forum and across-forum clustering. Our results show that the state-of-the-art methods perform reasonably well on this task, but that performance can be further improved by exploiting thread structure. In particular, a parabolic weighting method that assigns higher weights to both the beginning and the end posts of a thread consistently outperforms a standard clustering method.
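To make the idea of parabolic post weighting concrete, the following is a minimal sketch, not the paper's actual formulation: the abstract does not give the functional form, so the parabola used here (weight 1.0 at the first and last post, falling to a hypothetical `floor` value in the middle), the function names `parabolic_weights` and `thread_vector`, and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter


def parabolic_weights(n_posts, floor=0.2):
    """Return one weight per post position.

    Assumed form (not from the paper): a parabola over the normalized
    position x in [0, 1] that equals 1.0 at both ends and `floor` at x = 0.5.
    """
    if n_posts <= 0:
        return []
    if n_posts == 1:
        return [1.0]
    weights = []
    for i in range(n_posts):
        x = i / (n_posts - 1)  # position scaled to [0, 1]
        weights.append(floor + (1.0 - floor) * (2.0 * x - 1.0) ** 2)
    return weights


def thread_vector(posts, floor=0.2):
    """Build a weighted bag-of-words vector for a thread.

    Each post's term counts are scaled by its positional weight and summed,
    so terms in the opening and closing posts contribute the most.
    """
    vec = Counter()
    for post, weight in zip(posts, parabolic_weights(len(posts), floor)):
        for term, count in Counter(post.lower().split()).items():
            vec[term] += weight * count
    return dict(vec)


if __name__ == "__main__":
    thread = ["kernel panic", "try reboot", "kernel fixed"]
    print(parabolic_weights(len(thread)))
    print(thread_vector(thread))
```

The resulting thread vectors could then be passed to any of the clustering methods named above in place of an unweighted concatenation of posts; how the weights interact with each method in the paper is not stated in the abstract.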