Exploiting Forum Thread Structures to Improve Thread Clustering

Authors:
Kumaresh Pattabiraman;Parikshit Sondhi;ChengXiang Zhai
Affiliations:
University of Illinois, Urbana-Champaign, Dept. of Computer Science;University of Illinois, Urbana-Champaign, Dept. of Computer Science;University of Illinois, Urbana-Champaign, Dept. of Computer Science
Venue:
Proceedings of the 2013 Conference on the Theory of Information Retrieval
Year:
2013

Abstract

Automated clustering of threads within and across web forums would greatly benefit both users and forum administrators by helping them efficiently seek, manage, and integrate the large volume of content being generated. While clustering has been studied for other types of data, little work has been done on clustering forum threads; the informal nature and special structure of forum data make it interesting to study how to cluster them effectively. In this paper, we apply three state-of-the-art clustering methods (hierarchical agglomerative clustering, k-Means, and probabilistic latent semantic analysis) to forum threads and study how to leverage thread structure to improve clustering accuracy. We propose three methods for assigning weights to the posts in a thread to obtain a more accurate representation of the thread. We evaluate all methods on data collected from three Linux forums, for both within-forum and across-forum clustering. Our results show that the state-of-the-art methods perform reasonably well on this task, but that performance can be further improved by exploiting thread structure. In particular, a parabolic weighting method that assigns higher weights to both the beginning and end posts of a thread consistently outperforms a standard clustering method.
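The abstract does not give the weighting formula, so the following is only an illustrative sketch of what a parabolic post-weighting scheme could look like. The function names, the `floor` parameter, the specific parabola, and the whitespace tokenization are all assumptions made for this example and are not taken from the paper. Each post receives a weight that is highest at the first and last positions and lowest in the middle, and the thread is represented as the weighted sum of its posts' term counts.

```python
from collections import Counter


def parabolic_weights(n_posts, floor=0.2):
    """Hypothetical parabolic weights (not the paper's formula).

    Positions are scaled to [0, 1]; the raw weight is `floor` at the
    middle of the thread and 1.0 at both ends, then normalized to sum to 1.
    """
    if n_posts <= 0:
        return []
    if n_posts == 1:
        return [1.0]
    raw = []
    for i in range(n_posts):
        x = i / (n_posts - 1)  # relative position of post i in the thread
        raw.append(floor + (1.0 - floor) * (2.0 * x - 1.0) ** 2)
    total = sum(raw)
    return [w / total for w in raw]


def thread_vector(posts, floor=0.2):
    """Weighted bag-of-words vector for a thread (assumed representation)."""
    weights = parabolic_weights(len(posts), floor)
    vec = Counter()
    for weight, post in zip(weights, posts):
        # Naive whitespace tokenization, chosen only to keep the sketch short.
        for term, count in Counter(post.lower().split()).items():
            vec[term] += weight * count
    return dict(vec)


if __name__ == "__main__":
    thread = ["kernel panic", "try reboot", "kernel fixed"]
    print(parabolic_weights(len(thread)))
    print(thread_vector(thread))
```

A vector built this way could then be passed to any standard clustering method, such as k-Means, in place of an unweighted concatenation of the posts.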