Retrieving similar discussion forum threads: a structure based approach

Authors:
Amit Singh;Deepak P;Dinesh Raghu
Affiliations:
IBM Research - India, Bangalore, India;IBM Research - India, Bangalore, India;IBM Research - India, Bangalore, India
Venue:
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Year:
2012

Abstract

Online forums are becoming a popular way of finding useful information on the web. Search over forums for existing discussion threads has so far been limited to keyword-based search because of the minimal effort it requires on the part of users. However, it is often not possible to capture all the relevant context of a complex query in a small number of keywords. Example-based search, which retrieves similar discussion threads given one exemplary thread, is an alternative approach that can help the user provide richer context and vastly improve forum search results. In this paper, we address the problem of finding threads similar to a given thread. To this end, we propose a novel methodology for estimating similarity between discussion threads. Our method exploits the thread structure to decompose threads into a set of weighted overlapping components. It then estimates pairwise thread similarities by quantifying how well the information in each thread is contained within the other, using lexical similarities between their underlying components. We compare our proposed methods on real datasets against state-of-the-art thread retrieval mechanisms and show that our techniques outperform them by large margins on popular retrieval evaluation measures such as NDCG, MAP, Precision@k and MRR. In particular, consistent improvements of up to 10% are observed on all evaluation measures.
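
The abstract does not spell out the decomposition, the component weights, or the lexical similarity function, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes, purely for illustration, that a thread is a list of post strings, that the components are the opening post alone plus the opening post joined with each reply (so all components overlap on the opening post), that weights are uniform, that lexical similarity is cosine similarity over term counts, and that mutual containment is the average of the two one-directional containment scores. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased term-frequency vector for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def decompose(posts):
    """Assumed decomposition: opening post, plus opening post + each reply.

    Returns (weight, bag) pairs. Uniform weights summing to 1 are an
    illustrative assumption, not taken from the paper.
    """
    if not posts:
        return []
    opening = posts[0]
    texts = [opening] + [opening + " " + reply for reply in posts[1:]]
    weight = 1.0 / len(texts)
    return [(weight, bag_of_words(text)) for text in texts]


def containment(src, dst):
    """Weighted sum, over components of src, of the best match found in dst."""
    if not dst:
        return 0.0
    return sum(
        weight * max(cosine(bag, other) for _, other in dst)
        for weight, bag in src
    )


def thread_similarity(posts_a, posts_b):
    """Symmetric score: average of containment in both directions (assumed)."""
    comps_a = decompose(posts_a)
    comps_b = decompose(posts_b)
    return 0.5 * (containment(comps_a, comps_b) + containment(comps_b, comps_a))


if __name__ == "__main__":
    query = ["printer will not connect to wifi", "try reinstalling the driver"]
    candidate = ["printer cannot connect to wifi network", "reinstalling the driver fixed it"]
    unrelated = ["best pasta recipe", "use fresh basil"]
    print(thread_similarity(query, candidate))
    print(thread_similarity(query, unrelated))
```

Under these assumptions the score lies between 0 and 1, and ranking candidate threads by `thread_similarity` against an exemplary thread would give a simple example-based retrieval baseline. A component decomposition that follows the actual reply structure, or non-uniform weights, could be swapped into `decompose` without changing the rest of the sketch.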