Building implicit links from content for forum search

Authors:
Gu Xu;Wei-Ying Ma
Affiliations:
Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China
Venue:
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2006

Citing 20
Cited 16

Algorithms for clustering data

Algorithms for clustering data
Distributional clustering of words for text classification

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Authoritative sources in a hyperlinked environment

Journal of the ACM (JACM)
Hierarchical classification of Web content

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Co-clustering documents and words using bipartite spectral graph partitioning

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Web page scoring systems for horizontal and vertical search

Proceedings of the 11th international conference on World Wide Web
Topic-sensitive PageRank

Proceedings of the 11th international conference on World Wide Web
Hierarchically Classifying Documents Using Very Few Words

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Web Structure, Dynamics and Page Quality

SPIRE 2002 Proceedings of the 9th International Symposium on String Processing and Information Retrieval
Focused Crawls, Tunneling, and Digital Libraries

ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries
Model-Based Hierarchical Clustering

UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence
Enhanced word clustering for hierarchical text classification

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Scaling personalized web search

WWW '03 Proceedings of the 12th international conference on World Wide Web
Topic hierarchy generation via linear discriminant projection

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Learning effective ranking functions for newsgroup search

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
PageRank as a function of the damping factor

WWW '05 Proceedings of the 14th international conference on World Wide Web
Exploiting the hierarchical structure for link analysis

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Dirichlet PageRank

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

Extracting and ranking viral communities using seeds and content similarity

Proceedings of the nineteenth ACM conference on Hypertext and hypermedia
Quantify music artist similarity based on style and mood

Proceedings of the 10th ACM workshop on Web information and data management
Simultaneously modeling semantics and structure of threaded discussions: a sparse coding approach and its applications

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
IPHITS: An Incremental Latent Topic Model for Link Structure

AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
Organizing news archives by near-duplicate copy detection in digital libraries

ICADL'07 Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers
postingRank: bringing order to web forum postings

AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
Ranking of evolving stories through meta-aggregation

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
An empirical study on learning to rank of tweets

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Extracting local web communities using lexical similarity

DASFAA'10 Proceedings of the 15th international conference on Database systems for advanced applications
Detecting near-duplicate relations in user generated forum content

OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems
Exploiting thread structures to improve smoothing of language models for forum post retrieval

ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
Learning online discussion structures by conditional random fields

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Complete-Thread extraction from web forums

APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
Retrieving similar discussion forum threads: a structure based approach

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Hierarchical co-clustering based on entropy splitting

Proceedings of the 21st ACM international conference on Information and knowledge management
One size does not fit all: multi-granularity search of web forums

Proceedings of the 22nd ACM international conference on information & knowledge management

Abstract

The objective of Web forums is to create a shared space for open communication and discussion of specific topics and issues. The tremendous amount of information on forum sites is not yet fully utilized. Most links between forum pages are created automatically, which means that link-based ranking algorithms cannot be applied efficiently. In this paper, we propose a novel ranking algorithm that introduces content information into link-based methods as implicit links. The basic idea is derived from a more focused random surfer: the surfer is more likely to jump to a page that is similar to the one he is currently reading. In this manner, content similarities can be introduced into the link graph as a personalization bias. Our method, named Fine-grained Rank (FGRank), can be computed efficiently based on an automatically generated topic hierarchy. Unlike topic-sensitive PageRank, our method needs to compute only a single PageRank score for each page. Another contribution of this paper is a very efficient algorithm for automatically generating a topic hierarchy and mapping each page in a large-scale collection onto the computed hierarchy. The experimental results show that the proposed method can improve retrieval performance, and reveal that the content-based link graph is also important compared with the hyperlink graph.
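
The "focused random surfer" idea can be illustrated with a small sketch. This is not the authors' FGRank implementation: the abstract says FGRank relies on an automatically generated topic hierarchy for efficiency, whereas the code below naively compares every pair of pages. It assumes pages are given as bag-of-words dictionaries, that cosine similarity is the similarity measure, and that the surfer follows an explicit link with probability `damping` and otherwise jumps to a page chosen in proportion to its similarity to the current page. The names `cosine`, `row_normalize`, and `content_biased_rank` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def row_normalize(rows):
    """Make each row sum to 1; an all-zero row becomes uniform."""
    n = len(rows)
    out = []
    for row in rows:
        total = sum(row)
        out.append([x / total for x in row] if total > 0 else [1.0 / n] * n)
    return out


def content_biased_rank(links, vectors, damping=0.85, iters=200, tol=1e-12):
    """Power iteration where the jump step is biased by content similarity.

    links:   dict mapping page index -> iterable of linked page indices
    vectors: list of term-weight dicts, one per page (assumed representation)
    """
    n = len(vectors)
    # Explicit hyperlinks as a row-stochastic transition matrix.
    explicit = row_normalize(
        [[1.0 if j in links.get(i, ()) else 0.0 for j in range(n)]
         for i in range(n)]
    )
    # Implicit links: jump targets weighted by similarity to the current page.
    implicit = row_normalize(
        [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    )
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [0.0] * n
        for i in range(n):
            for j in range(n):
                step = damping * explicit[i][j] + (1.0 - damping) * implicit[i][j]
                new[j] += rank[i] * step
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank


# Toy forum: automatically generated "next page" links form a ring,
# while the content falls into two topics.
pages = [
    {"python": 2.0, "list": 1.0},
    {"python": 1.0, "dict": 1.0},
    {"recipe": 2.0, "oven": 1.0},
    {"recipe": 1.0, "pasta": 1.0},
]
ring = {0: [1], 1: [2], 2: [3], 3: [0]}
scores = content_biased_rank(ring, pages)
print(scores)
```

Because both the link step and the jump step are row-stochastic, the scores remain a probability distribution, and only one score per page is computed. The pairwise similarity matrix costs quadratic time and space in the number of pages, which is the cost a topic hierarchy is presumably meant to avoid on a large collection.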