Pagerank based clustering of hypertext document collections

Authors:
Konstantin Avrachenkov;Vladimir Dobrynin;Danil Nemirovsky;Son Kim Pham;Elena Smirnova
Affiliations:
INRIA Sophia Antipolis, Sophia Antipolis, France;St. Petersburg State University, St. Petersburg, Russian Fed.;INRIA, Sophia Antipolis, France;UCSD, San Diego, CA, USA;St. Petersburg State University, St. Petersburg, Russian Fed.
Venue:
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2008

Abstract

Clustering hypertext document collections is an important task in information retrieval. Most clustering methods are based on document content and do not take the hypertext links into account. Here we propose a novel PageRank-based clustering (PRC) algorithm that uses the hypertext structure. The PRC algorithm produces graph partitions with high modularity and coverage. A comparison of the PRC algorithm with two content-based clustering algorithms shows a good match between PRC clustering and content-based clustering.
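
The abstract evaluates partitions by modularity and coverage but does not define them. The following is a minimal sketch of the standard definitions, not the authors' implementation. It assumes an undirected, unweighted link graph given as an edge list (hyperlinks are directed, so this is a simplification), and the names `coverage`, `modularity`, `edges`, and `cluster_of` are illustrative only.

```python
from collections import defaultdict


def coverage(edges, cluster_of):
    """Fraction of edges whose two endpoints lie in the same cluster."""
    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    return intra / len(edges)


def modularity(edges, cluster_of):
    """Newman-Girvan modularity, assuming an undirected, unweighted graph.

    Q = sum over clusters c of  L_c / m - (d_c / (2 m))^2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    intra = defaultdict(int)   # L_c: edges with both endpoints in cluster c
    degree = defaultdict(int)  # d_c: total degree of nodes in cluster c
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph (hypothetical data): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
cluster_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

print(coverage(edges, cluster_of))    # 6 of 7 edges are intra-cluster
print(modularity(edges, cluster_of))  # 2 * (3/7 - (7/14)**2) = 5/14
```

On this toy graph, splitting along the bridge edge gives coverage 6/7 and modularity 5/14. Putting every node in one cluster would give coverage 1 but modularity 0, which is why the two measures are usually reported together.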