A neighborhood-based approach for clustering of linked document collections

Authors:
Ralitsa Angelova;Stefan Siersdorfer
Affiliations:
Max Planck Institute for Informatics, Saarbrücken, Germany;Max Planck Institute for Informatics, Saarbrücken, Germany
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006

Citing 2
Cited 7

Markov random field modeling in image analysis

Markov random field modeling in image analysis
Graph-based text classification: learn from your neighbors

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

A comparative evaluation of different link types on enhancing document clustering

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Web page classification: Features and algorithms

ACM Computing Surveys (CSUR)
Exploit the tripartite network of social tagging for web clustering

Proceedings of the 18th ACM conference on Information and knowledge management
Document clustering of scientific texts using citation contexts

Information Retrieval
Costco: robust content and structure constrained clustering of networked documents

CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
Leveraging network structure for incremental document clustering

APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
Improving Vietnamese web page clustering by combining neighbors' content and using iterative feature selection

Proceedings of the Third Symposium on Information and Communication Technology

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper addresses the problem of automatically structuring linked document collections by using clustering. In contrast to traditional clustering, we study the clustering problem in the light of available link structure information for the data set (e.g., hyperlinks among web documents or co-authorship among bibliographic data entries). Our approach is based on iterative relaxation of cluster assignments, and can be built on top of any clustering algorithm. This technique results in higher cluster purity, better overall accuracy, and make self-organization more robust.