A significance-based graph model for clustering web documents

Authors:
Argyris Kalogeratos;Aristidis Likas
Affiliations:
Department of Computer Science, University of Ioannina, Ioannina, Greece;Department of Computer Science, University of Ioannina, Ioannina, Greece
Venue:
SETN'06 Proceedings of the 4th Hellenic conference on Advances in Artificial Intelligence
Year:
2006

Cited by:
Document clustering using synthetic cluster prototypes (Data & Knowledge Engineering)

Abstract

Traditional document clustering techniques rely on single-term analysis, as in the widely used Vector Space Model. More recent approaches are based on graph models, which provide a more detailed description of document properties. In this work we present a novel Significance-based Graph Model for web documents that introduces a graph weighting method based on evaluating the significance of graph elements. We also define an associated similarity measure based on the maximum common subgraph between the graphs of the corresponding web documents. Experimental results on artificial and real document collections, using well-known clustering algorithms, indicate the effectiveness of the proposed approach.
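The abstract names the ingredients of the model but not their details. As a rough illustration of the general graph-plus-maximum-common-subgraph idea, here is a minimal Python sketch. It is not the authors' significance-based weighting, which is not specified here. The sketch makes several assumptions: a plain, unweighted term graph where each distinct term is one node and each pair of consecutive terms forms a directed edge; graph size counted as nodes plus edges; and a similarity of |mcs| / max(|G1|, |G2|). Because each term labels at most one node, the maximum common subgraph of two such graphs can be taken as the intersection of their node and edge sets. The names `build_graph` and `mcs_similarity` are hypothetical.

```python
def build_graph(text: str):
    """Hypothetical, unweighted term graph (NOT the paper's significance-based model):
    one node per distinct term, one directed edge per pair of consecutive terms."""
    # Keep only purely alphabetic tokens, lower-cased.
    terms = [t for t in text.lower().split() if t.isalpha()]
    nodes = set(terms)
    edges = set(zip(terms, terms[1:]))
    return nodes, edges


def mcs_similarity(g1, g2) -> float:
    """Assumed unweighted similarity |mcs| / max(|G1|, |G2|), with |G| = nodes + edges.
    Node labels are unique, so the common subgraph is the node/edge intersection."""
    n1, e1 = g1
    n2, e2 = g2
    common_nodes = n1 & n2
    common_edges = e1 & e2  # endpoints of a shared edge are shared nodes
    mcs_size = len(common_nodes) + len(common_edges)
    max_size = max(len(n1) + len(e1), len(n2) + len(e2))
    return mcs_size / max_size if max_size else 0.0


g1 = build_graph("Graph models describe web documents")
g2 = build_graph("Graph models cluster web documents")
print(mcs_similarity(g1, g2))  # 6 / 9, about 0.667
```

Here the two graphs share 4 nodes and 2 edges, and each graph has 5 nodes and 4 edges. A weighted variant in the spirit of the paper would replace the plain counts with sums of element weights, but how those weights are computed is not given in this abstract.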