Improved ROCK for text clustering using asymmetric proximity

  • Authors: Shaoxu Song; Chunping Li
  • Affiliations: School of Software, Tsinghua University, Beijing, China (both authors)
  • Venue: SOFSEM'06 Proceedings of the 32nd conference on Current Trends in Theory and Practice of Computer Science
  • Year: 2006

Abstract

The ROCK algorithm can be applied to text clustering in large databases. Its effectiveness, however, is limited by the high dimensionality of textual data and by the traditional proximity measures used for documents. In this paper, we propose an improved approach that uses asymmetric proximity to strengthen the discriminative features of text documents. Instead of the link count used in ROCK, we propose a novel concept, link weight overlap, to measure the proximity between two clusters. The IROCK (Improved ROCK) algorithm performs clustering analysis based on the overlap information of asymmetric proximities between text objects, and carries out the clustering process in an agglomerative hierarchical way. To demonstrate the effectiveness of IROCK, we perform an experimental evaluation on real textual data. A comparison with ROCK and classical algorithms indicates the superiority of our approach.
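
The abstract does not give the formal definitions, so the following Python sketch only illustrates the general idea under stated assumptions. It is not the authors' IROCK implementation. Assumed here: documents are sets of terms; the asymmetric proximity from `a` to `b` is the fraction of `a`'s terms that also occur in `b`; a document keeps a weighted link to every neighbor whose proximity reaches a hypothetical threshold `theta`; the link weight overlap of two documents sums the smaller weight over their shared neighbors; and cluster proximity is the plain average of pairwise overlaps. All function names are illustrative.

```python
from itertools import combinations


def asymmetric_proximity(a, b):
    """Fraction of a's terms that also occur in b (assumed definition).

    In general proximity(a, b) != proximity(b, a).
    """
    if not a:
        return 0.0
    return len(a & b) / len(a)


def link_weights(docs, theta):
    """For each document i, map neighbor j to proximity(i -> j) if >= theta.

    Each document counts as its own neighbor with weight 1.0 (an assumption).
    """
    weights = []
    for i, doc_i in enumerate(docs):
        row = {i: 1.0}
        for j, doc_j in enumerate(docs):
            if i != j:
                p = asymmetric_proximity(doc_i, doc_j)
                if p >= theta:
                    row[j] = p
        weights.append(row)
    return weights


def overlap(weights, i, j):
    """Assumed link weight overlap: sum of min weights over shared neighbors."""
    shared = weights[i].keys() & weights[j].keys()
    return sum(min(weights[i][k], weights[j][k]) for k in shared)


def cluster_overlap(weights, c1, c2):
    """Average pairwise overlap between two clusters (assumed normalization)."""
    total = sum(overlap(weights, i, j) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(docs, k, theta=0.3):
    """Merge the pair of clusters with the largest overlap until k remain."""
    weights = link_weights(docs, theta)
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > k:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_overlap(weights, clusters[p[0]], clusters[p[1]]),
        )
        if cluster_overlap(weights, clusters[a], clusters[b]) == 0:
            break  # no remaining pair shares any weighted links
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

For instance, with `docs = [{"cluster", "text", "rock"}, {"cluster", "text", "rock", "link"}, {"goal", "match", "team"}, {"goal", "match", "team", "league"}]`, calling `agglomerate(docs, 2, theta=0.5)` groups the first two documents together and the last two together. The asymmetry shows up in the first pair: every term of the shorter document occurs in the longer one, but not the reverse.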