Document Clustering Using Incremental and Pairwise Approaches

  • Authors:
  • Tien Tran;Richi Nayak;Peter Bruza

  • Affiliations:
  • Information Technology, Queensland University of Technology, Brisbane, Australia;Information Technology, Queensland University of Technology, Brisbane, Australia;Information Technology, Queensland University of Technology, Brisbane, Australia

  • Venue:
  • Focused Access to XML Documents
  • Year:
  • 2008

Abstract

This paper presents the experiments and results of a clustering approach applied to the large Wikipedia dataset in the INEX 2007 Document Mining Challenge. The approach combines an incremental clustering method with a pairwise clustering method. The incremental method first reduces the dimension of the dataset by grouping the documents into a number of clusters that is not fixed in advance. The pairwise method then clusters this lower-dimension dataset into the required number of clusters. In this way, the large number of documents is clustered successfully and an accurate clustering solution is achieved.
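The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how documents are represented, which similarity measure is used, or how the incremental stage decides to open a new cluster. The sketch assumes sparse term-weight vectors stored as dicts, cosine similarity, a hypothetical `threshold` parameter for the incremental pass, and greedy merging of the most similar pair of clusters for the pairwise stage. The function names `incremental_pass` and `pairwise_merge` are illustrative only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def add_into(total, vec):
    """Add the weights of vec into the running sum vector total (in place)."""
    for t, w in vec.items():
        total[t] = total.get(t, 0.0) + w


def incremental_pass(docs, threshold):
    """Stage 1 (assumed form): one pass over the documents.

    Each document joins the most similar existing cluster if the
    similarity reaches `threshold`; otherwise it starts a new cluster.
    The number of resulting clusters is not fixed in advance.
    """
    clusters = []  # each cluster: {"sum": summed vector, "members": doc indices}
    for i, vec in enumerate(docs):
        best, best_sim = None, threshold
        for c in clusters:
            # Cosine is scale-invariant, so the summed vector stands in
            # for the centroid.
            sim = cosine(vec, c["sum"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            best = {"sum": {}, "members": []}
            clusters.append(best)
        add_into(best["sum"], vec)
        best["members"].append(i)
    return clusters


def pairwise_merge(clusters, k):
    """Stage 2 (assumed form): merge the most similar pair until k clusters remain."""
    # Work on copies so the stage 1 result is left untouched.
    clusters = [{"sum": dict(c["sum"]), "members": list(c["members"])}
                for c in clusters]
    while len(clusters) > k:
        best_pair, best_sim = (0, 1), -1.0
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i]["sum"], clusters[j]["sum"])
                if sim > best_sim:
                    best_pair, best_sim = (i, j), sim
        i, j = best_pair
        add_into(clusters[i]["sum"], clusters[j]["sum"])
        clusters[i]["members"].extend(clusters[j]["members"])
        del clusters[j]  # j > i, so index i stays valid
    return clusters
```

A call such as `pairwise_merge(incremental_pass(docs, 0.9), 2)` would run both stages on a list of document vectors. The point of the design is cost: the pairwise stage compares every pair of clusters, so it is only practical once the incremental pass has reduced a large collection to a much smaller set of clusters.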