A clustering scheme for large high-dimensional document datasets

  • Authors:
  • Jung-Yi Jiang;Jing-Wen Chen;Shie-Jue Lee

  • Affiliations:
  • Dept. of Electrical Engineering, National Sun Yat-Sen University, Taiwan;Dept. of Electrical Engineering, National Sun Yat-Sen University, Taiwan;Dept. of Electrical Engineering, National Sun Yat-Sen University, Taiwan

  • Venue:
  • ISICA'07 Proceedings of the 2nd international conference on Advances in computation and intelligence
  • Year:
  • 2007

Abstract

Scalability and high dimensionality are two common problems in document clustering. We present a novel scheme to deal with these problems. Given a set of documents, we partition the set into several parts. We take one part and cluster its documents into groups. Based on the obtained groups, we reduce the number of features by a certain ratio. Then we add another part, cluster the documents into groups based on the reduced features, and further reduce the number of remaining features. This process is iterated until all parts have been used. Experimental results show that the proposed scheme is effective for clustering large high-dimensional document datasets.
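The iteration described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names neither the clustering algorithm nor the feature-reduction criterion, so plain k-means and a "variance of centroid values across clusters" score are stand-ins chosen here. The names `kmeans`, `feature_scores`, `incremental_cluster`, and `keep_ratio` are hypothetical, and the sketch assumes that each round re-clusters all documents added so far.

```python
import math


def kmeans(vectors, k, iters=20):
    """Plain k-means (an assumed stand-in), seeded with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members; keep it if empty.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def feature_scores(centroids):
    """Assumed criterion: variance of each feature's centroid values across clusters."""
    scores = []
    for col in zip(*centroids):
        mean = sum(col) / len(col)
        scores.append(sum((x - mean) ** 2 for x in col) / len(col))
    return scores


def incremental_cluster(docs, n_parts, k, keep_ratio):
    """Add one part at a time, cluster on the active features, then shrink them."""
    size = math.ceil(len(docs) / n_parts)
    active = list(range(len(docs[0])))  # indices of features still in use
    labels = []
    for end in range(size, len(docs) + size, size):
        working = docs[:end]  # assumption: all parts added so far are re-clustered
        projected = [[d[f] for f in active] for d in working]
        labels, centroids = kmeans(projected, k)
        scores = feature_scores(centroids)
        # Keep the top-scoring fraction of the currently active features.
        n_keep = max(1, math.ceil(keep_ratio * len(active)))
        ranked = sorted(range(len(active)), key=lambda j: scores[j], reverse=True)
        active = sorted(active[j] for j in ranked[:n_keep])
    return labels, active
```

Calling `incremental_cluster(docs, n_parts, k, keep_ratio)` on a list of equal-length term-weight vectors returns the cluster labels from the final round and the indices of the features that survive all reductions. Because later rounds run on fewer features, the cost per document drops as more parts are added, which is the scalability argument the abstract makes.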