Clustering web documents based on knowledge granularity

  • Authors:
  • Faliang Huang;Shichao Zhang

  • Affiliations:
  • Faculty of Software, Fujian Normal University, Fuzhou, China;Department of Computer Science, Guangxi Normal University, Guilin, China

  • Venue:
  • APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose a new data model for Web document representation based on granulation computing, named as Expanded Vector Space Model (EVSM). Traditional Web document clustering is based on two-level knowledge granularity: document and term. It can lead to that clustering results are of “false relevant”. In our approach, Web documents are represented in many-level knowledge granularity. Knowledge granularity with sufficiently conceptual sentences is beneficial for knowledge engineers to understand valuable relations hidden in data. With granularity calculation data can be more efficiently and effectively disposed of and knowledge engineers can handle the same dataset in different knowledge levels. This provides more reliable soundness for interpreting results of various data analysis methods. We experimentally evaluate the proposed approach and demonstrate that our algorithm is promising and efficient.