Web Document Clustering Technique Using Case Grammar Structure

  • Authors:
  • K. P. Supreethi; E. V. Prasad

  • Venue:
  • ICCIMA '07 Proceedings of the International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2007) - Volume 02
  • Year:
  • 2007

Abstract

Most document clustering techniques rely on single-term analysis of the document data set, as in the vector space model. More informative features, including phrases and their weights, are important for achieving more accurate document clustering. Document clustering is useful in many applications, such as automatic categorization of documents, grouping of search engine results, and building taxonomies of documents. The motivation behind this work is that document clustering should be based not only on single-word analysis but on phrases as well: similarity between documents should be computed from matching phrases rather than from single words only. In this paper, we propose a system for Web document clustering based on two key concepts. The first is the use of weighted phrases as an essential constituent of documents, so that similarity between documents is based on matching phrases and their weights. The second is incremental clustering of documents, which maximizes the tightness of clusters by carefully monitoring the similarity distribution inside each cluster.
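
The two concepts in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify how phrases are extracted or weighted, or how the case grammar structure is used. The sketch assumes, purely for illustration, that phrases are word bigrams, that a phrase's weight is its raw frequency, that similarity is a weighted Jaccard ratio, and that a document joins a cluster when its average similarity to the cluster's members reaches a fixed threshold. The names `phrases`, `phrase_similarity`, `incremental_cluster`, and `threshold` are hypothetical.

```python
from collections import Counter


def phrases(text, n=2):
    """Map each word n-gram (an assumed stand-in for a 'phrase') to its frequency."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def phrase_similarity(a, b):
    """Weighted Jaccard similarity between two phrase-weight maps, in [0, 1]."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    # A Counter returns 0 for missing phrases, so unmatched phrases add no overlap.
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total


def incremental_cluster(docs, threshold=0.2):
    """Process documents in arrival order, one at a time.

    Each document joins the existing cluster with the highest average
    similarity to its members, provided that average is at least
    `threshold` (an assumed tightness criterion); otherwise it starts
    a new cluster. Clusters are lists of document indices.
    """
    feats = [phrases(d) for d in docs]
    clusters = []
    for i, f in enumerate(feats):
        best, best_sim = None, threshold
        for c in clusters:
            sim = sum(phrase_similarity(f, feats[j]) for j in c) / len(c)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


docs = [
    "web document clustering with phrases",
    "web document clustering using phrases",
    "river rafting trip report",
    "river rafting trip photos",
]
print(incremental_cluster(docs))
```

Because documents are assigned as they arrive, the result can depend on arrival order and on the chosen threshold. A scheme that tracks the full similarity distribution inside each cluster, as the abstract describes, would replace the simple average used here.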