A Distributed Hierarchical Clustering System for Web Mining

  • Authors:
  • Catherine W. Wen; Huan Liu; Wilson X. Wen; Jeffery Zheng

  • Venue:
  • WAIM '01 Proceedings of the Second International Conference on Advances in Web-Age Information Management
  • Year:
  • 2001

Abstract

This paper proposes a novel method of distributed hierarchical clustering for Web mining. The method is closely related to our earlier work on Self-Generated Neural Networks (SGNN), which is in turn based on both self-organizing neural networks and concept formation. The complexity of the algorithm is at most O(MN log N). With the distributed implementation, the method scales up easily. The method is independent of the order in which the web documents are presented. It produces a natural conceptual hierarchy rather than a binary tree, and it can incorporate multimedia information into the same cluster hierarchy. A visualization mechanism has been developed for the clustering method, and it shows that the cluster hierarchy generated by the method is of very high quality. The clustering process is fully automatic and requires no human intervention. A clustering system has been built on the proposed method; it can be used to automatically generate multimedia search engines, web directories, decision-making assistance systems, knowledge management systems, and personalized knowledge portals.
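
The abstract does not give the algorithm itself, so the following is only a minimal Python sketch of the general idea of incremental tree-based clustering that yields a non-binary hierarchy. It is not the authors' SGNN-based method. All names (`Node`, `insert`, `build`, `threshold`, `shrink`) are hypothetical, documents are assumed to be already encoded as fixed-length numeric vectors, and, unlike the method described above, this simple sketch is sensitive to insertion order. It also assumes that M denotes the vector dimension and N the number of documents, which the abstract does not state; under that reading, each insertion costs O(M) per comparison along one root-to-leaf path, which is roughly where a bound of the form O(MN log N) could come from for a reasonably balanced tree.

```python
import math


class Node:
    """A cluster node: running centroid, member count, any number of children."""

    def __init__(self, vector):
        self.centroid = list(vector)
        self.count = 1
        self.children = []

    def absorb(self, vector):
        # Incremental mean update, O(M) for an M-dimensional vector.
        self.count += 1
        for i, x in enumerate(vector):
            self.centroid[i] += (x - self.centroid[i]) / self.count


def distance(a, b):
    # Euclidean distance (an assumption; any vector distance could be used).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def insert(node, vector, threshold, shrink=0.5):
    """Insert one vector below an internal node (a node that has children)."""
    node.absorb(vector)
    nearest = min(node.children, key=lambda c: distance(c.centroid, vector))
    if distance(nearest.centroid, vector) > threshold:
        # Too far from every existing child: open a new sibling cluster.
        # This is what lets a node have more than two children.
        node.children.append(Node(vector))
    elif not nearest.children:
        # Nearest child is a single document: turn it into a two-member cluster.
        nearest.children = [Node(nearest.centroid), Node(vector)]
        nearest.absorb(vector)
    else:
        # Descend, tightening the threshold so deeper clusters are more specific.
        insert(nearest, vector, threshold * shrink, shrink)


def build(vectors, threshold):
    """Build a hierarchy from a non-empty list of equal-length vectors."""
    root = Node(vectors[0])
    root.children = [Node(vectors[0])]
    for v in vectors[1:]:
        insert(root, v, threshold)
    return root


# Hypothetical toy data: two tight pairs and one outlier.
docs = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 20)]
root = build(docs, threshold=3.0)
print(len(root.children), [child.count for child in root.children])
```

In this toy run the root ends up with more than two children, which illustrates the "conceptual hierarchy rather than a binary tree" property in the simplest possible way. A distributed variant could, for example, build subtrees for disjoint document partitions on separate workers and merge them, but the abstract gives no details on how the authors do this.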