PROBABILISTIC HEURISTICS FOR HIERARCHICAL WEB DATA CLUSTERING

  • Authors:
  • Morteza Haghir Chehreghani; Mostafa Haghir Chehreghani; Hassan Abolhassani

  • Affiliations:
  • Department of Computer Engineering, Sharif University of Technology, Tehran, Iran; Department of Computer Engineering, Sharif University of Technology, Tehran, Iran; Department of Computer Engineering, Sharif University of Technology, Tehran, Iran

  • Venue:
  • Computational Intelligence
  • Year:
  • 2012

Abstract

Clustering Web data is an important technique for extracting knowledge from the Web. This paper presents a novel method to facilitate such clustering. The method determines an appropriate number of clusters and provides suitable representatives for each cluster by inference from a Bayesian network. The Bayesian network is also used to convert the contents of Web pages into lower-dimensional vectors. The method is further extended to hierarchical clustering, and a useful heuristic is developed for selecting a good hierarchy. Experimental results show that the clusters produced are of high quality. (The value of this threshold is subjective: it depends on human perceptions of relevancy, precision, and recall, and can be determined easily through a limited amount of human evaluation.)

© 2012 Wiley Periodicals, Inc.