A unified probabilistic framework for clustering correlated heterogeneous web objects

  • Authors:
  • Guowei Liu; Weibin Zhu; Yong Yu

  • Affiliations:
  • Computer Science & Engineering Department, Shanghai Jiaotong University, Shanghai, China (all authors)

  • Venue:
  • APWeb'05: Proceedings of the 7th Asia-Pacific Web Conference on Web Technologies Research and Development
  • Year:
  • 2005

Abstract

Most existing algorithms cluster highly correlated data objects (e.g., web pages and web queries) separately. Other algorithms do take the relationships between data objects into account, but they either merge content and link features into a single feature space or apply a hard clustering algorithm, which makes it difficult to fully exploit the correlated information across heterogeneous Web objects. In this paper, we propose a novel unified probabilistic framework that iteratively clusters correlated heterogeneous data objects until convergence. Our approach introduces two latent clustering layers, each serving as a mixture probabilistic model of the features. In each clustering iteration, we use the EM (Expectation-Maximization) algorithm to estimate the parameters of the mixture model in one latent layer and propagate them to the other layer. Experimental results show that our approach effectively combines content and link features and improves clustering performance.
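
The alternating scheme described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' model: the abstract does not give the likelihood, the update equations, or the exact form of the propagation step, so everything below is an assumption made for illustration. The sketch assumes a multinomial mixture per layer, soft memberships as the quantity passed between layers, and link counts projected onto the other layer's clusters. For brevity it appends those projected link counts to the content counts, which is a simplification. The names `em_multinomial_mixture`, `cocluster`, `alpha`, and the toy data are hypothetical.

```python
import numpy as np

def em_multinomial_mixture(counts, k, resp=None, n_iter=20, alpha=1e-2, seed=0):
    """Soft-cluster rows of a non-negative count matrix (assumed multinomial mixture)."""
    n = counts.shape[0]
    if resp is None:
        # Random soft memberships as a starting point.
        resp = np.random.default_rng(seed).dirichlet(np.ones(k), size=n)
    for _ in range(n_iter):
        # M-step: smoothed cluster priors and per-cluster feature distributions.
        prior = (resp.sum(axis=0) + alpha) / (n + k * alpha)
        theta = resp.T @ counts + alpha
        theta /= theta.sum(axis=1, keepdims=True)
        # E-step: posterior membership of every object, computed in log space.
        log_post = np.log(prior) + counts @ np.log(theta).T
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp

def cocluster(page_feats, query_feats, links, k_pages, k_queries,
              n_rounds=10, tol=1e-4, seed=0):
    """Alternate EM between two latent layers, passing soft memberships across."""
    rng = np.random.default_rng(seed)
    resp_p = rng.dirichlet(np.ones(k_pages), size=page_feats.shape[0])
    resp_q = rng.dirichlet(np.ones(k_queries), size=query_feats.shape[0])
    for _ in range(n_rounds):
        old_p = resp_p
        # Pages: content counts plus expected link counts into each query cluster.
        page_view = np.hstack([page_feats, links @ resp_q])
        resp_p = em_multinomial_mixture(page_view, k_pages, resp=resp_p)
        # Queries: content counts plus expected link counts into each page cluster.
        query_view = np.hstack([query_feats, links.T @ resp_p])
        resp_q = em_multinomial_mixture(query_view, k_queries, resp=resp_q)
        # Stop once the page memberships no longer change noticeably.
        if np.abs(resp_p - old_p).max() < tol:
            break
    return resp_p, resp_q

# Toy data (made up): two groups of pages and queries with block-structured links.
pages = np.array([[8, 0], [8, 0], [0, 8], [0, 8]], dtype=float)
queries = np.array([[6, 0], [6, 0], [0, 6], [0, 6]], dtype=float)
links = np.array([[3, 3, 0, 0],
                  [3, 3, 0, 0],
                  [0, 0, 3, 3],
                  [0, 0, 3, 3]], dtype=float)

resp_p, resp_q = cocluster(pages, queries, links, k_pages=2, k_queries=2)
page_labels = resp_p.argmax(axis=1)
query_labels = resp_q.argmax(axis=1)
```

Each row of `resp_p` and `resp_q` is a probability distribution over clusters, so objects keep soft memberships rather than being assigned to a single cluster, in contrast to the hard clustering approaches the abstract mentions.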