Maximum margin clustering on evolutionary data

  • Authors:
  • Xuhui Fan;Lin Zhu;Longbing Cao;Xia Cui;Yew-Soon Ong

  • Affiliations:
  • Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, Australia;Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China;Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, Australia;Advanced Analytics Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, Australia;School of Computer Engineering, Nanyang Technological University, Singapore, Singapore

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Abstract

Evolutionary data, such as topic-changing blogs and evolving trading behaviors in capital markets, is widely seen in business and social applications. The time factor and the intrinsic change embedded in evolutionary data greatly challenge evolutionary clustering. To incorporate the time factor, existing methods mainly treat the evolutionary clustering problem as a linear combination of a snapshot cost and a temporal cost, and reflect the time factor through the temporal cost. Although these methods have produced promising results, they still face accuracy and scalability challenges. This paper proposes a novel evolutionary clustering approach, evolutionary maximum margin clustering (e-MMC), to cluster large-scale evolutionary data from the maximum margin perspective. e-MMC incorporates two frameworks to tackle the time factor and change: Data Integration, from the data changing perspective, and Model Integration, corresponding to model adjustment, together with an adaptive label allocation mechanism. Three e-MMC clustering algorithms are proposed based on the two frameworks. Extensive experiments on synthetic data, UCI data and real-world blog data confirm that e-MMC outperforms the state-of-the-art clustering algorithms in terms of accuracy, computational cost and scalability. The results show that e-MMC is particularly suitable for clustering large-scale evolving data.
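
The "linear combination of snapshot cost and temporal cost" that the abstract attributes to existing methods can be illustrated with a minimal sketch. This is not e-MMC and is not taken from the paper: the k-means-style snapshot cost, the centroid-drift temporal cost, the weight `alpha`, and all function names are assumptions chosen only to make the idea concrete.

```python
# Hypothetical illustration of the "snapshot cost + temporal cost" objective
# used by prior evolutionary clustering methods. Not the e-MMC algorithm.

def snapshot_cost(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid
    (assumed k-means-style measure of fit to the current data)."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )


def temporal_cost(centroids, previous_centroids):
    """Sum of squared displacements of matching centroids between the
    previous and current time steps (assumed measure of model drift)."""
    return sum(
        sum((c - q) ** 2 for c, q in zip(current, previous))
        for current, previous in zip(centroids, previous_centroids)
    )


def evolutionary_cost(points, centroids, labels, previous_centroids, alpha=0.5):
    """Linear combination of the two costs; alpha in [0, 1] weights the
    fit to the current snapshot against smoothness over time."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (alpha * snapshot_cost(points, centroids, labels)
            + (1.0 - alpha) * temporal_cost(centroids, previous_centroids))
```

With `alpha = 1` the objective ignores history and reduces to clustering each snapshot independently; smaller values of `alpha` penalize clusterings that move far from the previous time step.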