Co-clustering with augmented data matrix

  • Authors:
  • Meng-Lun Wu;Chia-Hui Chang;Rui-Zhe Liu

  • Affiliations:
  • Dept. of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan;Dept. of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan;Dept. of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan

  • Venue:
  • DaWaK'11 Proceedings of the 13th international conference on Data warehousing and knowledge discovery
  • Year:
  • 2011

Abstract

Clustering plays an important role in data mining, as many applications use it as a preprocessing step for data analysis. Traditional clustering focuses on grouping similar objects, while two-way co-clustering groups dyadic data (objects as well as their attributes) simultaneously. Most co-clustering research focuses on single correlation data, but other descriptions of the dyadic data may be available that could improve co-clustering performance. In this research, we extend ITCC (Information Theoretic Co-Clustering) to the problem of co-clustering with an augmented matrix. We propose CCAM (Co-Clustering with Augmented Data Matrix) to include this augmented data for better co-clustering. We apply CCAM to the analysis of online advertising, where both ads and users must be clustered. The key data connecting ads and users is the user-ad link matrix, which identifies the ads that each user has linked; both ads and users also have their own feature data, i.e., the augmented data matrix. To evaluate the proposed method, we use two measures: classification accuracy and K-L divergence. The experiments use advertisement and user data from Morgenstern, a financial social website that focuses on the advertisement agency. The experimental results show that CCAM performs better than ITCC because it takes the augmented data into account during clustering.
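
As a rough illustration of the K-L divergence measure mentioned above, the following Python sketch computes the standard ITCC loss for a fixed co-clustering: the K-L divergence between the normalized link matrix p(x, y) and its co-cluster approximation q(x, y) = p(x̂, ŷ) p(x | x̂) p(y | ŷ). It covers plain ITCC only; how CCAM combines this loss with the augmented feature matrices is not specified in the abstract and is not reproduced here. The function name `itcc_kl_loss` and the toy matrix are assumptions made for illustration, not taken from the paper.

```python
import numpy as np

def itcc_kl_loss(counts, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """K-L divergence D(p || q) for a given co-clustering (ITCC objective).

    Hypothetical helper for illustration; not the authors' code.
    """
    counts = np.asarray(counts, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)

    p = counts / counts.sum()                      # joint distribution p(x, y)
    R = np.eye(n_row_clusters)[row_labels]         # one-hot row assignments
    C = np.eye(n_col_clusters)[col_labels]         # one-hot column assignments
    p_hat = R.T @ p @ C                            # cluster-level joint p(x^, y^)

    p_x, p_y = p.sum(axis=1), p.sum(axis=0)        # row and column marginals
    p_xhat = p_hat.sum(axis=1)[row_labels]         # p(x^) looked up per row
    p_yhat = p_hat.sum(axis=0)[col_labels]         # p(y^) looked up per column

    # p(x | x^) and p(y | y^); empty clusters contribute zero mass.
    row_ratio = np.divide(p_x, p_xhat, out=np.zeros_like(p_x), where=p_xhat > 0)
    col_ratio = np.divide(p_y, p_yhat, out=np.zeros_like(p_y), where=p_yhat > 0)

    # q(x, y) = p(x^, y^) * p(x | x^) * p(y | y^)
    q = p_hat[row_labels][:, col_labels] * row_ratio[:, None] * col_ratio[None, :]

    mask = p > 0                                   # 0 * log(0 / q) is taken as 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Toy user-ad link matrix with two clean blocks (made-up data).
links = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]])

good = itcc_kl_loss(links, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
bad = itcc_kl_loss(links, [0, 1, 0, 1], [0, 1, 0, 1], 2, 2)
print(good, bad)
```

A co-clustering that matches the block structure loses no information (the loss is zero), while one that mixes the blocks yields a strictly larger divergence, which is why a lower K-L divergence indicates a better co-clustering.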