Learning quantifiable associations via principal sparse non-negative matrix factorization

  • Authors:
  • Chenyong Hu;Benyu Zhang;Yongji Wang;Shuicheng Yan;Zheng Chen;Qing Wang;Qiang Yang

  • Affiliations:
  • Lab for Internet Software Technologies, Institute of Software Chinese Academy of Sciences, Beijing, 100080, P.R. China. E-mail: {huchenyong,ywang,wq}@itechs.iscas.ac.cn;Microsoft Research Asia, 49 Zhichun Road, Beijing, 100080, P.R. China. E-mail: {byzhang,zhengc}@microsoft.com;Lab for Internet Softw. Technol., Inst. of Softw. Chin. Acad. of Sci., Beijing, 100080, P.R. China. E-mail: {huchenyong,ywang,wq}@itechs.iscas.ac.cn and Key Lab of Comp. Sci., Inst. of Softw. Chin ...;Department of Information Engineering, the Chinese University of Hong Kong, Hong Kong. E-mail: scyan@ie.cuhk.edu.hk;Microsoft Research Asia, 49 Zhichun Road, Beijing, 100080, P.R. China. E-mail: {byzhang,zhengc}@microsoft.com;Lab for Internet Software Technologies, Institute of Software Chinese Academy of Sciences, Beijing, 100080, P.R. China. E-mail: {huchenyong,ywang,wq}@itechs.iscas.ac.cn;Department of Computer Science, Hong Kong University of Science and Technology. E-mail: qyang@cs.ust.hk

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Association rules are traditionally designed to capture statistical relationship among itemsets in a given database. To additionally capture the quantitative association knowledge, Korn et.al. recently propose a paradigm named Ratio Rules [6] for quantifiable data mining. However, their approach is mainly based on Principle Component Analysis (PCA), and as a result, it cannot guarantee that the ratio coefficients are non-negative. This may lead to serious problems in the rules' application. In this paper, we propose a new method, called Principal Sparse Non-negative Matrix Factorization (PSNMF), for learning the associations between itemsets in the form of Ratio Rules. In addition, we provide a support measurement to weigh the importance of each rule for the entire dataset. Experiments on several datasets illustrate that the proposed method performs well for discovering latent associations between itemsets in large datasets.