Regularized Gaussian Mixture Model based discretization for gene expression data association mining

  • Authors:
  • Ruichu Cai;Zhifeng Hao;Wen Wen;Lijuan Wang

  • Affiliations:
  • Faculty of Computer Science, Guangdong University of Technology, Guangzhou, P.R. China and State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, P.R. China;Faculty of Computer Science, Guangdong University of Technology, Guangzhou, P.R. China;Faculty of Computer Science, Guangdong University of Technology, Guangzhou, P.R. China;Faculty of Computer Science, Guangdong University of Technology, Guangzhou, P.R. China

  • Venue:
  • Applied Intelligence
  • Year:
  • 2013

Abstract

Association rules have proven useful in gene expression data based disease diagnosis because of their good interpretability. The large number of rules generated from high-dimensional gene expression data is one of the main challenges to their application. In this work, we show that discretization preprocessing is one of the causes of this explosion in the number of association rules. To alleviate the problem, a Regularized Gaussian Mixture Model (RGMM) is proposed to discretize continuous gene expression data. RGMM accounts for both the complexity of the discretization model and the information loss of the discretization procedure, under the Minimum Description Length (MDL) framework. Extensive experiments show the effectiveness of RGMM on real-life gene expression data sets.
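
The abstract does not give the RGMM objective, so the following is only a rough sketch of the general idea it describes: discretizing one gene's continuous expression values with a Gaussian mixture whose number of components is chosen by trading data fit against model complexity. It assumes a plain one-dimensional EM fit and a BIC-style two-part code length as a stand-in for the paper's MDL criterion; the function names (`fit_gmm_1d`, `description_length`, `select_model`, `discretize`), the variance floor, and the toy data are all hypothetical and are not taken from the paper.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(xs, k, iters=100, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative, not the paper's RGMM)."""
    n = len(xs)
    s = sorted(xs)
    # Deterministic initialisation: means at evenly spaced quantiles, shared variance.
    means = [s[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    variances = [max(sum((x - centre) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, var_floor)
            weights[j] = nj / n
    return weights, means, variances

def log_likelihood(xs, model):
    weights, means, variances = model
    return sum(
        math.log(max(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)), 1e-300))
        for x in xs
    )

def description_length(xs, model):
    """Assumed BIC-style code length in nats: data cost plus a penalty per free parameter."""
    k = len(model[0])
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return -log_likelihood(xs, model) + 0.5 * n_params * math.log(len(xs))

def select_model(xs, max_k=4):
    """Pick the mixture size with the shortest assumed description length."""
    candidates = [fit_gmm_1d(xs, k) for k in range(1, max_k + 1)]
    return min(candidates, key=lambda m: description_length(xs, m))

def discretize(x, model):
    """Map a value to the index of its most probable component, ordered by component mean."""
    weights, means, variances = model
    order = sorted(range(len(means)), key=lambda j: means[j])
    scores = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in order]
    return scores.index(max(scores))

# Toy data (hypothetical): one gene with a low and a high expression level.
low = [0.1 * i for i in range(-10, 11)]
high = [10.0 + 0.1 * i for i in range(-10, 11)]
expression = low + high

model = select_model(expression)
labels = [discretize(x, model) for x in expression]
```

In this sketch the complexity penalty discourages splitting an expression level into extra intervals, which in turn limits the number of discrete items an association rule miner would see. How RGMM itself weighs model complexity against information loss is specified in the paper, not here.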