Association rules have proven useful in gene-expression-based disease diagnosis because of their good interpretability. However, the large number of rules generated from high-dimensional gene expression data is one of the main challenges in applying them. In this work, we show that discretization preprocessing is one of the causes of this explosion in the number of association rules. To alleviate the problem, we propose a Regularized Gaussian Mixture Model (RGMM) to discretize continuous gene expression data. Under the Minimum Description Length (MDL) framework, RGMM takes into account both the complexity of the discretization model and the information lost during discretization. Extensive experiments on real-life gene expression data sets demonstrate the effectiveness of RGMM.
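The abstract does not give RGMM's regularizer or its code-length formulas, so the following is only a rough illustration of the general idea and not the authors' method. This Python sketch discretizes a single continuous attribute (for example, one gene). For each candidate number of components k, it fits a one-dimensional Gaussian mixture by plain EM. It then scores that k with a generic two-part description length: the negative log-likelihood, plus a BIC-style parameter cost of (p/2) log n. It keeps the shortest code and maps each value to the bin of its most probable component. The function names, the quantile initialization, and the `VAR_FLOOR` constant are hypothetical choices made for this sketch.

```python
import math
import statistics

VAR_FLOOR = 1e-6  # hypothetical: keeps a component from collapsing onto one point


def _log_normal(x, mean, var):
    """Log-density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def _component_logs(x, params):
    """Per-component log(weight * density) and their log-sum-exp."""
    weights, means, variances = params
    logs = [math.log(w) + _log_normal(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return logs, top + math.log(sum(math.exp(l - top) for l in logs))


def fit_gmm_1d(xs, k, iters=100):
    """Fit a 1-D Gaussian mixture with k components by plain EM."""
    n = len(xs)
    srt = sorted(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [max(statistics.pvariance(xs), VAR_FLOOR)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities r[i][j] = P(component j | x_i).
        resp = []
        for x in xs:
            logs, total = _component_logs(x, (weights, means, variances))
            resp.append([math.exp(l - total) for l in logs])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, VAR_FLOOR)
    return weights, means, variances


def description_length(xs, k):
    """Two-part code length in nats (up to a constant quantization term).

    Data cost: negative log-likelihood of xs under the fitted mixture.
    Model cost: (p / 2) * log(n) with p = 3k - 1 free parameters
    (k means, k variances, k - 1 free weights).
    """
    params = fit_gmm_1d(xs, k)
    data_cost = -sum(_component_logs(x, params)[1] for x in xs)
    model_cost = 0.5 * (3 * k - 1) * math.log(len(xs))
    return data_cost + model_cost, params


def assign_bins(xs, params):
    """Map each value to the bin of its most probable component.

    Bins are numbered by increasing component mean. With unequal variances
    a wide component can win on both tails, so a bin need not be one interval.
    """
    means = params[1]
    order = sorted(range(len(means)), key=lambda j: means[j])
    rank = {j: b for b, j in enumerate(order)}
    labels = []
    for x in xs:
        logs, _ = _component_logs(x, params)
        labels.append(rank[max(range(len(logs)), key=logs.__getitem__)])
    return labels


def mdl_discretize(xs, max_k=5):
    """Pick k in 1..max_k with the shortest description length, then bin."""
    best = None
    for k in range(1, max_k + 1):
        dl, params = description_length(xs, k)
        if best is None or dl < best[0]:
            best = (dl, k, params)
    _, k, params = best
    return k, assign_bins(xs, params)
```

For example, `k, labels = mdl_discretize(values, max_k=4)` returns the selected number of bins for one attribute and a bin index for each value. In this sketch, the model-cost term is the only thing that stops k from growing to fit noise. That cost matters for rule mining, because each extra bin becomes another item that rules can be built from.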