Advances in biomedical techniques over the last decade have rapidly reduced the cost of human genome sequencing and genomic activity monitoring. To support the large genome-based industry expected in the near future, researchers are eager to find killer applications built on human genome information. Causal gene identification is one of the most promising of these applications: it may help potential patients estimate their risk of certain genetic diseases and help locate target genes for further genetic therapy. Unfortunately, existing pattern recognition techniques, such as Bayesian networks, cannot be applied directly to recover accurate causal relationships between genes and diseases, mainly because the number of samples is insufficient and the dimensionality of the gene space is extremely high. In this paper, we present the first practical solution to causal gene identification. It uses a new combinatorial formulation over V-Structures, which are commonly used in conventional Bayesian networks, and explores combinations of significant V-Structures. We prove that the combinatorial search problem is NP-hard under general settings of the significance measure on V-Structures, and we present a greedy algorithm that finds sub-optimal results. Extensive experiments show that our proposal is both scalable and effective, and they yield interesting findings on causal genes in real human genome data.
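To make the idea of ranking and greedily combining V-Structures concrete, the following is a minimal sketch and not the algorithm of the paper, whose significance measure and search procedure are not given in this abstract. It assumes a V-Structure of the form gene1 -> disease <- gene2, scores it with a hypothetical measure (the conditional dependence of the two genes given the disease, minus their marginal dependence, both estimated by linear correlation), and picks genes from the highest-scoring V-Structures until a budget is reached. The names `v_structure_score`, `greedy_causal_genes`, and `budget` are illustrative assumptions.

```python
import itertools
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """Correlation of a and b after linearly controlling for c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    denom = math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
    if denom == 0:
        return 0.0
    return (r_ab - r_ac * r_bc) / denom


def v_structure_score(x, y, z):
    """Hypothetical significance of the V-Structure x -> z <- y.

    In a V-Structure the two parents are marginally independent but become
    dependent once the common child is conditioned on, so a high score
    means strong conditional and weak marginal dependence.
    """
    return abs(partial_corr(x, y, z)) - abs(pearson(x, y))


def greedy_causal_genes(genes, disease, budget):
    """Greedily collect genes from the top-scoring V-Structures.

    genes: dict mapping gene name -> list of expression values.
    disease: list of disease measurements, one per sample.
    budget: maximum number of genes to return (an assumed constraint).
    """
    scored = sorted(
        (
            (v_structure_score(genes[g1], genes[g2], disease), g1, g2)
            for g1, g2 in itertools.combinations(sorted(genes), 2)
        ),
        reverse=True,
    )
    selected = []
    for score, g1, g2 in scored:
        if score <= 0 or len(selected) >= budget:
            break
        for g in (g1, g2):
            if g not in selected and len(selected) < budget:
                selected.append(g)
    return selected


# Synthetic illustration: g1 and g2 jointly drive the disease, g3 and g4 are noise.
random.seed(0)
n_samples = 500
demo_genes = {
    name: [random.gauss(0, 1) for _ in range(n_samples)]
    for name in ("g1", "g2", "g3", "g4")
}
demo_disease = [
    a + b + random.gauss(0, 0.3)
    for a, b in zip(demo_genes["g1"], demo_genes["g2"])
]
print(greedy_causal_genes(demo_genes, demo_disease, budget=2))
```

The sketch enumerates all gene pairs, which is quadratic in the number of genes and would not scale to a real gene space without pruning. It is meant only to show what scoring a V-Structure and combining the significant ones greedily could look like under the stated assumptions.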