Causal gene identification using combinatorial V-structure search

  • Authors:
  • Ruichu Cai;Zhenjie Zhang;Zhifeng Hao

  • Affiliations:
  • Faculty of Computer Science, Guangdong University of Technology, Guangzhou, PR China and State Key Laboratory for Novel Software Technology, Nanjing University, PR China;Advanced Digital Sciences Center, Illinois at Singapore Pte. Ltd., Singapore;Faculty of Computer Science, Guangdong University of Technology, Guangzhou, PR China

  • Venue:
  • Neural Networks
  • Year:
  • 2013

Abstract

With advances in biomedical techniques over the last decade, the costs of human genomic sequencing and genomic activity monitoring are falling rapidly. To support the large genome-based business expected in the near future, researchers are eager to find killer applications based on human genome information. Causal gene identification is one of the most promising of these applications: it may help potential patients estimate their risk of certain genetic diseases and locate target genes for subsequent genetic therapy. Unfortunately, existing pattern recognition techniques, such as Bayesian networks, cannot be directly applied to find accurate causal relationships between genes and diseases, mainly because of the insufficient number of samples and the extremely high dimensionality of the gene space. In this paper, we present the first practical solution to causal gene identification. It uses a new combinatorial formulation over V-structures, which are commonly used in conventional Bayesian networks, and explores combinations of significant V-structures. We prove that the combinatorial search problem is NP-hard under general settings of the significance measure on V-structures, and present a greedy algorithm that finds sub-optimal results. Extensive experiments show that our proposal is both scalable and effective, and that it yields interesting findings on causal genes in real human genome data.
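For readers unfamiliar with the term, a V-structure is a triple X → Z ← Y in which X and Y are marginally independent but become dependent once the common effect Z is conditioned on. The sketch below illustrates only that general property on synthetic data; it is not the paper's significance measure or its greedy search. The variable names (`gene_a`, `gene_b`, `phenotype`), the thresholds, and the use of partial correlation as the conditional-dependence check (reasonable only for roughly linear-Gaussian data) are all assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after linearly controlling for zs."""
    rxy = pearson(xs, ys)
    rxz = pearson(xs, zs)
    ryz = pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def looks_like_v_structure(xs, ys, zs, indep_tol=0.05, dep_tol=0.2):
    """Heuristic check (illustrative thresholds, not from the paper):
    xs and ys nearly uncorrelated marginally, clearly correlated given zs."""
    marginal = abs(pearson(xs, ys))
    conditional = abs(partial_corr(xs, ys, zs))
    return marginal < indep_tol and conditional > dep_tol


# Synthetic collider: two independent "genes" jointly drive a "phenotype".
rng = random.Random(0)
n = 20000
gene_a = [rng.gauss(0, 1) for _ in range(n)]
gene_b = [rng.gauss(0, 1) for _ in range(n)]
phenotype = [a + b + rng.gauss(0, 0.5) for a, b in zip(gene_a, gene_b)]

# Synthetic chain for contrast: gene_a -> mediator -> downstream.
mediator = [a + rng.gauss(0, 0.5) for a in gene_a]
downstream = [m + rng.gauss(0, 0.5) for m in mediator]

print("collider:", looks_like_v_structure(gene_a, gene_b, phenotype))
print("chain:   ", looks_like_v_structure(gene_a, downstream, mediator))
```

In the collider case the two causes are uncorrelated on their own yet strongly (negatively) correlated once the phenotype is controlled for, which is the signature a V-structure search looks for. In the chain case the endpoints are already correlated marginally, so the check rejects it.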