Informative transcription factor selection using support vector machine-based generalized approximate cross validation criteria

  • Authors:
  • Insuk Sohn; Jooyong Shim; Changha Hwang; Sujong Kim; Jae Won Lee

  • Affiliations:
  • Department of Statistics, Korea University, Seoul 136-701, Republic of Korea; Department of Applied Statistics, Catholic University of Daegu, Kyungbuk 712-702, Republic of Korea; Division of Information and Computer Science, Dankook University, Gyeonggido 448-160, Republic of Korea; Skin Research Institute, AmorePacific R&D Center, Kyounggi-do, Republic of Korea; Department of Statistics, Korea University, Seoul 136-701, Republic of Korea

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2009

Abstract

The genetic regulatory mechanism plays a pivotal role in many biological processes, ranging from development to survival. Identifying the common transcription factor binding sites (TFBSs) in a set of known co-regulated gene promoters, and identifying the genes regulated by transcription factors (TFs) that have important roles in a particular biological function, will advance our understanding of the interactions among co-regulated genes and of the intricate genetic regulatory mechanism underlying that function. To identify the common TFBSs in a set of known co-regulated gene promoters and to classify genes that are regulated by TFs, new approaches using Support Vector Machine (SVM)-based Generalized Approximate Cross Validation (GACV) criteria are proposed. Two variable selection methods are considered: Recursive Feature Elimination (RFE) and Recursive Feature Addition (RFA). The performance of the proposed methods is compared with that of existing SVM-based criteria, Logistic Regression Analysis (LRA), Logic Regression (LR), and Decision Tree (DT) methods, using two real TF target gene data sets and simulated data. In terms of test error rates, the proposed methods perform better than the existing methods.
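
The two search strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the GACV formula, so the criterion is passed in as an arbitrary function that returns an error estimate for a feature subset (lower is better), and the rule "drop or add the feature that gives the lowest criterion value" is one common greedy choice assumed here. The names `toy_criterion`, `TF_A` to `TF_D`, and `best_subset` are hypothetical and exist only for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# A criterion maps a feature subset to an error estimate (lower is better).
# In the paper this role is played by an SVM-based GACV criterion; here it
# is an arbitrary callable, which is an assumption of this sketch.
Criterion = Callable[[FrozenSet[str]], float]
Path = List[Tuple[FrozenSet[str], float]]


def recursive_feature_elimination(features: Sequence[str], criterion: Criterion) -> Path:
    """Backward search: start from all features and drop one per step."""
    current = frozenset(features)
    path = [(current, criterion(current))]
    while len(current) > 1:
        # Score every single-feature removal; keep the one with the lowest value.
        score, dropped = min((criterion(current - {f}), f) for f in sorted(current))
        current = current - {dropped}
        path.append((current, score))
    return path


def recursive_feature_addition(features: Sequence[str], criterion: Criterion) -> Path:
    """Forward search: start from no features and add one per step."""
    current: FrozenSet[str] = frozenset()
    remaining = set(features)
    path: Path = []
    while remaining:
        # Score every single-feature addition; keep the one with the lowest value.
        score, added = min((criterion(current | {f}), f) for f in sorted(remaining))
        current = current | {added}
        remaining.remove(added)
        path.append((current, score))
    return path


def best_subset(path: Path) -> Tuple[FrozenSet[str], float]:
    """Pick the visited subset with the lowest criterion (ties: fewer features)."""
    return min(path, key=lambda step: (step[1], len(step[0])))


def toy_criterion(subset: FrozenSet[str]) -> float:
    """Hypothetical stand-in for an error estimate: TF_A and TF_C are
    informative, every other feature adds a small penalty."""
    informative = frozenset({"TF_A", "TF_C"})
    hits = len(subset & informative)
    noise = len(subset - informative)
    return 0.5 - 0.2 * hits + 0.02 * noise


tfs = ["TF_A", "TF_B", "TF_C", "TF_D"]
rfe_path = recursive_feature_elimination(tfs, toy_criterion)
rfa_path = recursive_feature_addition(tfs, toy_criterion)
print(sorted(best_subset(rfe_path)[0]))
print(sorted(best_subset(rfa_path)[0]))
```

In practice the toy criterion would be replaced by a function that fits an SVM on the chosen subset and returns its estimated generalization error; both searches then need on the order of the square of the number of candidate TFs in criterion evaluations.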