Labeling negative examples in supervised learning of new gene regulatory connections

  • Authors:
  • Luigi Cerulo;Vincenzo Paduano;Pietro Zoppoli;Michele Ceccarelli

  • Affiliations:
  • Dept. of Biological and Environmental Studies, University of Sannio, Benevento and Biogem s.c.ar.l., Institute of Genetic Research;Biogem s.c.ar.l., Institute of Genetic Research;Biogem s.c.ar.l., Institute of Genetic Research;Dept. of Biological and Environmental Studies, University of Sannio, Benevento and Biogem s.c.ar.l., Institute of Genetic Research

  • Venue:
  • CIBB'10 Proceedings of the 7th international conference on Computational intelligence methods for bioinformatics and biostatistics
  • Year:
  • 2010


Abstract

Supervised learning methods have recently been exploited to learn gene regulatory networks from gene expression data. The basic approach consists of building a binary classifier from feature vectors composed of the expression levels of a set of known regulatory connections, available in public databases or reported in the literature. Such a classifier is then used to predict new, unknown connections. The quality of the training set plays a crucial role in such an inference scheme. In binary classification the training set should be composed of positive and negative examples, but the biological literature only records whether two genes interact. The counterpart information is usually not reported, as biologists are generally unable to state with confidence that two genes do not interact. The overrepresentation of topological motifs in currently known gene regulatory networks, such as feed-forward loops, bi-fan clusters, and single-input modules, could drive the selection of reliable negative examples. We introduce, discuss, and evaluate a number of negative selection heuristics that exploit the known gene network topology of Escherichia coli and Saccharomyces cerevisiae.
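The abstract does not spell out the individual heuristics, so the following Python sketch only illustrates the general scheme under stated assumptions. It assumes that a candidate pair (regulator, target) is represented by concatenating the two genes' expression profiles, that known connections are the positives, and that negatives are sampled from unlabeled pairs after discarding any pair that would close a feed-forward loop with known edges, on the premise that such pairs are more likely to be undiscovered interactions. The function names and this particular motif filter are hypothetical illustrations, not the authors' heuristics.

```python
import random


def adjacency(edges):
    """Build successor and predecessor sets from known (regulator, target) edges."""
    succ, pred = {}, {}
    for x, y in edges:
        succ.setdefault(x, set()).add(y)
        pred.setdefault(y, set()).add(x)
    return succ, pred


def would_close_ffl(succ, pred, a, b):
    """Hypothetical motif filter: True if adding a->b would complete a
    feed-forward loop X->Y, Y->Z, X->Z together with known edges."""
    empty = set()
    others = {a, b}
    # a->c->b known: a->b would be the X->Z edge.
    if (succ.get(a, empty) & pred.get(b, empty)) - others:
        return True
    # a->c and b->c known: a->b would be the X->Y edge.
    if (succ.get(a, empty) & succ.get(b, empty)) - others:
        return True
    # c->a and c->b known: a->b would be the Y->Z edge.
    if (pred.get(a, empty) & pred.get(b, empty)) - others:
        return True
    return False


def candidate_negatives(genes, tfs, positives):
    """Unlabeled regulator->target pairs that survive the motif filter."""
    succ, pred = adjacency(positives)
    pos = set(positives)
    return [(a, b) for a in sorted(tfs) for b in sorted(genes)
            if a != b and (a, b) not in pos
            and not would_close_ffl(succ, pred, a, b)]


def select_negatives(candidates, k, seed=0):
    """Randomly draw up to k negatives from the filtered candidates."""
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))


def build_training_set(expr, positives, negatives):
    """Feature vector = regulator profile followed by target profile; label 1/0."""
    X = [expr[a] + expr[b] for a, b in list(positives) + list(negatives)]
    y = [1] * len(positives) + [0] * len(negatives)
    return X, y


if __name__ == "__main__":
    # Toy data (invented for illustration): 5 genes, 2 expression conditions.
    expr = {"g1": [0.1, 0.9], "g2": [0.4, 0.2], "g3": [0.8, 0.5],
            "g4": [0.3, 0.3], "g5": [0.6, 0.1]}
    positives = [("g1", "g2"), ("g2", "g3")]
    cands = candidate_negatives(expr.keys(), {"g1", "g2"}, positives)
    print(cands)  # ('g1', 'g3') is excluded: it would close g1->g2->g3
    X, y = build_training_set(expr, positives, select_negatives(cands, 2))
    print(len(X), y)
```

In the toy network the pair (g1, g3) is dropped because g1->g2 and g2->g3 are already known, so labeling it negative would contradict the feed-forward-loop pattern. Any off-the-shelf binary classifier could then be trained on `X` and `y`; the choice of classifier is outside the scope of this sketch.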