Labeling negative examples in supervised learning of new gene regulatory connections

Authors:
Luigi Cerulo;Vincenzo Paduano;Pietro Zoppoli;Michele Ceccarelli
Affiliations:
Dept. of Biological and Environmental Studies, University of Sannio, Benevento and Biogem s.c.ar.l., Institute of Genetic Research;Biogem s.c.ar.l., Institute of Genetic Research;Biogem s.c.ar.l., Institute of Genetic Research;Dept. of Biological and Environmental Studies, University of Sannio, Benevento and Biogem s.c.ar.l., Institute of Genetic Research
Venue:
CIBB'10 Proceedings of the 7th international conference on Computational intelligence methods for bioinformatics and biostatistics
Year:
2010

Abstract

Supervised learning methods have recently been used to learn gene regulatory networks from gene expression data. The basic approach consists of building a binary classifier from feature vectors composed of the expression levels associated with a set of known regulatory connections, available in public databases or reported in the literature. Such a classifier is then used to predict new, unknown connections. The quality of the training set plays a crucial role in such an inference scheme. In binary classification the training set should be composed of positive and negative examples, but the biological literature typically records only that two genes interact. The counterpart information is usually not reported, because biologists are rarely in a position to state that two genes do not interact. The overrepresentation of topological motifs in currently known gene regulatory networks, such as feed-forward loops, bi-fan clusters, and single input modules, could drive the selection of reliable negative examples. We introduce, discuss, and evaluate a number of negative selection heuristics that exploit the known gene network topology of Escherichia coli and Saccharomyces cerevisiae.
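
The training-set problem described in the abstract can be illustrated with a minimal sketch. It assumes that a gene pair is encoded by concatenating the two expression profiles, and it draws negatives at random from the unlabeled pairs. Random sampling is only a naive baseline chosen for illustration, not one of the topology-based heuristics the paper evaluates, and the names `pair_features`, `build_training_set`, and the toy genes `gA`, `gB`, `gC` are hypothetical.

```python
import random
from itertools import permutations


def pair_features(expr, regulator, target):
    """Concatenate the expression profiles of a (regulator, target) pair.

    Assumption: concatenation is one simple way to encode a gene pair.
    """
    return expr[regulator] + expr[target]


def build_training_set(expr, known_edges, n_negatives, seed=0):
    """Label known edges as positive and sample negatives from unlabeled pairs."""
    positives = sorted(set(known_edges))
    known = set(positives)
    # Every ordered pair without a known interaction is unlabeled,
    # not a proven negative: this is the core difficulty in the abstract.
    unlabeled = [p for p in permutations(sorted(expr), 2) if p not in known]
    rng = random.Random(seed)
    # Naive baseline (assumption): pick negatives uniformly at random.
    negatives = rng.sample(unlabeled, min(n_negatives, len(unlabeled)))
    X = [pair_features(expr, r, t) for r, t in positives + negatives]
    y = [1] * len(positives) + [0] * len(negatives)
    return X, y


# Toy expression data: three genes measured under three conditions.
expr = {
    "gA": [0.1, 0.9, 0.4],
    "gB": [0.2, 0.8, 0.5],
    "gC": [0.7, 0.1, 0.3],
}
known_edges = [("gA", "gB")]  # the only documented regulatory connection

X, y = build_training_set(expr, known_edges, n_negatives=2)
```

The resulting `X` and `y` could be passed to any binary classifier. Some of the sampled negatives may in fact be undiscovered interactions, which is why a more reliable negative selection strategy matters.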