Interpreting microarray expression data using text annotating the genes

Authors:
Michael Molla;Peter Andreae;Jeremy Glasner;Frederick Blattner;Jude Shavlik
Affiliations:
Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI;Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI;Department of Genetics, University of Wisconsin–Madison, Madison, WI;Department of Genetics, University of Wisconsin–Madison, Madison, WI;Department of Computer Sciences, University of Wisconsin–Madison, Madison, WI and Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI
Venue:
Information Sciences—Applications: An International Journal
Year:
2002

Abstract

Microarray expression data is being generated by the gigabyte all over the world, with undoubted exponential increases to come. Annotated genomic data is also rapidly pouring into public databases. Our goal is to develop automated ways of combining these two sources of information to produce insight into the operation of cells under various conditions. Our approach is to use machine-learning techniques to identify characteristics of genes that are up-regulated or down-regulated in a particular microarray experiment. We seek models that are (a) accurate, (b) easy to interpret, and (c) stable to small variations in the training data. This paper explores the effectiveness of two standard machine-learning algorithms for this task: Naïve Bayes (based on probability) and PFOIL (based on building rules). Although we do not anticipate using our learned models to predict expression levels of genes, we cast the task in a predictive framework and evaluate the quality of the models in terms of their predictive power on genes held out from training. The paper reports on experiments using actual E. coli microarray data, discussing the strengths and weaknesses of the two algorithms and demonstrating the trade-offs between accuracy, comprehensibility, and stability.
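To make the predictive framing concrete, here is a minimal illustrative sketch in Python of one of the two learners named above: a Naïve Bayes classifier that labels a gene "up" or "down" from the words in its text annotation, scored by leave-one-out accuracy on held-out genes. The feature representation (presence or absence of each annotation word, i.e. a Bernoulli model), the Laplace smoothing constant `alpha`, the function names `train_naive_bayes`, `predict`, and `leave_one_out_accuracy`, and the toy annotations are all assumptions made for illustration. They are hypothetical, not the paper's data and not necessarily its exact formulation.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model (illustrative assumption).

    docs   -- list of sets of annotation words, one set per gene
    labels -- list of class labels, e.g. "up" or "down"
    alpha  -- Laplace smoothing constant (hypothetical default)
    """
    vocab = set().union(*docs)
    classes = sorted(set(labels))
    prior, word_prob = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        n = len(class_docs)
        prior[c] = n / len(docs)
        counts = Counter(w for d in class_docs for w in d)
        # Smoothed estimate of P(word appears in a gene's annotation | class)
        word_prob[c] = {w: (counts[w] + alpha) / (n + 2 * alpha) for w in vocab}
    return classes, prior, word_prob, vocab

def predict(model, doc):
    """Return the class with the highest log posterior; unseen words are ignored."""
    classes, prior, word_prob, vocab = model
    best, best_score = None, -math.inf
    for c in classes:
        score = math.log(prior[c])
        for w in vocab:
            p = word_prob[c][w]
            # Bernoulli model: present words add log(p), absent words add log(1 - p)
            score += math.log(p if w in doc else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best

def leave_one_out_accuracy(docs, labels):
    """Hold out each gene in turn, train on the rest, and predict the held-out gene."""
    correct = 0
    for i in range(len(docs)):
        model = train_naive_bayes(docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:])
        correct += predict(model, docs[i]) == labels[i]
    return correct / len(docs)

# Hypothetical toy annotations; these are NOT real E. coli data.
genes = [
    ({"ribosomal", "protein", "translation"}, "up"),
    ({"ribosomal", "subunit", "translation"}, "up"),
    ({"ribosomal", "rna", "binding"}, "up"),
    ({"flagellar", "motility", "protein"}, "down"),
    ({"flagellar", "assembly", "motor"}, "down"),
    ({"chemotaxis", "motility", "flagellar"}, "down"),
]
docs = [d for d, _ in genes]
labels = [y for _, y in genes]

model = train_naive_bayes(docs, labels)
print(predict(model, {"ribosomal", "translation"}))  # → up
print(leave_one_out_accuracy(docs, labels))
```

Because each prediction is a sum of per-word log-probabilities, comparing a word's class-conditional probabilities across the two classes is one simple way to see which annotation words push a gene toward "up" or "down", which relates to the interpretability goal stated above.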