Genes@Work: an efficient algorithm for pattern discovery and multivariate feature selection in gene expression data

  • Authors:
  • Jorge Lepre;J. Jeremy Rice;Yuhai Tu;Gustavo Stolovitzky

  • Affiliations:
  • IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA;IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA;IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA;IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598, USA

  • Venue:
  • Bioinformatics
  • Year:
  • 2004

Quantified Score

Hi-index 3.84

Visualization

Abstract

Motivation: Despite the growing literature devoted to finding differentially expressed genes in assays probing different tissues types, little attention has been paid to the combinatorial nature of feature selection inherent to large, high-dimensional gene expression datasets. New flexible data analysis approaches capable of searching relevant subgroups of genes and experiments are needed to understand multivariate associations of gene expression patterns with observed phenotypes. Results: We present in detail a deterministic algorithm to discover patterns of multivariate gene associations in gene expression data. The patterns discovered are differential with respect to a control dataset. The algorithm is exhaustive and efficient, reporting all existent patterns that fit a given input parameter set while avoiding enumeration of the entire pattern space. The value of the pattern discovery approach is demonstrated by finding a set of genes that differentiate between two types of lymphoma. Moreover, these genes are found to behave consistently in an independent dataset produced in a different laboratory using different arrays, thus validating the genes selected using our algorithm. We show that the genes deemed significant in terms of their multivariate statistics will be missed using other methods. Availability: Our set of pattern discovery algorithms including a user interface is distributed as a package called Genes@Work. This package is freely available to non-commercial users and can be downloaded from our website (http://www.research.ibm.com/FunGen).