The GeneMine system for genome/proteome annotation and collaborative data mining

  • Authors:
  • C. Lee; K. Irizarry

  • Affiliations:
  • University of California, Los Angeles, Department of Chemistry and Biochemistry, Los Angeles, California (both authors)

  • Venue:
  • IBM Systems Journal - Deep computing for the life sciences
  • Year:
  • 2001

Abstract

As genome data and bioinformatics resources grow exponentially in size and complexity, there is an increasing need for software that can bridge the gap between biologists with questions and the worldwide set of highly specialized tools for answering them. The GeneMine system for small- to medium-scale genome analysis provides: (1) automated analysis of DNA (deoxyribonucleic acid) and protein sequence data using over 50 different analysis servers via the Internet, integrating data from homologous functions, tissue expression patterns, mapping, polymorphisms, model organism data and phenotypes, protein structural domains, active sites, motifs, and other features; (2) automated filtering and data reduction to highlight significant and interesting patterns; (3) a visual data-mining interface for rapidly exploring correlations, patterns, and contradictions within these data via aggregation, overlay, and drill-down, all projected onto relevant sequence alignments and three-dimensional structures; (4) a plug-in architecture that makes adding new types of analysis, data sources, and servers (including anything on the Internet) as easy as supplying the relevant URLs (uniform resource locators); (5) a hypertext system that lets users create and share "live" views of their discoveries by embedding three-dimensional structures, alignments, and annotation data within their documents; and (6) an integrated database schema for mining large GeneMine data sets in a relational database. The value of the GeneMine system is that it automatically brings together and uncovers important functional information from a much wider range of sources than a given specialist would normally think to query, resulting in insights that the researcher was not planning to look for. In this paper we present the architecture of the software for integrating and mining very diverse biological data, along with cross-validation of gene function predictions.
The software is freely available at http://www.bioinformatics.ucla.edu/genemine.