Clustering Binary Fingerprint Vectors with Missing Values for DNA Array Data Analysis

  • Authors:
  • Andres Figueroa;James Borneman;Tao Jiang

  • Venue:
  • CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics
  • Year:
  • 2003

Abstract

Oligonucleotide fingerprinting is a powerful DNA array-based method to characterize cDNA and ribosomal RNA gene (rDNA) libraries and has many applications including gene expression profiling and DNA clone classification. We are especially interested in the latter application. A key step in the method is the cluster analysis of fingerprint data obtained from DNA array hybridization experiments. Most of the existing approaches to clustering use (normalized) real intensity values and thus do not treat positive and negative hybridization signals equally (positive signals are much more emphasized). In this paper, we consider a discrete approach. Fingerprint data are first normalized and binarized using control DNA clones. Because there may exist unresolved (or missing) values in this binarization process, we formulate the clustering of (binary) oligonucleotide fingerprints as a combinatorial optimization problem that attempts to identify clusters and resolve the missing values in the fingerprints simultaneously. We study the computational complexity of this clustering problem and a natural parameterized version, and present an efficient greedy algorithm based on MINIMUM CLIQUE PARTITION on graphs. The algorithm takes advantage of some unique properties of the graphs considered here, which allow us to efficiently find the maximum cliques as well as some special maximal cliques. Our experimental results on simulated and real data demonstrate that the algorithm runs faster and performs better than some popular hierarchical and graph-based clustering methods. The results on real data from DNA clone classification also suggest that this discrete approach is more accurate than clustering methods based on real intensity values, in terms of separating clones that have different characteristics with respect to the given oligonucleotide probes.
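The clique-partition formulation in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give its details, and the paper's method relies on finding maximum cliques and special maximal cliques using properties of the graphs involved. The code below substitutes a plain greedy heuristic. It assumes fingerprints are strings over `0`, `1`, and a hypothetical missing-value symbol `N`, and that two fingerprints are compatible when they agree at every position where both are resolved. Under that assumption, a set of pairwise compatible fingerprints forms a clique in the compatibility graph and can be resolved to a single common vector. The function names are illustrative only.

```python
from typing import List

MISSING = "N"  # assumed symbol for an unresolved (missing) value


def compatible(a: str, b: str) -> bool:
    """True if a and b agree wherever both are resolved."""
    return all(x == y or x == MISSING or y == MISSING for x, y in zip(a, b))


def merge(a: str, b: str) -> str:
    """Fill missing positions of a with values from b (a and b must be compatible)."""
    return "".join(y if x == MISSING else x for x, y in zip(a, b))


def consensus(fps: List[str], cluster: List[int]) -> str:
    """Resolve a cluster to one vector; positions unresolved in all members stay MISSING."""
    result = fps[cluster[0]]
    for i in cluster[1:]:
        result = merge(result, fps[i])
    return result


def greedy_clique_partition(fps: List[str]) -> List[List[int]]:
    """Partition fingerprints into cliques of the compatibility graph.

    Heuristic only: each round grows a clique from every remaining seed,
    keeps the largest one found, and removes it. The cliques are maximal
    but not guaranteed to be maximum.
    """
    remaining = list(range(len(fps)))
    clusters: List[List[int]] = []
    while remaining:
        best: List[int] = []
        for seed in remaining:
            current = fps[seed]
            clique = [seed]
            for j in remaining:
                # Compatibility with the running consensus is equivalent to
                # compatibility with every fingerprint already in the clique.
                if j != seed and compatible(current, fps[j]):
                    clique.append(j)
                    current = merge(current, fps[j])
            if len(clique) > len(best):
                best = clique
        clusters.append(sorted(best))
        chosen = set(best)
        remaining = [i for i in remaining if i not in chosen]
    return clusters


# Toy data, invented for illustration only.
fingerprints = ["10N1", "1N01", "0110", "N110", "1001"]
clusters = greedy_clique_partition(fingerprints)
resolved = [consensus(fingerprints, c) for c in clusters]
# clusters -> [[0, 1, 4], [2, 3]]
# resolved -> ["1001", "0110"]
```

Because compatibility is not transitive (a fingerprint with many missing values can be compatible with two mutually incompatible ones), clusters have to be cliques rather than connected components, which is why the problem is stated as a clique partition.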