Clustering Binary Fingerprint Vectors with Missing Values for DNA Array Data Analysis

Authors:
Andres Figueroa; James Borneman; Tao Jiang
Venue:
CSB '03: Proceedings of the IEEE Computer Society Conference on Bioinformatics
Year:
2003

References (5):

Network flows: theory, algorithms, and applications.
Context-specific Bayesian clustering for gene expression data. RECOMB '01: Proceedings of the Fifth Annual International Conference on Computational Biology.
Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties.
Analysis of gene expression profiles: class discovery and leaf ordering. Proceedings of the Sixth Annual International Conference on Computational Biology.
CLICK: A Clustering Algorithm with Applications to Gene Expression Analysis. Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology.

Cited by (3):

Stereotype extraction with default clustering. IJCAI'05: Proceedings of the 19th International Joint Conference on Artificial Intelligence.
ABBA: adaptive bicluster-based approach to impute missing values in binary matrices. Proceedings of the 2010 ACM Symposium on Applied Computing.
Default clustering from sparse data sets. ECSQARU'05: Proceedings of the 8th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty.

Abstract

Oligonucleotide fingerprinting is a powerful DNA array-based method for characterizing cDNA and ribosomal RNA gene (rDNA) libraries, with many applications including gene expression profiling and DNA clone classification. We are especially interested in the latter application. A key step in the method is the cluster analysis of fingerprint data obtained from DNA array hybridization experiments. Most existing clustering approaches use (normalized) real intensity values and thus do not treat positive and negative hybridization signals equally (positive signals are much more emphasized). In this paper, we consider a discrete approach. Fingerprint data are first normalized and binarized using control DNA clones. Because this binarization process may leave some values unresolved (missing), we formulate the clustering of (binary) oligonucleotide fingerprints as a combinatorial optimization problem that attempts to identify clusters and resolve the missing values in the fingerprints simultaneously. We study the computational complexity of this clustering problem and of a natural parameterized version, and present an efficient greedy algorithm based on MINIMUM CLIQUE PARTITION on graphs. The algorithm takes advantage of some unique properties of the graphs considered here, which allow us to efficiently find the maximum cliques as well as some special maximal cliques. Our experimental results on simulated and real data demonstrate that the algorithm runs faster and performs better than some popular hierarchical and graph-based clustering methods. The results on real data from DNA clone classification also suggest that this discrete approach is more accurate than clustering methods based on real intensity values, in terms of separating clones that have different characteristics with respect to the given oligonucleotide probes.
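To make the formulation concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. One natural reading of the clique-partition view: two binary fingerprints with missing values are "compatible" when they never disagree at a position where both are resolved, and a cluster is a set of pairwise-compatible fingerprints, which can then be resolved to one common vector. The thresholds `lo`/`hi` in `binarize` are hypothetical stand-ins for cut-offs derived from control clones, and `greedy_clique_partition` is a simple greedy heuristic substituted for the paper's procedure, which exploits graph properties that the abstract does not spell out.

```python
from typing import List, Optional, Sequence

# A binary fingerprint: 1 = positive signal, 0 = negative, None = unresolved (missing).
Fingerprint = Sequence[Optional[int]]


def binarize(intensities: Sequence[float], lo: float, hi: float) -> List[Optional[int]]:
    """Hypothetical thresholding: lo/hi stand in for cut-offs derived from control
    clones; values strictly between them are left unresolved (None)."""
    return [1 if x >= hi else 0 if x <= lo else None for x in intensities]


def compatible(a: Fingerprint, b: Fingerprint) -> bool:
    """True if a and b never disagree at a position where both are resolved."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def greedy_clique_partition(fps: List[Fingerprint]) -> List[List[int]]:
    """Simple greedy heuristic (NOT the paper's exact procedure): repeatedly
    remove a large clique from the compatibility graph until no vertex is left."""
    n = len(fps)
    adj = [{j for j in range(n) if j != i and compatible(fps[i], fps[j])}
           for i in range(n)]
    remaining = set(range(n))
    clusters: List[List[int]] = []
    while remaining:
        best: List[int] = []
        for seed in sorted(remaining):
            clique = [seed]
            # Candidates are adjacent to every vertex already in the clique.
            cands = adj[seed] & remaining
            while cands:
                # Add the candidate with the most neighbours among the other
                # candidates (ties broken by smallest index).
                v = max(sorted(cands), key=lambda u: len(adj[u] & cands))
                clique.append(v)
                cands &= adj[v]
            if len(clique) > len(best):
                best = clique
        clusters.append(sorted(best))
        remaining -= set(best)
    return clusters


def resolve(fps: List[Fingerprint], cluster: List[int]) -> List[Optional[int]]:
    """Fill missing values with the value the cluster's members agree on
    (within a clique at most one value is known per position); a position
    that no member resolves stays None."""
    consensus: List[Optional[int]] = []
    for k in range(len(fps[cluster[0]])):
        known = {fps[i][k] for i in cluster if fps[i][k] is not None}
        consensus.append(known.pop() if known else None)
    return consensus


if __name__ == "__main__":
    fps = [
        [1, 0, None, 1],
        [1, None, 0, 1],
        [0, 1, 1, None],
        [None, 1, 1, 0],
        [1, 0, 0, None],
    ]
    clusters = greedy_clique_partition(fps)
    print(clusters)                              # [[0, 1, 4], [2, 3]]
    print([resolve(fps, c) for c in clusters])   # [[1, 0, 0, 1], [0, 1, 1, 0]]
```

Because a conflict between binary values only ever involves two fingerprints at one position, pairwise compatibility is enough for a whole set to be resolvable to a single vector, which is why clusters can be modeled as cliques. Minimum clique partition is NP-hard on general graphs, so a greedy strategy of this kind is a reasonable baseline; the special graph structure mentioned in the abstract is what lets the authors find maximum cliques efficiently.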