Pattern-based similarity search for microarray data

Authors:
Haixun Wang;Jian Pei;Philip S. Yu
Affiliations:
IBM T. J. Watson Research, Hawthorne, NY;Simon Fraser University, Canada;IBM T. J. Watson Research, Hawthorne, NY
Venue:
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Year:
2005

Abstract

One fundamental task in near-neighbor search as well as other similarity matching efforts is to find a distance function that can efficiently quantify the similarity between two objects in a meaningful way. In DNA microarray analysis, the expression levels of two closely related genes may rise and fall synchronously in response to a set of experimental stimuli. Although the magnitude of their expression levels may not be close, the patterns they exhibit can be very similar. Unfortunately, none of the conventional distance metrics such as the Lp norm can model this similarity effectively. In this paper, we study the near-neighbor search problem based on this new type of similarity. We propose to measure the distance between two genes by subspace pattern similarity, i.e., whether they exhibit a synchronous pattern of rise and fall on a subset of dimensions. We then present an efficient algorithm for subspace near-neighbor search based on pattern similarity distance, and we perform tests on various data sets to show its effectiveness.
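
The idea of pattern similarity on a subset of dimensions can be illustrated with a small sketch. This is not the distance function defined in the paper, which the abstract does not spell out. It assumes one simple reading: two expression profiles share a pattern on a subspace when their per-dimension differences are nearly constant there, so the distance is taken as the spread of those differences. The function names `pattern_distance` and `lp_distance` and the sample values are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Sequence


def pattern_distance(u: Sequence[float], v: Sequence[float],
                     dims: Iterable[int]) -> float:
    """Assumed illustration, not the paper's definition.

    Returns the spread (max minus min) of the differences u[i] - v[i]
    over the chosen dimensions. A value of 0 means v is a pure shift of
    u on that subspace, i.e. the two profiles rise and fall in step.
    """
    dims = list(dims)
    if not dims:
        raise ValueError("dims must contain at least one dimension")
    diffs = [u[i] - v[i] for i in dims]
    return max(diffs) - min(diffs)


def lp_distance(u: Sequence[float], v: Sequence[float],
                dims: Iterable[int], p: float = 2.0) -> float:
    """Conventional Lp distance restricted to the same dimensions."""
    return sum(abs(u[i] - v[i]) ** p for i in dims) ** (1.0 / p)


# Hypothetical expression levels under five experimental conditions.
gene_a = [10.0, 30.0, 20.0, 50.0, 15.0]
gene_b = [110.0, 130.0, 120.0, 150.0, 90.0]

# On the first four conditions gene_b is gene_a shifted by 100.
subspace = [0, 1, 2, 3]

print(pattern_distance(gene_a, gene_b, subspace))  # pattern view
print(lp_distance(gene_a, gene_b, subspace))       # magnitude view
print(pattern_distance(gene_a, gene_b, range(5)))  # full space
```

On the four-condition subspace the two genes are far apart in magnitude but identical in pattern, which is the situation the abstract describes as poorly captured by Lp norms. Adding the fifth condition breaks the synchrony, which is why the search has to consider subsets of dimensions rather than the full space.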