One fundamental task in near-neighbor search as well as other similarity matching efforts is to find a distance function that can efficiently quantify the similarity between two objects in a meaningful way. In DNA microarray analysis, the expression levels of two closely related genes may rise and fall synchronously in response to a set of experimental stimuli. Although the magnitude of their expression levels may not be close, the patterns they exhibit can be very similar. Unfortunately, none of the conventional distance metrics such as the Lp norm can model this similarity effectively. In this paper, we study the near-neighbor search problem based on this new type of similarity. We propose to measure the distance between two genes by subspace pattern similarity, i.e., whether they exhibit a synchronous pattern of rise and fall on a subset of dimensions. We then present an efficient algorithm for subspace near-neighbor search based on pattern similarity distance, and we perform tests on various data sets to show its effectiveness.
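To make the idea of a synchronous rise and fall on a subset of dimensions concrete, here is a minimal sketch. It is not the distance function defined in the paper: it assumes a simple "shifting pattern" test, in which two expression profiles match on a subspace when their per-dimension differences are nearly constant. The names `euclidean`, `pattern_spread`, `gene_a`, and `gene_b`, and the sample values, are hypothetical and chosen only for illustration.

```python
import math


def euclidean(x, y):
    """Conventional L2 distance over all dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pattern_spread(x, y, dims):
    """Range of the differences x[d] - y[d] over the chosen dimensions.

    Assumption for this sketch: a spread of zero means y is an exact
    shifted copy of x on that subspace, so the two profiles rise and
    fall together there even if their magnitudes are far apart.
    """
    diffs = [x[d] - y[d] for d in dims]
    return max(diffs) - min(diffs)


# Hypothetical expression levels under five experimental conditions.
gene_a = [10.0, 40.0, 20.0, 70.0, 5.0]
gene_b = [110.0, 140.0, 120.0, 170.0, 300.0]

subspace = [0, 1, 2, 3]  # conditions on which the two genes move together

print(euclidean(gene_a, gene_b))                 # large: magnitudes differ
print(pattern_spread(gene_a, gene_b, subspace))  # 0.0: same pattern
print(pattern_spread(gene_a, gene_b, range(5)))  # 195.0: pattern breaks on condition 4
```

On the first four conditions the two profiles differ by a constant offset, so the spread is zero even though the Euclidean distance is large. Including the fifth condition breaks the pattern, which is why the similarity has to be evaluated on a subset of dimensions rather than on the full space.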