DNA microarray technology is about to bring an explosion of gene expression data that may dwarf even the human sequencing projects. Researchers want to identify genes whose expression levels rise and fall coherently under a set of experimental perturbations, that is, genes whose expression fluctuates with a similar shape as conditions change. In this paper, we show that queries based on pattern correlations against large-scale microarray databases can be supported by the weighted-sequence model, an index structure designed for sequence matching. A weighted-sequence is a two-dimensional structure in which each element of the sequence is associated with a weight. We transform the DNA microarray data, as well as pattern-based queries, into weighted-sequences, and use subsequence matching algorithms to retrieve from the database all genes that match the query pattern. We demonstrate, using both synthetic and real-world data sets, that our method is effective and efficient.
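To make the query semantics concrete, the following is a minimal brute-force sketch of a pattern-based query, not the weighted-sequence index itself, whose construction the abstract does not specify. It assumes that "similar shape" means the expression levels match the query pattern up to a constant shift, within a tolerance. The function names, the `tolerance` parameter, and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def matches_pattern(levels: Sequence[float],
                    pattern: Sequence[float],
                    tolerance: float) -> bool:
    """Return True if `levels` follows `pattern` up to a constant shift.

    Assumption (not from the paper): two profiles have the same shape when
    the per-condition offsets between them differ by at most `tolerance`.
    """
    if len(levels) != len(pattern):
        raise ValueError("levels and pattern must have the same length")
    offsets = [x - p for x, p in zip(levels, pattern)]
    return max(offsets) - min(offsets) <= tolerance


def query_by_pattern(database: Dict[str, List[float]],
                     conditions: List[int],
                     pattern: Sequence[float],
                     tolerance: float) -> List[str]:
    """Linear scan: return the genes whose levels under `conditions` match."""
    return [
        gene
        for gene, row in database.items()
        if matches_pattern([row[c] for c in conditions], pattern, tolerance)
    ]


# Hypothetical toy data: rows are genes, columns are experimental conditions.
database = {
    "g1": [1.0, 3.0, 2.0, 5.0],
    "g2": [4.0, 6.0, 5.0, 8.0],  # same shape as g1, shifted up by 3
    "g3": [2.0, 1.0, 4.0, 3.0],  # different shape
}

# Query: "rises by 2, then falls by 1" across conditions 0, 1, 2.
result = query_by_pattern(database, [0, 1, 2], [0.0, 2.0, 1.0], tolerance=0.5)
print(result)
```

This scan inspects every gene for every query, which is the cost an index structure such as the one described above is meant to avoid on large databases.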