An Index Structure for Pattern Similarity Searching in DNA Microarray Data

Authors:
Haixun Wang;Chang-Shing Perng;Wei Fan;Philip S. Yu
Venue:
CSB '02 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Year:
2002

Abstract

DNA microarray technology is about to bring an explosion of gene expression data that may dwarf even the human sequencing projects. Researchers are motivated to identify genes whose expression levels rise and fall coherently under a set of experimental perturbations, that is, genes that exhibit fluctuation of a similar shape when conditions change. In this paper, we show that queries based on pattern correlations against large-scale microarray databases can be supported by the weighted-sequence model, an index structure designed for sequence matching. A weighted-sequence is a two-dimensional structure in which each element in the sequence is associated with a weight. We transform the DNA microarray data, as well as pattern-based queries, into weighted-sequences, and use subsequence matching algorithms to retrieve from the database all genes that match the query pattern. We demonstrate, using both synthetic and real-world data sets, that our method is effective and efficient.
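
To make the notion of "fluctuation of a similar shape" concrete, the following is a minimal brute-force sketch, not the weighted-sequence index itself. It assumes one possible reading of pattern similarity: a gene matches a query if, on a chosen subset of conditions, its expression values differ from the query by a roughly constant offset. The function names, the tolerance `eps`, and the toy data are all hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence


def shape_matches(query: Sequence[float], gene: Sequence[float],
                  conditions: Sequence[int], eps: float) -> bool:
    """Return True if `gene` tracks `query` up to a constant shift.

    Assumption: "similar shape" means the per-condition offsets
    gene[c] - query[c] vary by at most `eps` over the chosen conditions.
    """
    offsets = [gene[c] - query[c] for c in conditions]
    return max(offsets) - min(offsets) <= eps


def search(db: Dict[str, Sequence[float]], query: Sequence[float],
           conditions: Sequence[int], eps: float = 0.0) -> List[str]:
    """Linear scan over every gene; an index would aim to avoid this scan."""
    return [name for name, values in db.items()
            if shape_matches(query, values, conditions, eps)]


# Toy expression matrix: one row per gene, one column per condition.
db = {
    "g1": [1.0, 3.0, 2.0, 5.0],
    "g2": [4.0, 6.0, 5.0, 8.0],  # same shape as g1, shifted up by 3
    "g3": [2.0, 2.0, 9.0, 1.0],  # unrelated shape
}
query = [0.0, 2.0, 1.0, 4.0]

# Compare only on conditions 0, 1 and 3.
hits = search(db, query, conditions=[0, 1, 3], eps=0.1)
print(hits)
```

A linear scan like this touches every gene for every query, which is the cost an index structure for sequence matching is meant to reduce on large-scale microarray databases.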