Lower bounds for orthogonal range searching: I. The reporting case
Journal of the ACM (JACM)
Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Succinct Representation of Balanced Parentheses and Static Trees
SIAM Journal on Computing
High-order entropy-compressed text indexes
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
LATIN '00 Proceedings of the 4th Latin American Symposium on Theoretical Informatics
New data structures for orthogonal range searching
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Journal of the ACM (JACM)
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
SIAM Journal on Computing
ACM Computing Surveys (CSUR)
Compressed representations of sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets
ACM Transactions on Algorithms (TALG)
Compressed Suffix Trees with Full Functionality
Theory of Computing Systems
The SBC-tree: an index for run-length compressed sequences
EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Geometric Burrows-Wheeler Transform: Linking Range Searching and Text Indexing
DCC '08 Proceedings of the Data Compression Conference
Linear pattern matching algorithms
SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Space-Efficient Framework for Top-k String Retrieval Problems
FOCS '09 Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science
String retrieval for multi-pattern queries
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
In many settings, such as protein analysis, the primary sequence is associated with secondary structure labels [6]. Such data can be treated as two sequences aligned character by character. Many DNA and RNA sequences likewise involve linkages that are aligned within the same strand or across different strands. In this paper, we consider the most natural characterization of aligned string data. The aligned pattern matching problem is to index two input texts T1[1...n] and T2[1...n], each having n characters drawn from an alphabet Σ of size σ = polylog(n), such that the following query can be answered efficiently: given two query patterns P1 and P2, find all text positions i such that P1 matches T1[i...(i + |P1| - 1)] and P2 matches T2[i...(i + |P2| - 1)]. Our objective is to design a compressed-space index for this problem, and we obtain the following main results. When the query patterns are sufficiently long (|P1|, |P2| ≥ α = Θ(log^{2+2ε} n), where ε > 0), we can design an index that takes nH′_k + nH″_k + o(n log σ) bits of space and O(|P1| + |P2| + log^{4+4ε} n + t) query time, where H′_k and H″_k denote the empirical kth-order entropy (k = o(log_σ n)) of T1 and T2 respectively, and t is the number of outputs. Further, we show that designing a compressed/succinct-space index with polylogarithmic query time that works for query patterns of all lengths is at least as hard as designing a linear-space index for 3-dimensional orthogonal range reporting with polylogarithmic query time. However, we introduce another compressed index, requiring nH′_k + nH″_k + O(n) + o(n log σ) bits of space, with a query time of O(|P1| + |P2| + √(nt) log^{2+ε} n), which works without any restriction on the length of the patterns.
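To make the query semantics concrete, the following is a minimal brute-force sketch of the aligned pattern matching query as defined in the abstract. It is not the compressed index proposed in the paper: it scans both texts directly in O(n(|P1| + |P2|)) time and uses no index at all. The function name `aligned_matches` and the sample strings are hypothetical, chosen only for illustration.

```python
def aligned_matches(t1, t2, p1, p2):
    """Return all 1-based positions i such that p1 occurs in t1 at i
    and p2 occurs in t2 at the same position i.

    Naive reference scan for illustration only; not the paper's index.
    """
    if len(t1) != len(t2):
        raise ValueError("the two texts must have the same length n")
    hits = []
    # Both patterns must fit inside the texts, so the longer one bounds i.
    last_start = len(t1) - max(len(p1), len(p2))
    for i in range(last_start + 1):  # i is 0-based internally
        if t1.startswith(p1, i) and t2.startswith(p2, i):
            hits.append(i + 1)  # report 1-based positions, as in T[1...n]
    return hits


# Hypothetical aligned data: a sequence and per-character labels.
T1 = "ACGTACGT"
T2 = "HHEEHHEE"
print(aligned_matches(T1, T2, "AC", "HH"))
print(aligned_matches(T1, T2, "AC", "EE"))
```

In the first call both patterns occur at the same starting positions, so those positions are reported; in the second, each pattern occurs in its own text but never at a shared position, so the result is empty. This is the distinction that separates aligned matching from running two independent pattern searches.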