Two algorithms for maintaining order in a list. STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing.
Matching patterns in strings subject to multi-linear transformations. Theoretical Computer Science.
Information retrieval: data structures and algorithms.
Efficient pattern matching with scaling. Journal of Algorithms.
Edit distance of run-length coded strings. SAC '92 Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing: technological challenges of the 1990's.
Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing.
Algorithms on strings, trees, and sequences: computer science and computational biology.
The string B-tree: a new data structure for string search in external memory and its applications. Journal of the ACM (JACM).
On two-dimensional indexability and optimal range search indexing. PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Matching for run-length encoded strings. Journal of Complexity.
Let sleeping files lie: pattern matching in Z-compressed files. SODA '94 Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms.
ACM Transactions on Database Systems (TODS).
PATRICIA—Practical Algorithm To Retrieve Information Coded in Alphanumeric. Journal of the ACM (JACM).
A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM (JACM).
Inplace run-length 2d compressed search. SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms.
ACM Computing Surveys (CSUR).
Communications of the ACM.
External memory algorithms and data structures: dealing with massive data. ACM Computing Surveys (CSUR).
Edit distance of run-length encoded strings. Information Processing Letters.
An Efficient Multiversion Access Structure. IEEE Transactions on Knowledge and Data Engineering.
High-order entropy-compressed text indexes. SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms.
SEQ: A Model for Sequence Databases. ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering.
A Database Index to Large Biological Sequences. Proceedings of the 27th International Conference on Very Large Data Bases.
Optimal Two-Dimensional Compressed Matching. ICALP '94 Proceedings of the 21st International Colloquium on Automata, Languages and Programming.
Approximate Matching of Run-Length Compressed Strings. CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching.
An asymptotically optimal multiversion B-tree. The VLDB Journal — The International Journal on Very Large Data Bases.
Opportunistic data structures with applications. FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science.
Searching BWT Compressed Text with the Boyer-Moore Algorithm and Binary Search. DCC '02 Proceedings of the Data Compression Conference.
The suffix binary search tree and suffix AVL tree. Journal of Discrete Algorithms.
Regular expression searching on compressed text. Journal of Discrete Algorithms.
Engineering a Fast Online Persistent Suffix Tree Construction. ICDE '04 Proceedings of the 20th International Conference on Data Engineering.
Tight bounds for the partial-sums problem. SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms.
When indexing equals compression: experiments with compressing suffix arrays and applications. SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms.
Information Processing Letters.
C-store: a column-oriented DBMS. VLDB '05 Proceedings of the 31st international conference on Very large data bases.
Practical suffix tree construction. VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30.
Dynamic entropy-compressed sequences and full-text indexes. CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching.
Algorithms and data structures for external memory. Foundations and Trends® in Theoretical Computer Science.
Efficient indexing algorithms for one-dimensional discretely-scaled strings. Information Processing Letters.
Reordering columns for smaller indexes. Information Sciences: an International Journal.
Compressed indexes for aligned pattern matching. SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval.
Fast algorithms for computing the constrained LCS of run-length encoded strings. Theoretical Computer Science.
Run-Length Encoding (RLE) is a data compression technique used in various applications, e.g., time series, biological sequences, and multimedia databases. One of the main challenges is how to operate on compressed data (e.g., index, search, and retrieve it) without decompressing it. In this paper, we introduce the String B-tree for Compressed sequences, termed the SBC-tree, for indexing and searching RLE-compressed sequences of arbitrary length. The SBC-tree is a two-level index structure based on the well-known String B-tree and a 3-sided range query structure [7]. The SBC-tree supports pattern matching queries such as substring matching, prefix matching, and range search over RLE-compressed sequences. The SBC-tree has an optimal external-memory space complexity of O(N/B) pages, where N is the total length of the compressed sequences and B is the disk page size. Substring matching, prefix matching, and range search execute in an optimal O(log_B N + |p| + T/B) I/O operations, where |p| is the length of the compressed query pattern and T is the query output size. The SBC-tree is also dynamic and supports insert and delete operations efficiently: inserting or deleting all suffixes of a compressed sequence of length m takes O(m log_B(N + m)) amortized I/O operations. The SBC-tree index is implemented inside PostgreSQL. Performance results show that using the SBC-tree to index RLE-compressed sequences achieves up to an order of magnitude reduction in storage, while retaining the optimal search performance achieved by the String B-tree over the uncompressed sequences.
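To make "searching RLE-compressed data without decompressing it" concrete, the following is a minimal Python sketch. It is not the SBC-tree: it uses no String B-tree and no 3-sided range structure, and it scans the runs linearly rather than meeting the I/O bounds stated above. The function names `rle_encode` and `rle_find` are hypothetical. The idea it illustrates is this. A sequence such as `aaabbbcca` becomes the runs (a,3)(b,3)(c,2)(a,1). A compressed pattern with k ≥ 2 runs occurs in the text only if three conditions hold. Its interior runs must equal consecutive text runs exactly. Its first run must share its symbol with the preceding text run, and that text run must be at least as long. Its last run must likewise share its symbol with the following text run, which must also be at least as long.

```python
from itertools import groupby
from typing import List, Tuple

Run = Tuple[str, int]  # (symbol, run length)


def rle_encode(s: str) -> List[Run]:
    """Collapse each maximal run of equal symbols into (symbol, length)."""
    return [(ch, len(list(grp))) for ch, grp in groupby(s)]


def rle_find(text: List[Run], pat: List[Run]) -> List[int]:
    """Return the uncompressed start offsets of every occurrence of pat in text.

    Hypothetical illustration: works only on the run lists (no decompression)
    and assumes both inputs consist of maximal runs, i.e. adjacent runs differ
    in symbol, as rle_encode produces.
    """
    if not pat:
        return []
    # Uncompressed offset of the first symbol of each text run.
    starts: List[int] = []
    offset = 0
    for _, n in text:
        starts.append(offset)
        offset += n

    k = len(pat)
    hits: List[int] = []
    for i, (ch, n) in enumerate(text):
        if k == 1:
            pc, pl = pat[0]
            if ch == pc and n >= pl:
                # Every alignment of the single pattern run inside this text run matches.
                hits.extend(range(starts[i], starts[i] + n - pl + 1))
            continue
        if i + k > len(text):
            break  # not enough text runs left for a k-run pattern
        first_c, first_l = pat[0]
        last_c, last_l = pat[-1]
        # First pattern run must be a suffix of text run i.
        if ch != first_c or n < first_l:
            continue
        # Interior pattern runs must match text runs exactly (symbol and length).
        if any(text[i + j] != pat[j] for j in range(1, k - 1)):
            continue
        # Last pattern run must be a prefix of text run i + k - 1.
        tc, tl = text[i + k - 1]
        if tc == last_c and tl >= last_l:
            hits.append(starts[i] + n - first_l)
    return hits
```

For example, `rle_find(rle_encode("aaabbbcca"), rle_encode("abbbc"))` compares three pattern runs against the four text runs and reports offset 2. That is the same position an uncompressed search would find. Only the first and last pattern runs need an inequality test on run length. This observation is why the SBC-tree pairs a string index over the compressed runs with a range-query structure, although the exact way the paper combines them is not shown here.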