Collage system: a unifying framework for compressed pattern matching

Authors:
Takuya Kida;Tetsuya Matsumoto;Yusuke Shibata;Masayuki Takeda;Ayumi Shinohara;Setsuo Arikawa
Affiliations:
Department of Informatics, Kyushu University, 33 Fukuoka 812-8581, Japan;Department of Informatics, Kyushu University, 33 Fukuoka 812-8581, Japan;Department of Informatics, Kyushu University, 33 Fukuoka 812-8581, Japan;Department of Informatics, Kyushu University, 33 Fukuoka 812-8581, Japan and PRESTO, Japan Science and Technology Corporation, Japan;Department of Informatics, Kyushu University, 33 Fukuoka 812-8581, Japan;Department of Informatics, Kyushu University, 33 Fukuoka 812-8581, Japan
Venue:
Theoretical Computer Science - Selected papers in honour of Setsuo Arikawa
Year:
2003

Abstract

We introduce a general framework that captures the essence of compressed pattern matching for various dictionary-based compression methods. It is a formal system that represents a string as a pair consisting of a dictionary D and a sequence S of phrases in D. The basic operations are concatenation, truncation, and repetition. We also propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a text without decompressing it, which is one of the most active topics in string matching. The framework covers compression methods such as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW), RE-PAIR, SEQUITUR, and the static dictionary-based method. The proposed algorithm runs in O((||D|| + |S|) · height(D) + m^2 + r) time with O(||D|| + m^2) space, where ||D|| is the size of D, |S| is the number of tokens in S, height(D) is the maximum dependency of tokens in D, m is the pattern length, and r is the number of pattern occurrences. For the subclass of the framework that contains no truncation, the time complexity is O(||D|| + |S| + m^2 + r).
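
To make the (D, S) representation concrete, the following is a minimal Python sketch of a dictionary built from the three basic operations, plus a decoder and a height computation. It is an illustration only, not the paper's algorithm: the class names (Lit, Concat, DropPrefix, DropSuffix, Repeat), the token names X1..X6, and the convention that a single-character token has height 0 are all assumptions made here. In particular, truncation is modelled as dropping k characters from either end of a token. The decoder expands the text in full, which the compressed matching algorithm is designed to avoid; it only shows what string a given pair denotes.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Lit:            # primitive token: one character (hypothetical name)
    ch: str


@dataclass(frozen=True)
class Concat:         # concatenation of two earlier tokens
    left: str
    right: str


@dataclass(frozen=True)
class DropPrefix:     # truncation: drop the first k characters of a token
    token: str
    k: int


@dataclass(frozen=True)
class DropSuffix:     # truncation: drop the last k characters of a token
    token: str
    k: int


@dataclass(frozen=True)
class Repeat:         # repetition: a token repeated `times` times
    token: str
    times: int


Rule = Union[Lit, Concat, DropPrefix, DropSuffix, Repeat]
Dictionary = Dict[str, Rule]


def expand(name: str, D: Dictionary, memo: Dict[str, str]) -> str:
    """Return the string denoted by token `name` (full expansion)."""
    if name in memo:
        return memo[name]
    rule = D[name]
    if isinstance(rule, Lit):
        out = rule.ch
    elif isinstance(rule, Concat):
        out = expand(rule.left, D, memo) + expand(rule.right, D, memo)
    elif isinstance(rule, DropPrefix):
        out = expand(rule.token, D, memo)[rule.k:]
    elif isinstance(rule, DropSuffix):
        s = expand(rule.token, D, memo)
        out = s[:len(s) - rule.k]   # len(s) - k also handles k == 0
    else:  # Repeat
        out = expand(rule.token, D, memo) * rule.times
    memo[name] = out
    return out


def decode(D: Dictionary, S: List[str]) -> str:
    """Concatenate the expansions of the tokens listed in S."""
    memo: Dict[str, str] = {}
    return "".join(expand(name, D, memo) for name in S)


def token_height(name: str, D: Dictionary, memo: Dict[str, int]) -> int:
    """Length of the longest dependency chain below `name` (Lit = 0 here)."""
    if name in memo:
        return memo[name]
    rule = D[name]
    if isinstance(rule, Lit):
        h = 0
    elif isinstance(rule, Concat):
        h = 1 + max(token_height(rule.left, D, memo),
                    token_height(rule.right, D, memo))
    else:  # DropPrefix, DropSuffix, Repeat all depend on one token
        h = 1 + token_height(rule.token, D, memo)
    memo[name] = h
    return h


def dictionary_height(D: Dictionary) -> int:
    """Maximum token height over the whole dictionary."""
    memo: Dict[str, int] = {}
    return max(token_height(name, D, memo) for name in D)


# A small example dictionary and token sequence (made up for illustration).
EXAMPLE_D: Dictionary = {
    "X1": Lit("a"),
    "X2": Lit("b"),
    "X3": Concat("X1", "X2"),      # "ab"
    "X4": Repeat("X3", 3),         # "ababab"
    "X5": DropPrefix("X4", 1),     # "babab"
    "X6": DropSuffix("X5", 2),     # "bab"
}
EXAMPLE_S: List[str] = ["X4", "X6", "X1"]

if __name__ == "__main__":
    print(decode(EXAMPLE_D, EXAMPLE_S))     # abababbaba
    print(dictionary_height(EXAMPLE_D))     # 4
```

In this example the dictionary has six assignments and S has three tokens, while the text they denote has ten characters. The chain X6, X5, X4, X3, X1 is what makes the height 4 under the convention assumed above; the truncation tokens X5 and X6 are the kind of entries that the truncation-free subclass excludes.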