We apply the Boyer-Moore technique to compressed pattern matching for text strings described in terms of a collage system, a formal framework that captures various dictionary-based compression methods. For the subclass of collage systems that contain no truncation, our new algorithm runs in O(‖D‖ + n·m + m² + r) time using O(‖D‖ + m²) space, where ‖D‖ is the size of the dictionary D, n is the compressed text length, m is the pattern length, and r is the number of pattern occurrences. For a general collage system, the time complexity is O(height(D)·(‖D‖ + n) + n·m + m² + r), where height(D) is the maximum dependency of tokens in D. We show that the algorithm specialized for byte pair encoding (BPE) is very fast in practice: it runs about 1.2 to 3.0 times faster than the exact-match routine of the software package agrep, which is known as the fastest pattern matching tool.
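
To make the quantities in the bounds above concrete, the following is a minimal Python sketch and not the algorithm of the paper. It assumes a toy byte pair encoder (the names bpe_compress and bpe_expand are hypothetical) that produces a token sequence of length n and a dictionary D of pair rules. It also shows the skip-ahead idea behind the Boyer-Moore technique using the simpler Horspool variant on plain, uncompressed text. Searching the expanded text, as done here, is the decompress-then-search baseline that compressed pattern matching is meant to avoid.

```python
from collections import Counter


def bpe_compress(data: bytes, max_rules: int = 256):
    """Toy BPE (illustrative assumption): repeatedly replace the most
    frequent adjacent token pair with a fresh token."""
    tokens = list(data)   # tokens 0..255 stand for literal bytes
    rules = {}            # dictionary D: new token -> (left, right)
    next_token = 256
    while len(rules) < max_rules:
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:     # no pair repeats, nothing left to gain
            break
        rules[next_token] = pair
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                out.append(next_token)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
        next_token += 1
    return tokens, rules


def bpe_expand(tokens, rules) -> bytes:
    """Expand tokens back to the original bytes using the pair rules."""
    out = bytearray()
    stack = list(reversed(tokens))
    while stack:
        t = stack.pop()
        if t < 256:
            out.append(t)
        else:
            left, right = rules[t]
            stack.append(right)   # pushed first so that left is expanded first
            stack.append(left)
    return bytes(out)


def horspool_search(text: bytes, pattern: bytes):
    """Horspool simplification of Boyer-Moore on plain text: after each
    window, skip ahead according to the window's last byte."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Distance from the rightmost occurrence of each byte (excluding the
    # final pattern position) to the end of the pattern.
    shift = {b: m - 1 - i for i, b in enumerate(pattern[:-1])}
    hits, pos = [], 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return hits


if __name__ == "__main__":
    text = b"abracadabra abracadabra"
    tokens, rules = bpe_compress(text)
    print("n (compressed length):", len(tokens))
    print("number of rules in D:", len(rules))
    print("occurrences:", horspool_search(bpe_expand(tokens, rules), b"abra"))
```

In this sketch the shift table plays the role that the Boyer-Moore shift plays in the abstract: windows whose last symbol cannot align with the pattern are skipped without being inspected. The paper's contribution is to perform such skipping over the token sequence itself, which this example does not attempt.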