Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is very fast and requires little working space, and an arbitrary part of the original text is easy to decompress. BPE has nevertheless seen limited use, because compression is rather slow and its compression ratio is worse than that of other methods such as Lempel-Ziv type compression. In this paper, we bring out a potential advantage of BPE compression: from a practical viewpoint it is well suited to compressed pattern matching, where the goal is to find a pattern directly in the compressed text without decompressing it explicitly. We compare the running times for finding a pattern in (1) BPE compressed files, (2) Lempel-Ziv-Welch compressed files, and (3) the original text files, in various situations. Experimental results show that pattern matching in BPE compressed text is even faster than matching in the original text. Thus BPE compression reduces not only disk space but also search time.
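For readers unfamiliar with the scheme, the following is a minimal Python sketch of the general BPE idea: repeatedly replace the most frequent pair of adjacent symbols with a fresh symbol and record the substitution. It is an illustration under simplifying assumptions, not the implementation evaluated in the paper. In particular, the function names, the `max_rules` limit, and the choice to number fresh symbols from 256 upward (rather than reusing byte values that do not occur in the text) are assumptions made here for clarity, and the sketch does not cover the pattern matching algorithm itself.

```python
from collections import Counter


def bpe_compress(data: bytes, max_rules: int = 64):
    """Replace the most frequent adjacent pair with a fresh symbol, repeatedly.

    Returns (sequence of symbols, rules) where rules maps each fresh
    symbol to the pair of symbols it stands for.
    """
    seq = list(data)      # symbols 0..255 are literal bytes
    rules = {}            # fresh symbol -> (left, right)
    next_symbol = 256     # assumption: fresh symbols live above the byte range
    for _ in range(max_rules):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:     # no pair repeats, so nothing left to gain
            break
        rules[next_symbol] = pair
        out, i = [], 0
        while i < len(seq):
            # Greedy left-to-right replacement of non-overlapping occurrences.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_symbol += 1
    return seq, rules


def bpe_expand(symbol: int, rules: dict) -> bytes:
    """Expand a single symbol into the bytes it represents."""
    if symbol < 256:
        return bytes([symbol])
    left, right = rules[symbol]
    return bpe_expand(left, rules) + bpe_expand(right, rules)


def bpe_decompress(seq, rules) -> bytes:
    """Decompress a whole sequence, or any slice of one."""
    return b"".join(bpe_expand(s, rules) for s in seq)


if __name__ == "__main__":
    text = b"abracadabra abracadabra"
    compressed, table = bpe_compress(text)
    assert bpe_decompress(compressed, table) == text
    # Any slice of the compressed sequence can be expanded on its own.
    print(bpe_decompress(compressed[1:3], table))
```

Because every symbol expands independently of its neighbours, decompressing a slice of the compressed sequence needs only the rule table, which is consistent with the abstract's remark that an arbitrary part of the text is easy to decompress. Recounting all pairs on every round also hints at why compression is the slow side of the scheme.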