A new text compression scheme is presented in this article. The main purpose of this scheme is to speed up string matching by searching the compressed file directly. The scheme requires no modification of the string-matching algorithm, which is used as a black box; any string-matching procedure can be used. Instead, the pattern is modified, and only the outcome of matching the modified pattern against the compressed file is decompressed. Since the compressed file is smaller than the original file, the search is faster than a search in the original file, in terms of both I/O time and processing time. For typical text files, we achieve about a 30% reduction in space and a slightly smaller reduction in search time. A 30% space saving is not competitive with good text compression schemes, so the scheme should not be used where space is the predominant concern. The intended applications of this scheme are files that are searched often, such as catalogs, bibliographic files, and address books. Such files are typically not compressed, but with this scheme they can remain compressed indefinitely, saving space while allowing faster search at the same time. A particular application to an information retrieval system that we developed is also discussed.
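
The abstract does not describe the encoding itself, so the following is only a rough sketch of how a scheme of this kind could work, not the article's actual method. It assumes 7-bit ASCII input and a small, hypothetical table of character pairs (PAIRS), each replaced by one unused byte value. The pairs are chosen so that no character serves as both the first half of one pair and the second half of another, which keeps the encoding of any substring independent of its context. The names compress, decompress, and search are illustrative, and Python's built-in bytes.find stands in for the black-box string matcher.

```python
# Hypothetical pair table (an assumption for this sketch, not taken from the article).
# First characters {t, s, c} and second characters {h, e, i} are disjoint,
# so two pairs can never overlap in the text.
PAIRS = [b"th", b"te", b"ti", b"sh", b"se", b"ch"]
ENCODE = {pair: 128 + k for k, pair in enumerate(PAIRS)}  # codes use bytes >= 128
DECODE = {code: pair for pair, code in ENCODE.items()}
FIRSTS = {pair[0] for pair in PAIRS}
SECONDS = {pair[1] for pair in PAIRS}
assert FIRSTS.isdisjoint(SECONDS)


def compress(data: bytes) -> bytes:
    """Replace every occurrence of a listed pair by its one-byte code."""
    out = bytearray()
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if pair in ENCODE:
            out.append(ENCODE[pair])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def decompress(data: bytes) -> bytes:
    """Expand pair codes back into two characters; copy other bytes unchanged."""
    return b"".join(DECODE.get(b, bytes([b])) for b in data)


def search(compressed: bytes, pattern: bytes) -> list:
    """Return offsets in the compressed file of the symbols where matches begin.

    The pattern is modified, not the matcher: a leading character that could be
    the second half of a pair, or a trailing character that could be the first
    half of one, is set aside, and the remaining core is compressed and handed
    to the black-box matcher. Only the bytes next to each hit are decoded.
    """
    head = pattern[:1] if pattern[0] in SECONDS else b""
    tail = pattern[-1:] if pattern[-1] in FIRSTS else b""
    core = compress(pattern[len(head):len(pattern) - len(tail)])
    if not core:
        raise ValueError("pattern too short for this sketch")
    hits = []
    pos = compressed.find(core)  # black-box string matcher
    while pos != -1:
        start, end = pos, pos + len(core)
        ok = True
        if head:
            # The symbol before the core must decode to text ending in head.
            ok = start > 0 and decompress(compressed[start - 1:start]).endswith(head)
            start -= 1
        if ok and tail:
            # The symbol after the core must decode to text starting with tail.
            ok = decompress(compressed[end:end + 1]).startswith(tail)
        if ok:
            hits.append(start)
        pos = compressed.find(core, pos + 1)
    return hits


text = b"the cat came to the bench with the teacher"
packed = compress(text)
print(len(text), len(packed), search(packed, b"the"))
```

In this sketch the matcher scans fewer bytes because the compressed file is shorter, and decompression touches only the one or two symbols bordering each candidate hit. A practical pair table would be chosen from frequency statistics of the target files rather than fixed by hand.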