This paper presents Tagged Sub-optimal Code (TSC), a new coding technique for speeding up string matching over compressed databases on personal digital assistants (PDAs). TSC is a variable-length, sub-optimal code that supports the minimal prefix property. Its codeword boundaries can always be determined without traversing a tree or consulting a lookup table. The technique may benefit many applications, for example by speeding up string matching over compressed text and by speeding up decoding. This paper also presents two algorithms for string matching over compressed text: one using TSC (SCTT) and one using the Byte Pair Encoding (BPE) technique (SCTB).

Several experiments were conducted to compare the performance of TSC, BPE, and Huffman coding. Databases with different record sizes were used: the well-known Calgary corpus and a set of small PDA databases. Experimental results show that SCTT is almost twice as fast as the Huffman-based algorithm. SCTT's search time is the same as that of searching the uncompressed databases, and SCTT is faster than the SCTB algorithm. For frequently updated PDA databases such as phone books, to-do lists, and memos, SCTT is the recommended method regardless of average record length, because compressing updated records with BPE takes significantly longer than encoding them with TSC.
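The abstract does not spell out TSC's exact codeword layout, so the sketch below uses a different, well-known tagged scheme instead: an end-tagged dense code, here assumed to work over 4-bit nibbles, where the high bit of a nibble marks the last nibble of a codeword. It only illustrates the two properties the abstract attributes to tagged codes. First, codeword boundaries come from a tag bit rather than a tree walk. Second, a pattern can be encoded once and matched directly against the compressed stream. The nibble width, the frequency-rank code assignment, and the helper names (`build_code`, `encode`, `decode`, `search`) are hypothetical. This is not the paper's TSC construction or its SCTT algorithm.

```python
from collections import Counter

# Hypothetical illustration: an end-tagged dense code over nibbles,
# NOT the TSC layout from the paper.
TAG = 0b1000  # high bit set => this nibble ends a codeword
DIGITS = 8    # 3 payload bits per nibble


def rank_to_codeword(rank):
    """Ranks 0..7 get 1 nibble, ranks 8..71 get 2 nibbles, and so on.
    Only the final nibble carries TAG, so the code is prefix-free."""
    length, first, block = 1, 0, DIGITS
    while rank >= first + block:
        first += block
        block *= DIGITS
        length += 1
    offset = rank - first
    digits = []
    for _ in range(length):
        digits.append(offset % DIGITS)
        offset //= DIGITS
    digits.reverse()
    digits[-1] |= TAG
    return digits


def build_code(text):
    """Assign shorter codewords to more frequent characters."""
    freq = Counter(text)
    ranked = sorted(freq, key=lambda s: (-freq[s], s))
    return {sym: rank_to_codeword(r) for r, sym in enumerate(ranked)}


def encode(text, code):
    out = []
    for ch in text:
        out.extend(code[ch])
    return out


def decode(nibbles, code):
    # Boundaries are read from the tag bit alone; the dict only maps a
    # complete codeword back to its symbol.
    inverse = {tuple(cw): sym for sym, cw in code.items()}
    out, current = [], []
    for x in nibbles:
        current.append(x)
        if x & TAG:
            out.append(inverse[tuple(current)])
            current = []
    return "".join(out)


def search(compressed, pattern, code):
    """Return character offsets of pattern in the original text by scanning
    the compressed nibbles directly, without decoding them."""
    if not pattern or any(ch not in code for ch in pattern):
        return []
    p = encode(pattern, code)
    m = len(p)
    hits = []
    chars_before = 0    # codewords completed before position i
    at_boundary = True  # position 0 starts a codeword
    for i in range(len(compressed) - m + 1):
        # A nibble-level match counts only if it starts on a codeword
        # boundary; otherwise it could be the tail of a longer codeword.
        if at_boundary and compressed[i:i + m] == p:
            hits.append(chars_before)
        at_boundary = bool(compressed[i] & TAG)
        if at_boundary:
            chars_before += 1
    return hits
```

For example, `search(encode(text, code), "the", code)` on `text = "the quick brown fox jumps over the lazy dog"` reports the two occurrences at character offsets 0 and 31. The boundary check is needed because, in this scheme, the last nibble of every two-nibble codeword equals some one-nibble codeword. Looking at the previous nibble's tag bit is enough to reject such false matches.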