Speeding up String Matching over Compressed Text on Handheld Devices using Tagged Sub-optimal Code (TSC)

  • Venue: RTAS '04 Proceedings of the 10th IEEE Real-Time and Embedded Technology and Applications Symposium
  • Year: 2004

Abstract

Tagged Sub-optimal Code (TSC) is a new coding technique presented in this paper to speed up string matching over compressed databases on PDAs. TSC is a variable-length sub-optimal code that supports the minimal prefix property. It always determines its codeword boundary without traversing a tree or lookup table. The TSC technique may be beneficial in many types of applications: speeding up string matching over compressed text, speeding up the decoding process, and serving as a general-purpose integer representation code. Experimental results show that string matching over TSC-compressed text is 8.9 times faster than string matching over Huffman-encoded text, and that TSC is 3 times faster in the decoding process. On the other hand, the compression ratio of TSC is 6% less than that of Huffman encoding. Additionally, TSC is 14 times faster than the Byte Pair Encoding (BPE) compression process, and achieves better performance than searching over compressed text using the BPE scheme on handheld devices.
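
The abstract does not give the TSC codeword layout, so the sketch below is not TSC itself. It illustrates the general idea of a tagged code with a stand-in scheme: a byte-oriented code in which the high bit of each byte is a tag that marks the last byte of a codeword. Under that assumption, codeword boundaries can be found by inspecting tag bits alone, with no tree traversal or lookup table, and a pattern can be searched for directly in the encoded bytes. The function names (encode, decode_stream, find_codeword) are hypothetical and chosen only for this illustration.

```python
# Stand-in tagged code (NOT the paper's TSC): 7 payload bits per byte,
# and the high bit (0x80) tags the final byte of each codeword.

def encode(n: int) -> bytes:
    """Encode a non-negative integer as a tagged variable-length codeword."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append(n & 0x7F)
        n >>= 7
    groups.reverse()        # most significant 7-bit group first
    groups[-1] |= 0x80      # tag the last byte as the codeword boundary
    return bytes(groups)


def decode_stream(data: bytes) -> list:
    """Decode concatenated codewords using only the tag bit to find boundaries."""
    values, current = [], 0
    for byte in data:
        current = (current << 7) | (byte & 0x7F)
        if byte & 0x80:     # tag bit set: this byte ends the codeword
            values.append(current)
            current = 0
    return values


def find_codeword(data: bytes, n: int) -> int:
    """Return the byte offset of the first whole-codeword match of n, or -1.

    The search runs on the encoded bytes without decoding them. A raw match
    counts only if it starts at a codeword boundary, that is, at offset 0 or
    right after a tagged byte; otherwise it is the tail of a longer codeword.
    """
    pattern = encode(n)
    start = data.find(pattern)
    while start != -1:
        if start == 0 or data[start - 1] & 0x80:
            return start
        start = data.find(pattern, start + 1)
    return -1


if __name__ == "__main__":
    stream = encode(300) + encode(44) + encode(5)
    print(decode_stream(stream))
    print(find_codeword(stream, 44))
```

In this stand-in scheme the codeword for 300 ends with the same byte as the codeword for 44, which is why find_codeword checks the tag of the preceding byte before accepting a match.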