String matching over compressed text on handheld devices

  • Authors:
  • Iehab Abdulaziz Al Rassan;Abdelghani Bellaachia

  • Venue:
  • String matching over compressed text on handheld devices
  • Year:
  • 2004

Abstract

The growing demand for storing data and applications on handheld devices increases the need to expand their memory capacities. Accessing and composing e-mails, retrieving web clippings, browsing e-books, and editing Microsoft Word, Excel and PowerPoint-compatible documents on the go are all examples of needs that must be met. Memory can be expanded either in hardware (by adding more memory modules) or in software (by compressing data and searching it while it remains in compressed form). This research takes the software approach: the goal is to free as much memory as possible on handheld devices by using effective and efficient compression schemes, while still allowing random access to, and manipulation of, individual records within the compressed databases. Two new algorithms for string matching over compressed text on handheld devices are developed and investigated to determine how they increase the memory efficiency and capacity of these devices: Searching over Compressed Text using BPE (SCTB) and Searching over Compressed Text using TSC (SCTT). The SCTB solution uses the Byte Pair Encoding (BPE) compression scheme. Across databases of different sizes, it is 6.6 times faster than decompressing the database and then performing a linear search. The SCTT solution is based on a new Tagged Suboptimal Coding (TSC) technique, devised both to serve as a general-purpose compression scheme and to speed up string matching over compressed databases on handheld devices. The SCTT method is 9 times faster than string matching over text compressed with Huffman encoding on a desktop, and it outperforms the SCTB solution on databases consisting of small-sized records. With both methods, about 32% more space becomes available in compressed databases that consist of small-sized records.
Results show that SCTB is the recommended solution for rarely updated databases that consist of large-sized records, such as e-books, and that SCTT is the recommended solution for frequently updated databases or those that consist of small-sized records. Both SCTB and SCTT are faster than decompressing the databases and then performing a linear search, for databases of all sizes.
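
To make the comparison baseline concrete, the following is a minimal Python sketch of byte-level Byte Pair Encoding together with the "decompress, then linear search" approach that the abstract measures SCTB and SCTT against. It is an illustration only, not the authors' SCTB or SCTT algorithm, neither of which is specified in this record. The function names (`bpe_compress`, `bpe_decompress`, `search_baseline`) and the `max_rules` parameter are hypothetical.

```python
from collections import Counter


def bpe_compress(data: bytes, max_rules: int = 64):
    """Byte-level BPE: repeatedly replace the most frequent adjacent
    byte pair with a byte value that does not occur in the input.
    Returns (compressed bytes, rules), where rules maps each new
    symbol to the pair (a, b) it stands for. Names are illustrative."""
    seq = list(data)
    present = set(data)
    unused = [b for b in range(256) if b not in present]
    rules = {}
    while unused and len(rules) < max_rules:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:  # replacing a pair seen once saves nothing
            break
        sym = unused.pop()
        rules[sym] = (a, b)
        out, i = [], 0
        while i < len(seq):  # left-to-right, non-overlapping replacement
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return bytes(seq), rules


def bpe_expand(sym: int, rules) -> bytes:
    """Recursively expand one symbol back into original bytes."""
    if sym not in rules:
        return bytes([sym])
    a, b = rules[sym]
    return bpe_expand(a, rules) + bpe_expand(b, rules)


def bpe_decompress(comp: bytes, rules) -> bytes:
    return b"".join(bpe_expand(s, rules) for s in comp)


def search_baseline(comp: bytes, rules, pattern: bytes) -> int:
    """Baseline from the abstract: decompress fully, then scan linearly.
    Returns the offset of the first match in the original text, or -1."""
    return bpe_decompress(comp, rules).find(pattern)
```

A record would be compressed once with `bpe_compress` and stored alongside its rule table; `search_baseline` then shows the cost that searching directly over the compressed form is meant to avoid, namely rebuilding the full text before every search.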