Context-Sensitive Grammar Transform: Compression and Pattern Matching

  • Authors:
  • Shirou Maruyama; Yohei Tanaka; Hiroshi Sakamoto; Masayuki Takeda

  • Affiliations:
  • Department of Informatics, Kyushu University, Fukuoka, Japan 819-0395; Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Japan 820-8502; Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Japan 820-8502; Department of Informatics, Kyushu University, Fukuoka, Japan 819-0395

  • Venue:
  • SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
  • Year:
  • 2008

Abstract

A framework for context-sensitive grammar transforms is proposed. A greedy compression algorithm based on this transform model is presented, together with a Knuth-Morris-Pratt (KMP)-type compressed pattern matching (CPM) algorithm. The compression performance is comparable to that of gzip and Re-Pair. The search speed of our CPM algorithm is almost twice that of the KMP-type CPM algorithm on Byte-Pair-Encoding by Shibata et al. (2000) and, for short patterns, exceeds that of the Boyer-Moore-Horspool algorithm with the stopper encoding by Rautio et al. (2002), which is regarded as one of the best combinations for practically fast search.
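
For readers unfamiliar with grammar transforms, the following is a minimal sketch of a greedy, context-free pair-replacement scheme in the style of Byte-Pair-Encoding, one of the baselines named above. It is not the context-sensitive transform proposed in the paper, whose details are not given in this abstract. The names bpe_compress, bpe_expand and max_rules are hypothetical and chosen only for illustration.

```python
from collections import Counter


def bpe_compress(text, max_rules=256):
    """Greedy pair replacement (illustrative baseline, not the paper's method).

    Repeatedly replaces the most frequent adjacent symbol pair with a fresh
    nonterminal. Returns the shortened sequence and the rule table.
    """
    seq = list(text)
    rules = {}  # nonterminal -> (left symbol, right symbol)
    next_id = 0
    while len(rules) < max_rules:
        # Count adjacent pairs; overlapping occurrences (e.g. in "aaa") are
        # counted too, which is a simplification of this sketch.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so another rule would not help
        # Nonterminals are tuples, so they cannot collide with text characters.
        symbol = ("N", next_id)
        next_id += 1
        rules[symbol] = pair
        # Left-to-right scan replacing non-overlapping occurrences of the pair.
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def bpe_expand(seq, rules):
    """Invert bpe_compress by expanding nonterminals with an explicit stack."""
    out = []
    stack = list(reversed(seq))
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)  # pushed first so that left is expanded first
            stack.append(left)
        else:
            out.append(s)
    return "".join(out)
```

A round trip such as bpe_expand(*bpe_compress("abracadabra abracadabra")) returns the original string. In a scheme like this each rule rewrites a pair wherever it occurs, regardless of the neighbouring symbols; a context-sensitive transform, as the name suggests, would let the surrounding context influence the rewriting.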