Memory-Aware BWT by segmenting sequences to support subsequence search

  • Authors:
  • Jiaying Wang;Xiaochun Yang;Bin Wang;Huaijie Zhu

  • Affiliations:
  • College of Information Science and Engineering, Northeastern University, Liaoning, China;College of Information Science and Engineering, Northeastern University, Liaoning, China;College of Information Science and Engineering, Northeastern University, Liaoning, China;College of Information Science and Engineering, Northeastern University, Liaoning, China

  • Venue:
  • APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
  • Year:
  • 2012

Abstract

The Burrows-Wheeler transform (BWT) has received significant attention in academia as a way to address subsequence matching problems. Although BWT is typically used to transform a sequence into a new sequence that is "easy to compress", it can also be extended into a full-text index. A traditional BWT index requires n log n + n log σ bits for a sequence of n characters, where σ is the size of the alphabet. Building a BWT index for a long sequence on a PC with limited memory is therefore a great challenge. To solve this problem, we propose a novel variation of the BWT index named S-BWT, which separates the source sequence into segments. It reduces the memory cost to n(log σ + log n − log k)/k bits, where k is the number of segments. However, querying each segment separately with existing approaches risks missing some results. In this paper, we propose two query methods based on S-BWT that are guaranteed to find all subsequence occurrences. Our methods not only require little memory but are also faster than the state-of-the-art BWT backward search method on long sequences.
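To make the two memory formulas and the lost-result risk concrete, here is a minimal Python sketch. It is not the authors' S-BWT implementation or either of their query methods. It assumes base-2 logarithms, builds the BWT naively from sorted rotations, uses the textbook backward search to count occurrences, and splits the text into equal-length segments that are queried independently. All function names and the `$` sentinel are illustrative assumptions.

```python
import math

def bwt_index_bits(n, sigma):
    """Traditional index size from the abstract: n log n + n log sigma (base 2 assumed)."""
    return n * math.log2(n) + n * math.log2(sigma)

def sbwt_index_bits(n, sigma, k):
    """S-BWT memory cost from the abstract: n (log sigma + log n - log k) / k."""
    return n * (math.log2(sigma) + math.log2(n) - math.log2(k)) / k

def bwt(text):
    """Naive BWT via sorted rotations; '$' is an assumed sentinel not present in text."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def backward_search_count(bwt_str, pattern):
    """Count pattern occurrences with textbook backward search over a BWT string."""
    # first_row[c] = number of characters in the text that sort before c
    first_row, total = {}, 0
    for c in sorted(set(bwt_str)):
        first_row[c] = total
        total += bwt_str.count(c)
    lo, hi = 0, len(bwt_str)  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        # Rank queries done by plain counting; a real index would precompute them.
        lo = first_row[c] + bwt_str[:lo].count(c)
        hi = first_row[c] + bwt_str[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo

def count_per_segment(text, pattern, k):
    """Query k equal-length segments independently (misses boundary-crossing matches)."""
    size = math.ceil(len(text) / k)
    segments = [text[i:i + size] for i in range(0, len(text), size)]
    return sum(backward_search_count(bwt(seg), pattern) for seg in segments)

if __name__ == "__main__":
    n, sigma, k = 2**20, 4, 16
    print(bwt_index_bits(n, sigma), sbwt_index_bits(n, sigma, k))
    text, pattern = "ACGTACGTACGT", "GTAC"
    print(backward_search_count(bwt(text), pattern))  # whole-sequence count
    print(count_per_segment(text, pattern, 3))        # independent segment count
```

In this toy input every occurrence of the pattern straddles a segment boundary, so the independent per-segment count falls short of the whole-sequence count. That gap is the problem the abstract says the two S-BWT query methods are designed to close.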