Cache-aware parallel approximate matching and join algorithms using BWT

  • Authors: Jiaying Wang; Xiaochun Yang; Bin Wang
  • Affiliations: Northeastern University, Liaoning, China (all authors)
  • Venue: Proceedings of the Joint EDBT/ICDT 2013 Workshops
  • Year: 2013

Abstract

Approximate string search and join are essential operations in data integration and cleaning, and have attracted significant attention in academia. In this paper, we study string similarity search and join with edit distance constraints. Although multicore machines have become the mainstream computer architecture, most existing methods run only on a single processor. To address this problem, we propose a novel parallel framework based on the Burrows-Wheeler transform (BWT). We also devise an efficient technique that exploits the cache to further improve performance. Our method solves both similarity search and similarity join efficiently and in a general way. We conducted a comprehensive experimental study to demonstrate the efficiency of our method.
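
To make the terms in the abstract concrete, the following is a minimal sketch of the problem definitions only, not the parallel or cache-aware method of the paper, whose details are not given here. It shows a textbook BWT built from sorted rotations, and a brute-force baseline for similarity search and join under an edit distance threshold. All function names (`bwt`, `edit_distance`, `similarity_search`, `similarity_join`), the threshold parameter `tau`, and the `$` sentinel are assumptions chosen for illustration.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations.

    Illustrative only: sorting all rotations is far slower than the
    suffix-array constructions used in practice. The sentinel is an
    assumed end marker that must not occur in the text.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def edit_distance(a, b):
    """Levenshtein distance using the classic row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def similarity_search(strings, query, tau):
    """Brute-force baseline: all strings within edit distance tau of query."""
    return [s for s in strings if edit_distance(s, query) <= tau]


def similarity_join(left, right, tau):
    """Brute-force baseline: all pairs (r, s) with edit distance at most tau."""
    return [(r, s) for r in left for s in right if edit_distance(r, s) <= tau]


if __name__ == "__main__":
    print(bwt("banana"))
    print(similarity_search(["apple", "aple", "maple", "grape"], "apple", 1))
    print(similarity_join(["abc"], ["abd", "xyz"], 1))
```

The brute-force versions compare every candidate against the query, which is the cost that an index-based approach such as the one described in the abstract aims to avoid.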