The string B-tree: a new data structure for string search in external memory and its applications

  • Authors:
  • Paolo Ferragina; Roberto Grossi

  • Affiliations:
  • Univ. di Pisa, Pisa, Italy; Univ. di Firenze, Florence, Italy

  • Venue:
  • Journal of the ACM (JACM)
  • Year:
  • 1999

Abstract

We introduce a new text-indexing data structure, the String B-Tree, that can be seen as a link between some traditional external-memory and string-matching data structures. In short, it is a combination of B-trees and Patricia tries for internal-node indices that is made more effective by adding extra pointers to speed up search and update operations. Consequently, the String B-Tree overcomes the theoretical limitations of inverted files, B-trees, prefix B-trees, suffix arrays, compacted tries and suffix trees. String B-trees have the same worst-case performance as B-trees, but they manage unbounded-length strings and perform much more powerful search operations, such as the ones supported by suffix trees. String B-trees are also effective in main memory (RAM model) because they improve the online suffix tree search on a dynamic set of strings. They can also be successfully applied to database indexing and software duplication.
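The general idea of indexing suffixes in B-tree-like blocks can be illustrated with a small sketch. This is not the String B-Tree itself: it is a simplified two-level index that omits the Patricia tries, the extra pointers, and all update operations described in the abstract. It only shows how pointers to lexicographically sorted suffixes, grouped into fixed-size blocks, can answer substring queries of the kind suffix trees support. All function names and the `block_size` parameter are hypothetical choices made for illustration.

```python
from bisect import bisect_left, bisect_right


def build_suffix_pointers(text):
    """Return the start offsets of all suffixes of text, sorted lexicographically.

    Naive construction for illustration only; it is not efficient.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def split_into_blocks(pointers, block_size):
    """Group the sorted pointers into leaf blocks of at most block_size entries."""
    return [pointers[i:i + block_size]
            for i in range(0, len(pointers), block_size)]


def find_occurrences(text, blocks, pattern):
    """Return the sorted offsets at which pattern occurs in text."""
    m = len(pattern)

    def key(p):
        # Suffix starting at offset p, truncated to the pattern length.
        # Truncation preserves the sorted order of the suffixes.
        return text[p:p + m]

    # Top level: the last (largest) suffix of each block acts as a separator,
    # playing the role of a key in a B-tree parent node.
    separators = [key(block[-1]) for block in blocks]
    b = bisect_left(separators, pattern)

    hits = []
    while b < len(blocks):
        keys = [key(p) for p in blocks[b]]
        start = bisect_left(keys, pattern)
        end = bisect_right(keys, pattern)
        hits.extend(blocks[b][start:end])
        if end < len(keys):
            break  # a larger key was seen, so no later block can match
        b += 1  # matches may continue into the next block
    return sorted(hits)


text = "banana"
blocks = split_into_blocks(build_suffix_pointers(text), block_size=2)
print(find_occurrences(text, blocks, "ana"))  # [1, 3]
```

In this sketch every comparison reads the text directly, so a query can cost many string accesses per block. According to the abstract, the String B-Tree instead combines B-trees with Patricia tries inside the nodes and adds extra pointers to make search and update operations more effective.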