On-Line Linear-Time Construction of Word Suffix Trees

  • Authors:
  • Shunsuke Inenaga; Masayuki Takeda

  • Affiliations:
  • Japan Society for the Promotion of Science; Department of Informatics, Kyushu University, Fukuoka, Japan

  • Venue:
  • CPM'06 Proceedings of the 17th Annual Conference on Combinatorial Pattern Matching
  • Year:
  • 2006

Abstract

Suffix trees are the key data structure for string matching and are used in a wide range of application areas, such as bioinformatics and data compression. Sparse suffix trees are a kind of suffix tree that represents only a subset of the suffixes of the input string. In this paper we study word suffix trees, which are one variation of sparse suffix trees. Let D be a dictionary of words and let w be a string in D^+, that is, w is a sequence w_1 ⋯ w_k of k words in D. The word suffix tree of w with respect to D is a path-compressed trie that represents only the k suffixes of the form w_i ⋯ w_k. A typical application is word- and phrase-level search in natural language documents. Andersson et al. proposed an algorithm that builds word suffix trees in O(n) expected time and O(k) space. In this paper we present a new word suffix tree construction algorithm that runs in O(n) time and O(k) space in the worst case. Our algorithm is on-line, which means that it processes the characters of the input sequentially, one by one, from left to right.
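
To make the definition concrete, the following minimal Python sketch builds an uncompressed trie over the k word-aligned suffixes of a word sequence and uses it for phrase-level search. It is not the algorithm of the paper: it is a naive construction that is neither linear-time nor on-line, and it omits path compression. The function names, the word delimiter `#`, and the end marker `$` are assumptions made only for this illustration, and neither symbol may occur inside a word.

```python
LEAF = "start"  # multi-character key, so it cannot clash with a single-character edge label


def word_suffixes(words):
    """Return the k word-aligned suffixes w_i ... w_k of a word sequence."""
    return [words[i:] for i in range(len(words))]


def build_word_suffix_trie(words, sep="#", end="$"):
    """Naively build a character trie that holds only the k word suffixes.

    Every word is terminated by `sep`, and every suffix by `end`, so that
    no suffix is a prefix of another. Illustration only: the running time
    is quadratic in the worst case.
    """
    root = {}
    for start, suffix in enumerate(word_suffixes(words)):
        node = root
        for ch in "".join(word + sep for word in suffix) + end:
            node = node.setdefault(ch, {})
        node[LEAF] = start  # index of the word at which this suffix begins
    return root


def find_phrase(root, phrase, sep="#"):
    """Return the sorted word positions at which `phrase` (a list of words) occurs."""
    node = root
    for ch in "".join(word + sep for word in phrase):
        if ch not in node:
            return []
        node = node[ch]
    # Every leaf below the reached node is a word suffix that begins with the phrase.
    starts, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == LEAF:
                starts.append(child)
            else:
                stack.append(child)
    return sorted(starts)


words = "to be or not to be".split()
trie = build_word_suffix_trie(words)
print(find_phrase(trie, ["to", "be"]))  # word positions of the phrase "to be"
print(find_phrase(trie, ["o"]))         # "o" occurs inside words but never as a whole word
```

Because only the k word suffixes are inserted, a query can match only at word boundaries, which is what distinguishes this structure from an ordinary suffix tree built over all character positions.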