Linearized Suffix Tree: an Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays

Authors:
Dong Kyue Kim;Minhwan Kim;Heejin Park
Affiliations:
Hanyang University, Division of Electronics and Computer Engineering, 133-791, Seoul, South Korea;Pusan National University, Division of Computer Engineering, 609-735, Busan, South Korea;Hanyang University, College of Information and Communications, 133-791, Seoul, South Korea
Venue:
Algorithmica
Year:
2008

Citing 0
Cited 2

Maximal phrases based analysis for prototyping online discussion forums postings

AdaptLRTtoND '09 Proceedings of the Workshop on Adaptation of Language Resources and Technology to New Domains
A new compressed suffix tree supporting fast search and its construction algorithm using optimal working space

CPM'05 Proceedings of the 16th annual conference on Combinatorial Pattern Matching

Quantified Score

Hi-index	0.00

Visualization

Abstract

Suffix trees and suffix arrays are fundamental full-text index data structures to solve problems occurring in string processing. Since suffix trees and suffix arrays have different capabilities, some problems are solved more efficiently using suffix trees and others are solved more efficiently using suffix arrays. We consider efficient index data structures with the capabilities of both suffix trees and suffix arrays without requiring much space. When the size of an alphabet is small, enhanced suffix arrays are such index data structures. However, when the size of an alphabet is large, enhanced suffix arrays lose the power of suffix trees. Pattern searching in an enhanced suffix array takes O(m|Σ|) time while pattern searching in a suffix tree takes O(mlog |Σ|) time where m is the length of a pattern and Σ is an alphabet. In this paper, we present linearized suffix trees which are efficient index data structures with the capabilities of both suffix trees and suffix arrays even when the size of an alphabet is large. A linearized suffix tree has all the functionalities of the enhanced suffix array and supports the pattern search in O(mlog |Σ|) time. In a different point of view, it can be considered a practical implementation of the suffix tree supporting O(mlog |Σ|)-time pattern search. In addition, we also present two efficient algorithms for computing suffix links on the enhanced suffix array and the linearized suffix tree. These are the first algorithms that run in O(n) time without using the range minima query. Our experimental results show that our algorithms are faster than the previous algorithms.