Position heaps: A simple and dynamic text indexing data structure

Authors:
Andrzej Ehrenfeucht;Ross M. McConnell;Nissa Osheim;Sung-Whan Woo
Affiliations:
Dept. of Computer Science, 430 UCB, University of Colorado at Boulder, Boulder, CO 80309-0430, USA;Dept. of Computer Science, Colorado State University, Fort Collins, CO 80523-1873, USA;Dept. of Computer Science, Colorado State University, Fort Collins, CO 80523-1873, USA;Dept. of Computer Science, Colorado State University, Fort Collins, CO 80523-1873, USA
Venue:
Journal of Discrete Algorithms
Year:
2011

Citing 10

Data structures and network algorithms
Complete inverted files for efficient text retrieval and analysis. Journal of the ACM (JACM)
Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing
On updating suffix tree labels. Theoretical Computer Science
File structures using hashing functions. Communications of the ACM
Introduction to Algorithms
Opportunistic data structures with applications. FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Linear pattern matching algorithms. SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Contracted Suffix Trees: A Simple and Dynamic Text Indexing Data Structure. CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
Dynamic extended suffix arrays. Journal of Discrete Algorithms

Cited 3

On-line construction of position heaps. SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
The position heap of a trie. SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
On-line construction of position heaps. Journal of Discrete Algorithms


Abstract

We address the problem of finding the locations of all instances of a string P in a text T, where preprocessing of T is allowed in order to facilitate the queries. Previous data structures for this problem include the suffix tree, the suffix array, and the compact DAWG. We modify a data structure called a sequence tree, which was proposed by Coffman and Eve (1970) [3] for hashing, and adapt it to the new problem. We can then produce a list of k occurrences of any string P in T in O(|P|+k) time. Because of properties shared by suffixes of a text that are not shared by arbitrary hash keys, we can build the structure in O(|T|) time, which is much faster than Coffman and Eve's algorithm. These bounds are as good as those for the suffix tree, suffix array, and the compact DAWG. The advantages are the elementary nature of some of the algorithms for constructing and using the data structure and the asymptotic bounds we can give for updating the data structure when the text is edited.
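
To make the idea in the abstract concrete, the following is a minimal Python sketch of a position-heap-style index. It is an illustration under stated assumptions, not the authors' algorithm: the names `Node`, `build_position_heap`, and `find_occurrences` are hypothetical, positions are 0-based, and the root is assumed to carry no position. Suffixes are inserted from shortest to longest; each insertion follows the longest existing path and adds exactly one new node. Both construction and search here are the naive versions, so they do not achieve the O(|T|) and O(|P|+k) bounds quoted above, which need additional machinery not shown here.

```python
class Node:
    """One trie node; `pos` is the text position stored at the node."""
    __slots__ = ("pos", "children")

    def __init__(self, pos=None):
        self.pos = pos        # None only for the root (an assumption of this sketch)
        self.children = {}    # character -> Node


def build_position_heap(text):
    """Naive construction: insert suffixes from shortest to longest."""
    root = Node()
    n = len(text)
    for i in range(n - 1, -1, -1):
        node, j = root, i
        # Follow the longest prefix of text[i:] that is already a path.
        while j < n and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
        # Every existing path spells a prefix of a strictly shorter suffix,
        # so the walk always stops with j < n and one new node is added.
        node.children[text[j]] = Node(i)
    return root


def find_occurrences(root, text, pattern):
    """Naive search: return the sorted start positions of pattern in text."""
    hits = []
    node, matched = root, 0
    for ch in pattern:
        child = node.children.get(ch)
        if child is None:
            break
        node = child
        matched += 1
        # A node strictly above the end of the pattern is only a candidate,
        # so it is verified by direct comparison against the text.
        if matched < len(pattern) and text.startswith(pattern, node.pos):
            hits.append(node.pos)
    if matched == len(pattern):
        # The whole pattern is a path: the node reached and every
        # descendant store positions where the pattern occurs.
        stack = [node]
        while stack:
            cur = stack.pop()
            if cur.pos is not None:
                hits.append(cur.pos)
            stack.extend(cur.children.values())
    return sorted(hits)


if __name__ == "__main__":
    t = "abaababa"
    print(find_occurrences(build_position_heap(t), t, "aba"))  # [0, 3, 5]
```

The search relies on one observation: the node storing position i spells a prefix of the suffix starting at i, so if P occurs at i that node lies either on the path traced by P or below the node where P ends. Verifying each candidate on the path by direct comparison can cost up to O(|P|^2) in total in this sketch.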