Fast prefix search in little space, with applications

Authors:
Djamal Belazzougui;Paolo Boldi;Rasmus Pagh;Sebastiano Vigna
Affiliations:
Université Paris Diderot, Paris 7, France;Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Italy;IT University of Copenhagen, Denmark;Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Italy
Venue:
ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part I
Year:
2010


Abstract

A prefix search returns the strings in a given collection S that start with a given prefix. Traditionally, prefix search is solved by data structures that are also dictionaries, that is, they actually contain the strings in S. For very large collections stored in slow-access memory, we propose extremely compact data structures that solve weak prefix searches: they return the correct result only if some string in S starts with the given prefix, and give no guarantee otherwise. Our data structures for weak prefix search use O(|S| log l) bits in the worst case, where l is the average string length, as opposed to O(|S|l) bits for a dictionary. We show a lower bound implying that this space usage is optimal.
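
To make the weak prefix search contract concrete, the following Python sketch builds an index that keeps only fingerprints of prefixes, never the strings themselves, and maps each fingerprint to the range of ranks (positions in sorted order) of the strings sharing that prefix. It is an illustration of the query semantics only, not the data structure proposed in the paper: it stores one entry per distinct prefix and so does not approach the O(|S| log l) bound. The names `WeakPrefixIndex`, `fingerprint`, `salt` and `bits` are assumptions introduced for this example.

```python
import hashlib


def fingerprint(prefix, salt, bits):
    """Hash a prefix to a `bits`-bit integer (hypothetical helper, bits <= 64)."""
    data = salt.to_bytes(4, "big") + prefix.encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") >> (64 - bits)


class WeakPrefixIndex:
    """Illustrative weak prefix search: stores fingerprints, not strings."""

    def __init__(self, strings, bits=32):
        ordered = sorted(set(strings))
        self.bits = bits
        # Retry with a new salt until no two distinct true prefixes collide,
        # so that every query on a valid prefix is answered correctly.
        for salt in range(1000):
            table = self._try_build(ordered, salt)
            if table is not None:
                self.salt, self.table = salt, table
                return
        raise ValueError("no collision-free salt found; increase bits")

    def _try_build(self, ordered, salt):
        owner = {}  # fingerprint -> prefix, needed only during construction
        table = {}  # fingerprint -> (lo, hi), a half-open range of ranks
        for rank, s in enumerate(ordered):
            for end in range(len(s) + 1):
                prefix = s[:end]
                key = fingerprint(prefix, salt, self.bits)
                if owner.setdefault(key, prefix) != prefix:
                    return None  # two different prefixes share a fingerprint
                lo, _ = table.get(key, (rank, rank + 1))
                # Strings sharing a prefix are contiguous in sorted order,
                # so extending the upper end of the range is enough.
                table[key] = (lo, rank + 1)
        return table

    def query(self, prefix):
        """Return the rank range for `prefix`.

        The answer is correct when some indexed string starts with `prefix`.
        For any other input the result is unspecified: usually the empty
        range (0, 0), but a fingerprint collision can return a wrong range.
        """
        key = fingerprint(prefix, self.salt, self.bits)
        return self.table.get(key, (0, 0))


index = WeakPrefixIndex(["care", "car", "dog", "card"])
print(index.query("car"))  # ranks of "car", "card", "care" in sorted order
print(index.query("cat"))  # not a prefix of any string: no guarantee
```

Because the strings are discarded after construction, the index cannot tell a genuine prefix from an input that happens to share a fingerprint with one. That is the trade the abstract describes: giving up the dictionary guarantee in exchange for space.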