Fast text searching for regular expressions or automaton searching on tries
Journal of the ACM (JACM)
Fast algorithms for sorting and searching strings
SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Managing gigabytes (2nd ed.): compressing and indexing documents and images
An analysis of the Burrows—Wheeler transform
Journal of the ACM (JACM)
Modern Information Retrieval
Two-dimensional substring indexing
Journal of Computer and System Sciences - Special issu on PODS 2001
UbiCrawler: a scalable fully distributed web crawler
Software—Practice & Experience
Journal of the ACM (JACM)
Boosting textual compression in optimal linear time
Journal of the ACM (JACM)
ACM Computing Surveys (CSUR)
Succinct data structures for flexible text retrieval systems
Journal of Discrete Algorithms
Compressed representations of sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
Compressed indexes for dynamic text collections
ACM Transactions on Algorithms (TALG)
A taxonomy of suffix array construction algorithms
ACM Computing Surveys (CSUR)
Succinct indexes for strings, binary relations and multi-labeled trees
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Dynamic entropy-compressed sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
Introduction to Information Retrieval
Introduction to Information Retrieval
Improved dynamic rank-select entropy-bound structures
LATIN'08 Proceedings of the 8th Latin American conference on Theoretical informatics
CPM'05 Proceedings of the 16th annual conference on Combinatorial Pattern Matching
Compressed string dictionaries
SEA'11 Proceedings of the 10th international conference on Experimental algorithms
Succinct indexes for circular patterns
ISAAC'11 Proceedings of the 22nd international conference on Algorithms and Computation
Compression of RDF dictionaries
Proceedings of the 27th Annual ACM Symposium on Applied Computing
Querying RDF dictionaries in compressed space
ACM SIGAPP Applied Computing Review
Compressed string dictionary look-up with edit distance one
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Hi-index | 0.01 |
The Permuterm index [Garfield 1976] is a time-efficient and elegant solution to the string dictionary problem in which pattern queries may possibly include one wild-card symbol (called Tolerant Retrieval problem). Unfortunately the Permuterm index is space inefficient because it quadruples the dictionary size. In this article we propose the Compressed Permuterm Index which solves the Tolerant Retrieval problem in time proportional to the length of the searched pattern, and space close to the kth order empirical entropy of the indexed dictionary. We also design a dynamic version of this index that allows to efficiently manage insertion in, and deletion from, the dictionary of individual strings. The result is based on a simple variant of the Burrows-Wheeler Transform, defined on a dictionary of strings of variable length, that allows to efficiently solve the Tolerant Retrieval problem via known (dynamic) compressed indexes [Navarro and Mäkinen 2007]. We will complement our theoretical study with a significant set of experiments that show that the Compressed Permuterm Index supports fast queries within a space occupancy that is close to the one achievable by compressing the string dictionary via gzip or bzip. This improves known approaches based on Front-Coding [Witten et al. 1999] by more than 50% in absolute space occupancy, still guaranteeing comparable query time.