Handbook of theoretical computer science (vol. A)
The art of computer programming, volume 3: (2nd ed.) sorting and searching
The string B-tree: a new data structure for string search in external memory and its applications
Journal of the ACM (JACM)
Fast algorithms for sorting and searching strings
SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
ACM Transactions on Database Systems (TODS)
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Succinct indexable dictionaries with applications to encoding k-ary trees and multisets
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Structuring labeled trees for optimal succinctness, and beyond
FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science
Representing Trees of Higher Degree
Algorithmica
Cache-oblivious string dictionaries
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Compressing and searching XML data via two zips
Proceedings of the 15th international conference on World Wide Web
Cache-oblivious string B-trees
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
A data structure for a sequence of string accesses in external memory
ACM Transactions on Algorithms (TALG)
ACM Computing Surveys (CSUR)
Detecting near-duplicates for web crawling
Proceedings of the 16th international conference on World Wide Web
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Ultra-succinct representation of ordered trees
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Space-efficient static trees and graphs
SFCS '89 Proceedings of the 30th Annual Symposium on Foundations of Computer Science
On the size of succinct indices
ESA'07 Proceedings of the 15th annual European conference on Algorithms
Optimal self-adjusting trees for dynamic string data in secondary storage
SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval
Succinct ordinal trees based on tree covering
ICALP'07 Proceedings of the 34th international conference on Automata, Languages and Programming
Algorithms and data structures for external memory
Foundations and Trends® in Theoretical Computer Science
Brighthouse: an analytic data warehouse for ad-hoc queries
Proceedings of the VLDB Endowment
Note: On compact representations of All-Pairs-Shortest-Path-Distance matrices
Theoretical Computer Science
Compression, indexing, and retrieval for massive string data
CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Data structures: time, I/Os, entropy, joules!
ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part II
Fast prefix search in little space, with applications
ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part I
E=I+T: The internal extent formula for compacted tries
Information Processing Letters
On the weak prefix-search problem
CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
The wavelet trie: maintaining an indexed sequence of strings in compressed space
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Faster compressed dictionary matching
Theoretical Computer Science
On the weak prefix-search problem
Theoretical Computer Science
Current data structures for searching large string collections either fail to achieve minimum space or cause too many cache misses. In this paper we discuss some edge linearizations of the classic trie data structure that are simultaneously cache-friendly and compressed. We provide new insights on front coding [24], introduce other novel linearizations, and study how close their space occupancy is to the information-theoretic minimum. The moral is that these linearizations are more than just heuristics. Our second contribution is a novel dictionary encoding scheme that builds upon these linearizations, achieves nearly optimal space, offers competitive I/O-search time, and is also conscious of the query distribution. Finally, we combine these data structures with cache-oblivious tries [2, 5] and obtain a succinct variant whose space is close to the information-theoretic minimum.
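As a rough illustration of front coding, the following minimal sketch shows the textbook form of the technique: each string in a sorted list is stored as the length of the prefix it shares with its predecessor, followed by the remaining suffix. This is not the encoding scheme proposed in the paper; the function names `lcp_len`, `front_encode`, and `front_decode` are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def front_encode(sorted_strings: List[str]) -> List[Tuple[int, str]]:
    """Encode each string as (shared-prefix length with predecessor, suffix).

    Assumes the input is already in lexicographic order, so that
    neighbouring strings tend to share long prefixes.
    """
    encoded = []
    prev = ""
    for s in sorted_strings:
        k = lcp_len(prev, s)
        encoded.append((k, s[k:]))
        prev = s
    return encoded


def front_decode(pairs: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the strings by scanning the pairs from the start."""
    decoded = []
    prev = ""
    for k, suffix in pairs:
        s = prev[:k] + suffix
        decoded.append(s)
        prev = s
    return decoded


words = ["car", "card", "care", "cat"]
pairs = front_encode(words)
print(pairs)
print(front_decode(pairs) == words)
```

In this plain form, decoding a string requires a scan from the start of the list, since each entry depends on its predecessor. Practical variants usually store some strings in full at intervals to bound that scan.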