The art of computer programming, volume 3: (2nd ed.) sorting and searching
The art of computer programming, volume 3: (2nd ed.) sorting and searching
Compact pat trees
Communications of the ACM
Efficient algorithms for document retrieval problems
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Succinct Representation of Balanced Parentheses and Static Trees
SIAM Journal on Computing
Compressing Relations and Indexes
ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Offline Dictionary-Based Compression
DCC '99 Proceedings of the Conference on Data Compression
Journal of the ACM (JACM)
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
SIAM Journal on Computing
Representing Trees of Higher Degree
Algorithmica
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Space-efficient static trees and graphs
SFCS '89 Proceedings of the 30th Annual Symposium on Foundations of Computer Science
Efficient top-k algorithms for fuzzy search in string collections
Proceedings of the First International Workshop on Keyword Search on Structured Data
Efficient type-ahead search on relational data: a TASTIER approach
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Artificial Intelligence: A Modern Approach
Artificial Intelligence: A Modern Approach
Space-Efficient Framework for Top-k String Retrieval Problems
FOCS '09 Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science
Broadword implementation of rank/select queries
WEA'08 Proceedings of the 7th international conference on Experimental algorithms
Fully-functional succinct trees
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Online spelling correction for query completion
Proceedings of the 20th international conference on World wide web
Compressed string dictionaries
SEA'11 Proceedings of the 10th international conference on Experimental algorithms
Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays
SIAM Journal on Computing
Rank-Sensitive data structures
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
Universal codeword sets and representations of the integers
IEEE Transactions on Information Theory
Supporting efficient top-k queries in type-ahead search
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Spaces, Trees, and Colors: The algorithmic landscape of document retrieval on sequences
ACM Computing Surveys (CSUR)
Hi-index | 0.00 |
Virtually every modern search application, either desktop, web, or mobile, features some kind of query auto-completion. In its basic form, the problem consists in retrieving from a string set a small number of completions, i.e. strings beginning with a given prefix, that have the highest scores according to some static ranking. In this paper, we focus on the case where the string set is so large that compression is needed to fit the data structure in memory. This is a compelling case for web search engines and social networks, where it is necessary to index hundreds of millions of distinct queries to guarantee a reasonable coverage; and for mobile devices, where the amount of memory is limited. We present three different trie-based data structures to address this problem, each one with different space/time/complexity trade-offs. Experiments on large-scale datasets show that it is possible to compress the string sets, including the scores, down to spaces competitive with the gzip'ed data, while supporting efficient retrieval of completions at about a microsecond per completion.