Suppose we have a large dictionary of strings, where each entry starts with a figure of merit (popularity). We wish to find the k-best matches for a substring, s, in a dictionary, dict. That is, we want the result of grep s dict | sort -n | head -k, but computed in sublinear time. Example applications: (1) web queries with popularities, (2) products with prices, and (3) ads with click-through rates. This paper proposes a novel index, k-best suffix arrays, based on ideas borrowed from suffix arrays and k-d trees. A standard suffix array sorts the suffixes by a single order (lexicographic), whereas k-best suffix arrays are sorted by two orders (lexicographic and popularity). Lookup time is between log N and sqrt N.
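To make the problem concrete, the following is a minimal Python sketch of a baseline built on a standard (single-order) suffix array. It is not the k-best suffix array proposed in the paper. The function names build_suffix_index and k_best, the (popularity, text) entry layout, and the convention that a larger figure of merit is better are all assumptions made for illustration.

```python
import heapq


def build_suffix_index(entries):
    """Build a plain suffix index over (popularity, text) entries.

    Returns two parallel lists sorted lexicographically by suffix:
    the suffix strings and the id of the entry each suffix came from.
    Storing full suffix strings is wasteful; it keeps the sketch short.
    """
    pairs = []
    for entry_id, (_, text) in enumerate(entries):
        for start in range(len(text)):
            pairs.append((text[start:], entry_id))
    pairs.sort()
    keys = [suffix for suffix, _ in pairs]
    ids = [entry_id for _, entry_id in pairs]
    return keys, ids


def _bound(keys, s, past_equal):
    """Binary search on suffixes truncated to len(s).

    With past_equal=False, returns the first position whose truncated
    suffix is >= s; with past_equal=True, the first position whose
    truncated suffix is > s.
    """
    lo, hi = 0, len(keys)
    m = len(s)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = keys[mid][:m]
        if prefix < s or (past_equal and prefix == s):
            lo = mid + 1
        else:
            hi = mid
    return lo


def k_best(entries, keys, ids, s, k):
    """Return up to k entries containing s, highest popularity first.

    Assumption: a larger figure of merit is better.
    """
    first = _bound(keys, s, past_equal=False)
    last = _bound(keys, s, past_equal=True)
    # An entry can contain s more than once, so deduplicate entry ids.
    matched = sorted(set(ids[first:last]))
    best = heapq.nlargest(k, matched, key=lambda i: entries[i][0])
    return [entries[i] for i in best]


# Hypothetical toy dictionary: (popularity, text).
entries = [(50, "new york"), (30, "york pa"), (90, "yorkshire"), (10, "boston")]
keys, ids = build_suffix_index(entries)
print(k_best(entries, keys, ids, "york", 2))
```

In this baseline the binary search locates the matching range in logarithmic time, but the selection step still visits every match for s. Because the suffixes are sorted by lexicographic order alone, popularity plays no part in the search itself, which is the gap a second sort order is meant to close.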