K-best suffix arrays

Authors:
Kenneth Church;Bo Thiesson;Robert Ragno
Affiliations:
Microsoft, One Microsoft Way, Redmond, WA;Microsoft, One Microsoft Way, Redmond, WA;Microsoft, One Microsoft Way, Redmond, WA
Venue:
NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Year:
2007



Abstract

Suppose we have a large dictionary of strings. Each entry starts with a figure of merit (popularity). We wish to find the k-best matches for a substring, s, in a dictionary, dict. That is, grep s dict | sort -n | head -k, but we would like to do this in sublinear time. Example applications: (1) web queries with popularities, (2) products with prices and (3) ads with click-through rates. This paper proposes a novel index, k-best suffix arrays, based on ideas borrowed from suffix arrays and kd-trees. A standard suffix array sorts the suffixes by a single order (lexicographic), whereas k-best suffix arrays are sorted by two orders (lexicographic and popularity). Lookup time is between log N and sqrt N.
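
To make the problem statement concrete, here is a minimal Python sketch of two baselines that the abstract contrasts with the proposed index: a linear scan (the grep pipeline) and a lookup in a plain, single-order suffix array. It is not the k-best suffix array itself. The function names, the `(popularity, text)` tuple layout, and the convention that a larger popularity is better are all assumptions made for illustration; flip the comparison if the figure of merit is a rank or a cost.

```python
import heapq
from bisect import bisect_left


def k_best_linear(entries, s, k):
    """Baseline: scan every entry (like grep) and keep the k most popular matches.

    entries is a list of (popularity, text) pairs (assumed layout).
    """
    matches = ((pop, text) for pop, text in entries if s in text)
    return heapq.nlargest(k, matches)


def build_suffix_array(entries):
    """Plain suffix array over all entries, sorted lexicographically only.

    Stores full suffix strings for readability; a real index would store
    offsets into the text instead.
    """
    suffixes = []
    for idx, (_, text) in enumerate(entries):
        for start in range(len(text)):
            suffixes.append((text[start:], idx))
    suffixes.sort()
    return suffixes


def k_best_plain_suffix_array(entries, suffixes, s, k):
    """Find matches by binary search, then rank them by popularity.

    The binary search is logarithmic, but every matching suffix is still
    visited before the k best can be chosen.
    """
    keys = [suffix for suffix, _ in suffixes]  # would be precomputed in practice
    i = bisect_left(keys, s)
    hit_ids = set()
    # Suffixes that start with s form one contiguous block in the sorted array.
    while i < len(keys) and keys[i].startswith(s):
        hit_ids.add(suffixes[i][1])
        i += 1
    return heapq.nlargest(k, (entries[j] for j in hit_ids))


# Hypothetical toy dictionary: (popularity, query text).
entries = [
    (120, "new york"),
    (95, "york university"),
    (40, "yorkshire terrier"),
    (300, "weather"),
    (15, "new yorker"),
]
suffixes = build_suffix_array(entries)
top = k_best_plain_suffix_array(entries, suffixes, "york", 2)
```

In this sketch the suffix-array lookup still touches every match before ranking, so its cost grows with the number of matches. Ordering the index by popularity as well as lexicographically, as the abstract describes, is what is meant to avoid that step.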