Efficient error-tolerant query autocompletion

Authors:
Chuan Xiao;Jianbin Qin;Wei Wang;Yoshiharu Ishikawa;Koji Tsuda;Kunihiko Sadakane
Affiliations:
Nagoya University, Japan;UNSW, Australia;UNSW, Australia;Nagoya University, Japan;AIST and JST ERATO, Japan;NII, Japan
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Abstract

Query autocompletion is an important feature that saves users the keystrokes of typing an entire query. In this paper we study the problem of query autocompletion that tolerates errors in users' input using edit distance constraints. Previous approaches index data strings in a trie and continuously maintain all the prefixes of data strings whose edit distance from the query is within the threshold. The major inherent problem is that the number of such prefixes is huge for the first few characters of the query and is exponential in the alphabet size. This results in slow query response even if the entire query approximately matches only a few prefixes. We propose a novel neighborhood generation-based algorithm, IncNGTrie, which can achieve up to two orders of magnitude speedup over existing methods for the error-tolerant query autocompletion problem. Our proposed algorithm maintains only a small set of active nodes, thus saving both space and time when processing the query. We also study efficient duplicate removal, which is a core problem in fetching query answers. In addition, we propose optimization techniques to reduce our index size and discuss several extensions to our method. The efficiency of our method is demonstrated against existing methods through extensive experiments on real datasets.
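
To make the problem statement concrete, the following is a minimal brute-force sketch of error-tolerant autocompletion under an edit distance threshold. It is not IncNGTrie and not the trie-based method of the previous approaches; it simply scans every data string and checks whether some prefix of it lies within the threshold of the query. The function names `prefix_edit_distance` and `autocomplete`, the parameter name `tau`, and the sample strings are assumptions made for illustration.

```python
from typing import List


def prefix_edit_distance(query: str, s: str) -> int:
    """Smallest edit distance between `query` and any prefix of `s`."""
    # prev[j] holds the edit distance between query[:i] and s[:j].
    prev = list(range(len(s) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i] + [0] * len(s)
        for j, sc in enumerate(s, start=1):
            cur[j] = min(
                prev[j] + 1,               # drop qc from the query
                cur[j - 1] + 1,            # insert sc into the query
                prev[j - 1] + (qc != sc),  # match or substitute
            )
        prev = cur
    # The last row covers the whole query against every prefix of s.
    return min(prev)


def autocomplete(query: str, data: List[str], tau: int) -> List[str]:
    """Return data strings having a prefix within edit distance tau of query."""
    return [s for s in data if prefix_edit_distance(query, s) <= tau]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the paper's datasets.
    strings = ["vldb endowment", "sigmod", "vldb journal"]
    print(autocomplete("vldv", strings, tau=1))
```

This scan costs time proportional to the query length times the total length of the data strings for every keystroke, which is why indexed, incremental approaches such as those described in the abstract are of interest.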