Efficient fuzzy full-text type-ahead search

  • Authors:
  • Guoliang Li;Shengyue Ji;Chen Li;Jianhua Feng

  • Affiliations:
  • Department of Computer Science, Tsinghua University, Beijing, China 100084;Department of Computer Science, University of California, Irvine, USA;Department of Computer Science, University of California, Irvine, USA;Department of Computer Science, Tsinghua University, Beijing, China 100084

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2011

Abstract

Traditional information systems return answers after a user submits a complete query. Users often feel "left in the dark" when they have limited knowledge about the underlying data and have to use a try-and-see approach for finding information. A recent trend of supporting autocomplete in these systems is a first step toward solving this problem. In this paper, we study a new information-access paradigm, called "type-ahead search," in which the system searches the underlying data "on the fly" as the user types in query keywords. It extends autocomplete interfaces by allowing keywords to appear at different places in the underlying data. This framework allows users to explore data as they type, even in the presence of minor errors. We study research challenges in this framework for large amounts of data. Since each keystroke of the user could invoke a query on the backend, we need efficient algorithms to process each query within milliseconds. We develop various incremental-search algorithms for both single-keyword queries and multi-keyword queries, using previously computed and cached results in order to achieve a high interactive speed. We develop novel techniques to support fuzzy search by allowing mismatches between query keywords and answers. We have deployed several real prototypes using these techniques. One of them has been deployed to support type-ahead search on the UC Irvine people directory, where it has been used regularly and has been well received by users for its friendly interface and high efficiency.
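
To make the idea of incremental fuzzy search concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a small in-memory keyword list and uses plain edit distance as the mismatch measure. For each keyword it caches one row of the edit-distance table, so that a new keystroke only extends the cached row instead of recomputing from scratch. The class name `FuzzyTypeAhead`, the method names, and the `max_errors` parameter are all hypothetical and chosen for illustration only.

```python
from typing import Dict, List


class FuzzyTypeAhead:
    """Toy incremental fuzzy prefix matcher (illustrative assumption,
    not the paper's data structure)."""

    def __init__(self, keywords: List[str], max_errors: int = 1) -> None:
        self.keywords = keywords
        self.max_errors = max_errors
        self.reset()

    def reset(self) -> None:
        """Forget the query typed so far."""
        # Cached row for the empty query: matching "" against the first j
        # characters of a keyword costs j edits.
        self.rows: Dict[str, List[int]] = {
            w: list(range(len(w) + 1)) for w in self.keywords
        }

    def type_char(self, ch: str) -> List[str]:
        """Process one keystroke; return keywords that still match."""
        matches = []
        for w, prev in self.rows.items():
            # Extend the cached edit-distance row by one query character.
            cur = [prev[0] + 1]
            for j in range(1, len(w) + 1):
                cost = 0 if w[j - 1] == ch else 1
                cur.append(min(prev[j] + 1,          # extra query char
                               cur[j - 1] + 1,       # extra keyword char
                               prev[j - 1] + cost))  # match or substitute
            self.rows[w] = cur
            # min(cur) is the smallest edit distance between the query
            # typed so far and any prefix of w.
            if min(cur) <= self.max_errors:
                matches.append(w)
        return matches
```

Calling `type_char` once per keystroke mirrors the "one query per keystroke" setting described above. This sketch scans every keyword on each keystroke, which would not scale to large data sets; it only illustrates how cached per-keystroke state can be reused and how a bounded number of mismatches can be tolerated.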