Beyond pages: supporting efficient, scalable entity search with dual-inversion index

  • Authors:
  • Tao Cheng;Kevin Chen-Chuan Chang

  • Affiliations:
  • University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL

  • Venue:
  • Proceedings of the 13th International Conference on Extending Database Technology
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Entity search, a significant departure from page-based retrieval, finds data, i.e., entities, embedded in documents directly and holistically across the whole collection. This paper aims at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning--entity as input and entity as output, we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. We systematically evaluate our framework using a prototype over a 3TB real Web corpus with 150M pages and over 20 entity types extracted. Our experiments in two concrete application settings show our techniques of on average, 2 to 4 orders of magnitude speed-up, over the keyword-based baseline, with reasonable space overhead.