Dynamic faceted search for discovery-driven analysis

  • Authors:
  • Debabrata Dash; Jun Rao; Nimrod Megiddo; Anastasia Ailamaki; Guy Lohman

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA, USA; IBM Almaden Research Center, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA; Carnegie Mellon University, Pittsburgh, PA, USA and Ecole Polytechnique Fédérale de Lausanne; IBM Almaden Research Center, San Jose, CA, USA

  • Venue:
  • Proceedings of the 17th ACM Conference on Information and Knowledge Management
  • Year:
  • 2008

Abstract

We propose a dynamic faceted search system for discovery-driven analysis on data with both textual content and structured attributes. From a keyword query, we want to dynamically select a small set of "interesting" attributes and present aggregates on them to a user. Similar to work in OLAP exploration, we define "interestingness" as how surprising an aggregated value is, based on a given expectation. We make two new contributions: a novel "navigational" expectation that is particularly useful in the context of faceted search, and a novel interestingness measure built on a judicious application of p-values. Through a user survey, we find the new expectation and interestingness metric quite effective. We develop an efficient dynamic faceted search system by improving a popular open-source engine, Solr. Our system exploits compressed bitmaps for caching the posting lists in an inverted index, and a novel directory structure called a bitset tree for fast bitset intersection. We conduct a comprehensive experimental study on large real data sets and show that our engine performs 2 to 3 times faster than Solr.
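The abstract does not give the exact interestingness formula, so the following is only a hypothetical Python sketch of ranking attributes by how surprising their aggregates are. It rests on assumptions of our own, not the paper's definitions. First, the expected fraction of each attribute value comes from a baseline set. That baseline could be the whole collection, or, in the spirit of a "navigational" expectation, the result of the user's previous query. Second, surprise for a single value is a two-sided binomial p-value computed with a normal approximation. Third, an attribute is scored by the smallest p-value among its values. The function names `two_sided_p_value`, `attribute_p_value`, and `top_k_attributes` are illustrative only.

```python
# Hypothetical sketch: rank facet attributes by how surprising their value
# counts in a query result are, relative to a baseline expectation.
# Assumptions (not from the paper): binomial model, normal approximation,
# and min-p-value aggregation per attribute.
import math
from typing import Dict, List, Tuple

Counts = Dict[str, int]  # attribute value -> number of matching documents


def two_sided_p_value(observed: int, n: int, p: float) -> float:
    """Approximate two-sided binomial p-value for `observed` hits out of `n`
    documents when each document has the value with probability `p`."""
    if n == 0:
        return 1.0  # empty result: nothing can be surprising
    if p <= 0.0 or p >= 1.0:
        # Degenerate expectation: only the exact expected count is unsurprising.
        return 1.0 if observed == round(n * p) else 0.0
    mean = n * p
    std = math.sqrt(n * p * (1.0 - p))
    z = (observed - mean) / std
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2.0))


def attribute_p_value(result: Counts, result_size: int,
                      baseline: Counts, baseline_size: int) -> float:
    """Smallest p-value over the attribute's values (its most surprising value)."""
    best = 1.0
    for value, base_count in baseline.items():
        expected_fraction = base_count / baseline_size
        observed = result.get(value, 0)
        best = min(best, two_sided_p_value(observed, result_size, expected_fraction))
    return best


def top_k_attributes(result_facets: Dict[str, Counts], result_size: int,
                     baseline_facets: Dict[str, Counts], baseline_size: int,
                     k: int) -> List[Tuple[str, float]]:
    """Return the k attributes with the smallest p-values, most surprising first."""
    scored = [
        (attr, attribute_p_value(counts, result_size,
                                 baseline_facets[attr], baseline_size))
        for attr, counts in result_facets.items()
    ]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


if __name__ == "__main__":
    # Toy baseline: 1000 documents in the whole collection.
    baseline = {
        "brand": {"A": 500, "B": 500},
        "color": {"red": 100, "blue": 900},
    }
    # Toy keyword-query result: 100 documents.
    result = {
        "brand": {"A": 52, "B": 48},       # close to the expected 50/50 split
        "color": {"red": 40, "blue": 60},  # only about 10 red were expected
    }
    for attr, p in top_k_attributes(result, 100, baseline, 1000, k=2):
        print(f"{attr}: p = {p:.3g}")
```

In this toy data, "color" is ranked ahead of "brand" because its red/blue split deviates far from the baseline, while the brand split is close to what the baseline predicts. The normal approximation is rough when expected counts are small. In that regime, an exact binomial or hypergeometric tail would be a safer choice.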