Feedback-driven result ranking and query refinement for exploring semi-structured data collections

  • Authors:
  • Huiping Cao;Yan Qi;K. Selçuk Candan;Maria Luisa Sapino

  • Affiliations:
  • Arizona State Univ., Tempe, AZ;Arizona State Univ., Tempe, AZ;Arizona State Univ., Tempe, AZ;Univ. di Torino, Torino, Italy

  • Venue:
  • Proceedings of the 13th International Conference on Extending Database Technology
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Feedback process has been used extensively in document-centric applications, such as text retrieval and multimedia retrieval. Recently, there have been efforts to apply feedback to semi-structured XML document collections as well. In this paper, we note that feedback can also be an effective tool for exploring (through result ranking and query refinement) large semi-structured data collections. In particular, in large scale data sharing and curation environments, where the user may not know the structure of the data, queries may initially be overly vague. Given a path query and a set of results identified by the system to this query over the data, we consider two types of feedback: Soft feedback captures the user's preference for some features over the others. Hard feedback, on the other hand, expresses users' assertions regarding whether certain features should be further enforced or, in contrast, are to be avoided. Both soft and hard feedback can be "positive" or "negative". For soft feedback, we develop a probabilistic feature significance measure and describe how to use this for ranking results in the presence of dependencies between the path features. To deal with the hard feedback efficiently (i.e., fast enough for interactive exploration), we present finite automata based query refinement solutions. In particular, we present a novel LazyDFA+ algorithm for managing hard feedback. We also describe optimizations that leverage the inherently iterative nature of the feedback process. We bring together these techniques in AXP, a system for adaptive and exploratory path retrieval. The experimental results show the effectiveness of the proposed techniques.