MESSIAH: missing element-conscious SLCA nodes search in XML data

  • Authors:
  • Ba Quan Truong; Sourav S Bhowmick; Curtis Dyreson; Aixin Sun

  • Affiliations:
  • Nanyang Technological University, Singapore, Singapore; Nanyang Technological University, Singapore, Singapore; Utah State University, Logan, UT, USA; Nanyang Technological University, Singapore, Singapore

  • Venue:
  • Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
  • Year:
  • 2013

Abstract

Keyword search for smallest lowest common ancestors (SLCAs) in XML data has been widely accepted as a meaningful way to identify matching nodes whose subtrees contain an input set of keywords. Although SLCA and its variants (e.g., MLCA) perform admirably in identifying matching nodes, they surprisingly perform poorly for searches on irregular schemas that have missing elements, that is, (sub)elements that are optional or that appear in some instances of an element type but not all (e.g., a "population" subelement of a "city" element might be optional, appearing when the population is known and absent when it is unknown). In this paper, we generalize the SLCA search paradigm to support queries involving missing elements. Specifically, we propose a novel property called optionality resilience that specifies the desired behaviors of an XML keyword search (XKS) approach for queries involving missing elements. We present two variants of a novel algorithm called MESSIAH (Missing Element-conSciouS hIgh-quality SLCA searcH), which are optionality resilient to irregular documents. MESSIAH logically transforms an XML document into a minimal full document in which all missing elements are represented as empty elements, i.e., the irregular schema is made "regular", and then employs efficient strategies to identify partial and complete full SLCA nodes (SLCA nodes in the full document) from it. It generates the same SLCA nodes as any state-of-the-art approach when the query does not involve missing elements, but avoids irrelevant results when missing elements are involved. Our experimental study demonstrates the ability of MESSIAH to produce superior-quality search results.
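To make the SLCA semantics concrete, here is a minimal brute-force sketch of the standard SLCA definition in Python. It is not MESSIAH and not an efficient SLCA algorithm. It assumes nodes are identified by Dewey labels (tuples of child positions from the root), and the toy document, keyword match lists, and function names (`lca`, `slca`, `is_ancestor_or_self`) are hypothetical, chosen to mirror the "city"/"population" example above.

```python
from itertools import product
from typing import List, Sequence, Tuple

Label = Tuple[int, ...]  # Dewey label: path of child positions from the root


def is_ancestor_or_self(a: Label, b: Label) -> bool:
    """True if node a is an ancestor of node b, or a == b."""
    return b[:len(a)] == a


def lca(labels: Sequence[Label]) -> Label:
    """Lowest common ancestor = longest common prefix of the Dewey labels."""
    prefix = []
    for parts in zip(*labels):
        if all(p == parts[0] for p in parts):
            prefix.append(parts[0])
        else:
            break
    return tuple(prefix)


def slca(keyword_matches: List[List[Label]]) -> List[Label]:
    """Brute-force SLCA: take the LCA of every combination of one match per
    keyword, then keep only the LCAs that have no other LCA as a descendant."""
    lcas = {lca(combo) for combo in product(*keyword_matches)}
    return sorted(
        n for n in lcas
        if not any(m != n and is_ancestor_or_self(n, m) for m in lcas)
    )


# Hypothetical toy document (Dewey labels in brackets):
# [0]       country
# [0,0]       city
# [0,0,0]       name: Alpha
# [0,0,1]       population: ...
# [0,1]       city
# [0,1,0]       name: Beta        (population subelement missing)
CITY = [(0, 0), (0, 1)]      # nodes matching keyword "city" (tag name)
ALPHA = [(0, 0, 0)]          # nodes matching keyword "Alpha"
BETA = [(0, 1, 0)]           # nodes matching keyword "Beta"
POPULATION = [(0, 0, 1)]     # nodes matching keyword "population" (tag name)

print(slca([ALPHA, POPULATION]))  # [(0, 0)]  -> Alpha's city node
print(slca([BETA, POPULATION]))   # [(0,)]    -> the whole country node
print(slca([CITY, POPULATION]))   # [(0, 0)]
```

In this toy document the second city (Beta) has no population subelement, so the query "Beta population" has the whole country node as its SLCA, pairing Beta with Alpha's population. This is the kind of irrelevant answer on irregular data that the abstract describes MESSIAH as avoiding. The brute-force enumeration is exponential in the number of keywords and is only meant to illustrate the definition.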