On the effectiveness of flexible querying heuristics for XML data

Authors:
Zografoula Vagena;Latha Colby;Fatma Özcan;Andrey Balmin;Quanzhong Li
Affiliations:
IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA
Venue:
XSym'07 Proceedings of the 5th international conference on Database and XML Technologies
Year:
2007

Abstract

The ability to perform effective XML data retrieval in the absence of schema knowledge has recently received considerable attention. The majority of relevant proposals employ heuristics that identify groups of meaningfully related nodes using information extracted from the input data. These heuristics are used to prune the search space of all possible node combinations, and their popularity is evident from the large number of such heuristics and of the systems that use them. However, a comprehensive study of the relative merits of these heuristics has not yet been performed. One challenge in performing such a study is that these techniques have been proposed within different and not directly comparable contexts. In this paper, we attempt to fill this gap. In particular, we first abstract the common selection problem that the relatedness heuristics tackle and show how each heuristic addresses it. We then identify data categories in which the assumptions made by each heuristic are valid and draw insights on their possible effectiveness. Our findings can help system implementors understand the strengths and weaknesses of each heuristic and provide simple guidelines for the applicability of each one.