Structural consistency: enabling XML keyword search to eliminate spurious results consistently

Authors:
Ki-Hoon Lee;Kyu-Young Whang;Wook-Shin Han;Min-Soo Kim
Affiliations:
Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea;Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea;Department of Computer Engineering, Kyungpook National University, Daegu, South Korea;Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2010



Abstract

XML keyword search is a user-friendly way to query XML data using only keywords. To achieve high precision without sacrificing recall, it is important to remove spurious results that the user did not intend. Efforts to eliminate spurious results have enjoyed some success using the concept of the lowest common ancestor (LCA) or its variants, SLCA and MLCA. However, existing methods can still return many spurious results. The fundamental cause is that they try to eliminate spurious results locally, without globally examining all the query results; accordingly, some spurious results are not eliminated consistently. In this paper, we propose a novel keyword search method that removes spurious results consistently by exploiting the new concept of structural consistency. We define structural consistency as a property that holds if no query result has an ancestor-descendant relationship at the schema level with any other query result. A naive solution to obtain structural consistency would be to compute all the LCAs (or variants) and then remove spurious results according to structural consistency. Obviously, this approach would always be slower than existing LCA-based ones. To speed up structural consistency checking, we must be able to examine the query results at the schema level without generating all the LCAs. However, this is a challenging problem since the schema-level query results do not homomorphically map to the instance-level query results, causing serious false dismissals. We present a comprehensive and practical solution to this problem and formally prove that this solution preserves structural consistency at the schema level without incurring false dismissals. We also propose a relevance-feedback-based solution for the case where our method has low recall, which occurs when the user does not intend to find more specific results.
This solution has been prototyped in Odysseus, a full-fledged object-relational DBMS developed at KAIST. Experimental results using real and synthetic data sets show that, compared with state-of-the-art methods, our solution significantly (1) improves precision while providing comparable recall for most queries and (2) enhances query performance by removing spurious results early.
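
The naive approach mentioned in the abstract (compute all results first, then filter by structural consistency) might be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: each query result is represented by a hypothetical schema-level label path (a tuple of element names), and a result whose path is a proper ancestor of another result's path is assumed to be the one treated as spurious, so that the more specific results are kept. The names `is_schema_ancestor` and `filter_structurally_consistent` are invented for this sketch.

```python
from typing import List, Tuple

# Assumption: a schema-level node is identified by its label path from the root,
# e.g. ("bib", "book", "title"). This representation is hypothetical.
Path = Tuple[str, ...]


def is_schema_ancestor(a: Path, b: Path) -> bool:
    """Return True if label path `a` is a proper prefix (ancestor) of `b`."""
    return len(a) < len(b) and b[:len(a)] == a


def filter_structurally_consistent(results: List[Path]) -> List[Path]:
    """Drop every result whose schema path is an ancestor of another result's path.

    Assumption: the ancestor is the spurious one, so more specific results survive.
    Input order is preserved. Cost is quadratic in the number of distinct paths.
    """
    distinct = set(results)
    return [
        r for r in results
        if not any(is_schema_ancestor(r, other) for other in distinct)
    ]


# Usage: four candidate results at the schema level.
candidates = [
    ("bib",),
    ("bib", "book"),
    ("bib", "book", "chapter"),
    ("bib", "article"),
]
consistent = filter_structurally_consistent(candidates)
# ("bib",) and ("bib", "book") are ancestors of other results and are removed.
```

Because this sketch filters only after every candidate result has been produced, it illustrates why the abstract calls the naive solution slower than existing LCA-based methods; the paper's contribution is to perform the check at the schema level without generating all the LCAs.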