Retrieving valid matches for XML keyword search

Authors:
Lingbo Kong;Rémi Gilleron;Aurélien Lemay
Affiliations:
Mostrare, Villeneuve d'Ascq, Lille, France;Mostrare, Villeneuve d'Ascq, Lille, France;Mostrare, Villeneuve d'Ascq, Lille, France
Venue:
Proceedings of the 2009 ACM symposium on Applied Computing
Year:
2009

Citing 11
Cited 0

Storing and querying ordered XML using a relational database system. Proceedings of the 2002 ACM SIGMOD international conference on Management of data
XRANK: ranked keyword search over XML documents. Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Efficient keyword search for smallest LCAs in XML databases. Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Keyword Proximity Search in XML Trees. IEEE Transactions on Knowledge and Data Engineering
Multiway SLCA-based keyword search in XML data. Proceedings of the 16th international conference on World Wide Web
Identifying meaningful return information for XML keyword search. Proceedings of the 2007 ACM SIGMOD international conference on Management of data
XSEarch: a semantic search engine for XML. VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Schema-free XQuery. VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Effective keyword search for valuable LCAs over XML documents. Proceedings of the sixteenth ACM conference on information and knowledge management
Efficient LCA based keyword search in XML data. EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Reasoning and identifying relevant matches for XML keyword search. Proceedings of the VLDB Endowment

Abstract

Adapting keyword search to XML data has attracted considerable attention recently, and the problem is generally known as XML Keyword Search (XKS). Its fundamental task is to retrieve meaningful and concise results for a given keyword query. The latest work, [1], returns the fragments rooted at SLCA (Smallest Lowest Common Ancestor) nodes. To guarantee that the fragments contain only meaningful nodes, [1] proposes a contributor-based filtering mechanism in its MaxMatch algorithm. However, this filtering mechanism is not sufficient: it suffers from the false positive problem (discarding interesting nodes) and the redundancy problem (keeping uninteresting nodes). In this paper, we propose a new filtering mechanism that overcomes both problems. Its fundamental concept is the valid contributor. A child v is a valid contributor to its parent u if (1) v's label is unique among all of u's children; or (2) among the siblings with the same label as v, none covers v's content. Our new filtering mechanism requires that every node in each retrieved fragment be a valid contributor to its parent. This not only satisfies the axiomatic properties proposed by [1], but also makes the filtered fragments more meaningful and concise. We implement our proposal in ValidMatch and compare it with MaxMatch on real and synthetic XML data. The results verify our claims and show the effectiveness of our valid-contributor-based filtering mechanism.
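
The valid-contributor rule stated in the abstract can be illustrated with a small Python sketch. This is not the ValidMatch algorithm itself, only a reading of the definition under stated assumptions: a node's "content" is modeled as a set of words, "covered" is taken to mean set inclusion, and when two same-label siblings have identical content the earlier one in document order is kept. The names `Node`, `is_valid_contributor`, and `prune` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    content: frozenset          # assumption: the set of words under this node
    children: list = field(default_factory=list)


def is_valid_contributor(v, siblings):
    """Check the abstract's two conditions for child v.

    `siblings` is the full child list of v's parent (v included),
    in document order.
    """
    pos = next(i for i, s in enumerate(siblings) if s is v)
    for i, s in enumerate(siblings):
        if s is v or s.label != v.label:
            continue
        # Condition (2) fails if a same-label sibling strictly covers v.
        if v.content < s.content:
            return False
        # Assumed tie-break: for identical content, keep the earlier sibling.
        if v.content == s.content and i < pos:
            return False
    # Either no sibling shares v's label (condition 1) or none covers v.
    return True


def prune(u):
    """Return a copy of the fragment rooted at u with invalid contributors removed."""
    kept = [c for c in u.children if is_valid_contributor(c, u.children)]
    return Node(u.label, u.content, [prune(c) for c in kept])


if __name__ == "__main__":
    team = Node("team", frozenset(), [
        Node("name", frozenset({"grizzlies"})),
        Node("player", frozenset({"gasol", "forward"})),
        Node("player", frozenset({"gasol"})),            # covered by the previous player
        Node("player", frozenset({"miller", "guard"})),
        Node("player", frozenset({"miller", "guard"})),  # duplicate content
    ])
    for child in prune(team).children:
        print(child.label, sorted(child.content))
```

In this toy fragment the `name` child is kept because its label is unique, while the third and fifth children are dropped because a same-label sibling covers their content. How the paper defines content and coverage precisely may differ from the set-inclusion reading assumed here.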