LCA-based selection for XML document collections

Authors:
Georgia Koloniari;Evaggelia Pitoura
Affiliations:
University of Ioannina, Ioannina, Greece;University of Ioannina, Ioannina, Greece
Venue:
Proceedings of the 19th International Conference on World Wide Web
Year:
2010

Abstract

In this paper, we address the problem of database selection for XML document collections: given a set of collections and a user query, how to rank the collections by their goodness with respect to the query. The goodness of a collection is determined by the relevance of its documents to the query. We consider keyword queries and support Lowest Common Ancestor (LCA) semantics for defining query results, where the relevance of each document to a query is determined by properties of the LCA of those nodes in the XML document that contain the query keywords. To avoid evaluating queries against each document in a collection, we propose maintaining, in a preprocessing phase, information about the LCAs of all pairs of keywords in a document and using it to approximate the properties of the LCA-based results of a query. To improve storage and processing efficiency, we use appropriate summaries of the LCA information based on Bloom filters. We address both a boolean and a weighted version of the database selection problem. Our experimental results show that our approach incurs low errors in the estimation of the goodness of a collection and provides rankings that are very close to the actual ones.
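The following is a minimal sketch of the boolean variant of the idea, not the authors' implementation. It assumes that a keyword pair has an LCA in a document whenever both keywords occur somewhere in that document, that one Bloom filter per document summarizes those pairs, and that the goodness of a collection is the number of documents estimated to contain every query pair. The names `BloomFilter`, `summarize`, `estimated_goodness`, and `rank_collections`, and the filter size and hash count, are all hypothetical choices made for illustration.

```python
import hashlib
from itertools import combinations
import xml.etree.ElementTree as ET


class BloomFilter:
    """Tiny Bloom filter; a Python int serves as the bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits      # assumed size, not from the paper
        self.num_hashes = num_hashes  # assumed hash count, not from the paper
        self.bits = 0

    def _positions(self, item):
        # Derive several bit positions by salting one cryptographic hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def keywords_of(xml_text):
    """Collect lower-cased words from the text of every element."""
    words = set()
    for node in ET.fromstring(xml_text).iter():
        if node.text:
            words.update(node.text.lower().split())
    return words


def pair_key(a, b):
    # Order-independent key for a keyword pair.
    return "|".join(sorted((a, b)))


def summarize(xml_text):
    """Preprocessing step: summarize all keyword pairs of one document."""
    bf = BloomFilter()
    words = keywords_of(xml_text)
    for a, b in combinations(sorted(words), 2):
        bf.add(pair_key(a, b))
    for w in words:  # self-pairs let single-keyword queries be answered
        bf.add(pair_key(w, w))
    return bf


def estimated_goodness(summaries, query):
    """Count documents whose summary contains every pair of query keywords."""
    terms = sorted({w.lower() for w in query})
    pairs = [pair_key(a, b) for a, b in combinations(terms, 2)]
    if not pairs:
        pairs = [pair_key(t, t) for t in terms]
    return sum(all(p in bf for p in pairs) for bf in summaries)


def rank_collections(collections, query):
    """Order collection names by estimated goodness, best first."""
    scores = {name: estimated_goodness(docs, query)
              for name, docs in collections.items()}
    return sorted(scores, key=scores.get, reverse=True)


docs_a = ["<book><title>xml search</title><author>smith</author></book>",
          "<book><title>keyword search</title></book>"]
docs_b = ["<paper><title>graph mining</title></paper>"]
collections = {"A": [summarize(d) for d in docs_a],
               "B": [summarize(d) for d in docs_b]}
ranking = rank_collections(collections, ["xml", "search"])
```

Because a Bloom filter has no false negatives, the estimate never falls below the true number of matching documents; it can only overestimate, at a rate governed by the filter size. A weighted variant would additionally need some property of each LCA (for example its depth), which this sketch does not model.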