Top-K data source selection for keyword queries over multiple XML data sources

Authors:
Khanh Nguyen;Jinli Cao
Affiliations:
La Trobe University, Australia;La Trobe University, Australia
Venue:
Journal of Information Science
Year:
2012

Abstract

With the proliferation of XML data, keyword search over XML has attracted much attention. However, most current approaches focus on keyword search over a single XML document. In a system that integrates hundreds or even thousands of data sources, sequentially querying every source is extremely costly and may be impractical. In this article we propose a novel approach that selects the top-K data sources according to their relevance to a given query, avoiding the high cost of searching numerous, potentially irrelevant sources. Our approach summarizes each data source as a succinct synopsis so that unpromising sources can be filtered out quickly. We maintain both structural and value-distribution information for each data source, and propose a novel ranking function that effectively measures the relevance of a data source to the given query. Experiments on real datasets show that our approach performs well on all evaluation metrics (recall, precision and Spearman's rank correlation coefficient) across different experimental parameters.
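
The abstract does not spell out the synopsis or the ranking function, so the following is only a minimal sketch of the general idea of top-K source selection and of the three evaluation metrics it names. Everything here is an assumption made for illustration: the synopsis is reduced to plain keyword counts (the article's synopses also capture structure), the scoring rule is a placeholder, and the names `build_synopsis`, `score`, `select_top_k`, `precision_recall` and `spearman` are hypothetical.

```python
from collections import Counter


def build_synopsis(documents):
    """Hypothetical synopsis: keyword counts for one source.

    A stand-in only; it ignores the structural information that the
    article says its synopses maintain.
    """
    counts = Counter()
    for text in documents:
        counts.update(text.lower().split())
    return counts


def score(synopsis, query):
    """Placeholder relevance score, not the article's ranking function.

    Returns 0 if any query keyword is absent from the source,
    otherwise the sum of the keyword counts.
    """
    freqs = [synopsis.get(keyword.lower(), 0) for keyword in query]
    if not freqs or min(freqs) == 0:
        return 0
    return sum(freqs)


def select_top_k(synopses, query, k):
    """Rank sources by score and keep the k best with a positive score."""
    scored = [(score(syn, query), name) for name, syn in synopses.items()]
    scored = [(s, name) for s, name in scored if s > 0]
    # Highest score first; ties broken by source name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def precision_recall(selected, relevant):
    """Precision and recall of the selected sources against a relevant set."""
    hits = len(set(selected) & set(relevant))
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def spearman(ranking_a, ranking_b):
    """Spearman's rank correlation for two tie-free rankings of the same items."""
    n = len(ranking_a)
    if n < 2 or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must cover the same items, at least two")
    position_b = {item: i for i, item in enumerate(ranking_b)}
    d_squared = sum((i - position_b[item]) ** 2
                    for i, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Toy data, invented for this sketch.
sources = {
    "s1": ["xml keyword search xml"],
    "s2": ["keyword ranking"],
    "s3": ["xml keyword"],
}
synopses = {name: build_synopsis(docs) for name, docs in sources.items()}
top = select_top_k(synopses, ["xml", "keyword"], k=2)
```

The point of the design is that only the small synopses are consulted at selection time, so sources scoring zero are never queried. Comparing `top` against a ranking obtained by actually querying every source is one way the recall, precision and rank-correlation figures could be computed.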