K-graphs: selecting top-k data sources for XML keyword queries

Authors:
Khanh Nguyen;Jinli Cao
Affiliations:
Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Australia;Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Australia
Venue:
DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part I
Year:
2011

Abstract

Most existing approaches to XML keyword search focus on querying a single data source. However, searching hundreds or even thousands of (distributed) data sources by sequentially querying each one is extremely costly and can therefore be impractical. In this paper, we propose an approach for selecting the top-k data sources for a given query, in order to avoid the high cost of searching numerous, potentially irrelevant data sources. The proposed approach can efficiently select the top-k most relevant data sources without querying the data sources themselves. We propose a ranking function that measures the strength of correlation between keywords in a data source, and we summarize each data source as a keyword correlation graph (K-Graph). The top-k relevant data sources are then selected by estimating the relevance of the corresponding K-Graphs to the query. Experimental results show that the approach achieves good performance across a variety of experimental parameters.
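
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's ranking function, so everything here is an assumption made for illustration: a K-Graph is modelled as a mapping from keyword pairs to precomputed correlation weights, relevance is taken to be the mean weight over all pairs of query keywords, and the names `build_kgraph`, `estimate_relevance`, `select_top_k` and the sample sources are hypothetical.

```python
import heapq
from itertools import combinations


def build_kgraph(weighted_pairs):
    """Build a hypothetical K-Graph from (keyword_a, keyword_b, weight) triples.

    The graph is stored as a dict keyed by unordered keyword pairs. How the
    correlation weights are derived from the XML data is NOT specified here.
    """
    return {frozenset((a, b)): w for a, b, w in weighted_pairs}


def estimate_relevance(kgraph, query_keywords):
    """Assumed scoring rule: mean correlation over all query keyword pairs.

    A pair with no edge in the graph contributes 0. A query with fewer than
    two distinct keywords has no pairs and scores 0.0 under this rule.
    """
    pairs = list(combinations(sorted(set(query_keywords)), 2))
    if not pairs:
        return 0.0
    return sum(kgraph.get(frozenset(p), 0.0) for p in pairs) / len(pairs)


def select_top_k(kgraphs, query_keywords, k):
    """Return the names of the k sources with the highest estimated relevance.

    Only the summaries (K-Graphs) are consulted; no source is queried.
    """
    scored = (
        (estimate_relevance(graph, query_keywords), name)
        for name, graph in kgraphs.items()
    )
    return [name for _score, name in heapq.nlargest(k, scored)]


# Made-up summaries for three hypothetical data sources.
sources = {
    "dblp": build_kgraph([
        ("xml", "keyword", 0.9),
        ("keyword", "search", 0.8),
        ("xml", "search", 0.7),
    ]),
    "movies": build_kgraph([("xml", "keyword", 0.1)]),
    "news": build_kgraph([
        ("keyword", "search", 0.6),
        ("xml", "search", 0.3),
    ]),
}

top_sources = select_top_k(sources, ["xml", "keyword", "search"], k=2)
print(top_sources)
```

With these made-up weights the two selected sources are "dblp" and "news". Using a heap-based selection avoids sorting every source when k is small relative to the number of sources, which matches the motivation of not touching irrelevant sources more than necessary.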