Building XML statistics for the hidden web

Authors:
Ashraf Aboulnaga;Jeffrey F. Naughton
Affiliations:
IBM Almaden Research Center;University of Wisconsin - Madison, WI
Venue:
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Year:
2003

Citing 11
Cited 1

Query caching and optimization in distributed mediator systems

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
An adaptive query execution system for data integration

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
StatiX: making XML count

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Statistical synopses for graph-structured XML databases

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Counting Twig Matches in a Tree

Proceedings of the 17th International Conference on Data Engineering
Cost Models DO Matter: Providing Cost Information for Diverse Data Sources in a Federated System

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Crawling the Hidden Web

Proceedings of the 27th International Conference on Very Large Data Bases
Estimating the Selectivity of XML Path Expressions for Internet Scale Applications

Proceedings of the 27th International Conference on Very Large Data Bases
Leveraging Mediator Cost Models with Heterogeneous Data Sources

ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
XPathLearner: an on-line self-tuning Markov histogram for XML path selectivity estimation

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Structure and value synopses for XML data graphs

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

XSKETCH synopses for XML data graphs

ACM Transactions on Database Systems (TODS)

Quantified Score

Hi-index	0.00

Visualization

Abstract

There have been several techniques proposed for building statistics for static XML data. However, very little work has been done in the area of building XML statistics for data sources that export XML views of data that is stored in relational or other databases. For such data sources, we need statistics that are built in an on-line manner, by observing the XML queries to the data sources and their results. In this paper, we present a technique for building on-line XML statistics by observing the XPath queries issued to a data source and their result sizes. These XPath queries select parts of the virtual XML document representing the XML view of the data at the data source. We convert these XPath queries to a more abstract and generalized form that we call annotated path expressions. We present a technique for storing these annotated path expressions and information about their selectivity for use in estimating the selectivity of future XPath queries. We also present an experimental evaluation of our proposed approach.