StatiX: making XML count

Authors:
Juliana Freire;Jayant R. Haritsa;Maya Ramanath;Prasan Roy;Jérôme Siméon
Affiliations:
Bell Labs;Indian Institute of Science;Indian Institute of Science;Bell Labs;Bell Labs
Venue:
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Year:
2002

Citing 18
Cited 53

Advanced query processing in object bases using access support relations

Proceedings of the sixteenth international conference on Very large databases
Optimal histograms for limiting worst-case error propagation in the size of join results

ACM Transactions on Database Systems (TODS)
Regular expressions into finite automata

Theoretical Computer Science
Improved histograms for selectivity estimation of range predicates

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
One-unambiguous regular languages

Information and Computation
Storing semistructured data with STORED

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Handbook of Formal Languages

Handbook of Formal Languages
Statistical synopses for graph-structured XML databases

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Access path selection in a relational database management system

SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
Accurate estimation of the number of tuples satisfying a condition

SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Estimating Answer Sizes for XML Queries

EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
Counting Twig Matches in a Tree

Proceedings of the 17th International Conference on Data Engineering
Query Optimization for XML

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Relational Databases for Querying XML Documents: Limitations and Opportunities

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Estimating the Selectivity of XML Path Expressions for Internet Scale Applications

Proceedings of the 27th International Conference on Very Large Data Bases
Selectivity Estimation Without the Attribute Value Independence Assumption

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
The Volcano Optimizer Generator: Extensibility and Efficient Search

Proceedings of the Ninth International Conference on Data Engineering
From XML Schema to Relations: A Cost-Based Approach to XML Storage

ICDE '02 Proceedings of the 18th International Conference on Data Engineering

Adaptive XML Shredding: Architecture, Implementation, and Challenges

Proceedings of the VLDB 2002 Workshop EEXTT and CAiSE 2002 Workshop DTWeb on Efficiency and Effectiveness of XML Tools and Techniques and Data Integration over the Web-Revised Papers
Containment join size estimation: models and methods

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Dynamic XML documents with distribution and replication

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
XQuery speedup using replication in mapping XML into relations

Proceedings of the 2003 ACM symposium on Applied computing
Building XML statistics for the hidden web

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Selectivity Estimation for XML Twigs

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
A Flexible Infrastructure for Gathering XML Statistics and Estimating Query Cardinality

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Approximate XML query answers

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
IMAX: Incremental Maintenance of Schema-Based XML Statistics

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Statistical learning techniques for costing XML queries

VLDB '05 Proceedings of the 31st international conference on Very large data bases
CXHist: an on-line classification-based histogram for XML string selectivity estimation

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Storing XML (with XSD) in SQL Databases: Interplay of Logical and Physical Designs

IEEE Transactions on Knowledge and Data Engineering
Studying the XML Web: Gathering Statistics from an XML Sample

World Wide Web
Cost-based optimization in DB2 XML

IBM Systems Journal
XSKETCH synopses for XML data graphs

ACM Transactions on Database Systems (TODS)
Schema-conscious XML indexing

Information Systems
An efficient infrastructure for native transactional XML processing

Data & Knowledge Engineering
Structure and value synopses for XML data graphs

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
LegoDB: customizing relational storage for XML documents

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
The history of histograms (abridged)

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Mixed mode XML query processing

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Bloom histogram: path selectivity estimation for XML data with updates

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Inferring XML schema definitions from XML data

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Cardinality estimation for the optimization of queries on ontologies

ACM SIGMOD Record
Accurate histogram-based XML summarization

Proceedings of the 2008 ACM symposium on Applied computing
Learning deterministic regular expressions for the inference of schemas from XML data

Proceedings of the 17th international conference on World Wide Web
UserMap: an adaptive enhancing of user-driven XML-to-relational mapping strategies

ADC '08 Proceedings of the nineteenth conference on Australasian database - Volume 75
A relational model for XML structural joins and their size estimations

Knowledge and Information Systems
XSelMark: A Micro-benchmark for Selectivity Estimation Approaches of XML Queries

DEXA '08 Proceedings of the 19th international conference on Database and Expert Systems Applications
Enabling XPath Optional Axes Cardinality Estimation Using Path Synopses

ADBIS '08 Proceedings of the 12th East European conference on Advances in Databases and Information Systems
EXsum: an XML summarization framework

IDEAS '08 Proceedings of the 2008 international symposium on Database engineering & applications
A sampling approach for XML query selectivity estimation

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Xoom: a tool for zooming in and out of XML documents

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Refining Keyword Queries for XML Retrieval by Combining Content and Structure

ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval
ROX: run-time optimization of XQueries

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Simplifying XML schema: effortless handling of nondeterministic regular expressions

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Synopsis based load shedding in XML streams

Proceedings of the 2009 EDBT/ICDT Workshops
XQuery speedup by deploying structural redundancy in mapping XML into relations

Information and Software Technology
Statistics-based parallelization of XPath queries in shared memory systems

Proceedings of the 13th International Conference on Extending Database Technology
LCA-based selection for XML document collections

Proceedings of the 19th international conference on World wide web
Adaptability in XML-to-relational mapping strategies

Proceedings of the 2010 ACM Symposium on Applied Computing
Learning Deterministic Regular Expressions for the Inference of Schemas from XML Data

ACM Transactions on the Web (TWEB)
Towards a comprehensive assessment for selectivity estimation approaches of XML queries

International Journal of Web Engineering and Technology
Ambiguous content and disambiguation of XML schemata

Proceedings of the Fourteenth International Database Engineering & Applications Symposium
Generation of synthetic XML for evaluation of hybrid XML systems

DASFAA'10 Proceedings of the 15th international conference on Database systems for advanced applications
Holistic schema mappings for XML-on-RDBMS

DASFAA'06 Proceedings of the 11th international conference on Database Systems for Advanced Applications
A decomposition-based probabilistic framework for estimating the selectivity of XML twig queries

EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
What's next in XML and databases?

EDBT'04 Proceedings of the 2004 international conference on Current Trends in Database Technology
Applying cosine series to XML structural join size estimation

DEXA'06 Proceedings of the 17th international conference on Database and Expert Systems Applications
A quantitative summary of XML structures

ER'06 Proceedings of the 25th international conference on Conceptual Modeling
Top-K data source selection for keyword queries over multiple XML data sources

Journal of Information Science
Efficiency frontiers of XML cardinality constraints

Data & Knowledge Engineering
A gossip-based approach for Internet-scale cardinality estimation of XPath queries over distributed semistructured data

The VLDB Journal — The International Journal on Very Large Data Bases

Abstract

The availability of summary data for XML documents has many applications, from providing users with quick feedback about their queries, to cost-based storage design and query optimization. StatiX is a novel XML Schema-aware statistics framework that exploits the structure derived by regular expressions (which define elements in an XML Schema) to pinpoint places in the schema that are likely sources of structural skew. As we discuss below, this information can be used to build concise, yet accurate, statistical summaries for XML data. StatiX leverages standard XML technology for gathering statistics, notably XML Schema validators, and it uses histograms to summarize both the structure and values in an XML document. In this paper we describe the StatiX system. We develop algorithms that decompose schemas to obtain statistics at different granularities and discuss how statistics can be gathered as documents are validated. We also present an experimental evaluation which demonstrates the accuracy and scalability of our approach and show an application of these statistics to cost-based XML storage design.
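
As a rough illustration of what a histogram over document structure can look like, the sketch below counts, for each parent/child edge, how many children fall into each bucket of parent instances. It is not the StatiX implementation: it uses element tags as a stand-in for XML Schema types, numbers instances in document order, uses equal-width buckets, and skips schema validation entirely. The function name `structural_histograms` and the `num_buckets` parameter are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def structural_histograms(xml_text, num_buckets=2):
    """Return {(parent_tag, child_tag): [child count per parent-id bucket]}.

    Assumption: tags approximate schema types; ids are per-tag counters
    assigned in document (pre-order) traversal order.
    """
    root = ET.fromstring(xml_text)
    next_id = defaultdict(int)   # per-tag instance counter
    edges = defaultdict(list)    # edge -> one parent id per child occurrence

    def visit(elem):
        my_id = next_id[elem.tag]
        next_id[elem.tag] += 1
        for child in elem:
            edges[(elem.tag, child.tag)].append(my_id)
            visit(child)

    visit(root)

    histograms = {}
    for (parent, child), parent_ids in edges.items():
        n_parents = next_id[parent]
        # Equal-width buckets over the parent id range (ceiling division).
        width = max(1, -(-n_parents // num_buckets))
        counts = [0] * num_buckets
        for pid in parent_ids:
            counts[min(pid // width, num_buckets - 1)] += 1
        histograms[(parent, child)] = counts
    return histograms


if __name__ == "__main__":
    doc = """
    <site>
      <person><watch/><watch/><watch/></person>
      <person/>
      <person><watch/></person>
      <person/>
    </site>
    """
    for edge, counts in structural_histograms(doc).items():
        print(edge, counts)
```

In this toy document, three of the four `watch` elements hang off the first half of the `person` instances, so the per-bucket counts for that edge are uneven. That unevenness is the kind of structural skew the abstract refers to.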