XCluster Synopses for Structured XML Content

Authors:
Neoklis Polyzotis; Minos Garofalakis
Affiliations:
University of California, Santa Cruz; Intel Research Berkeley
Venue:
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Year:
2006

Cited by 18

AQAX: a system for approximate XML query answers
VLDB '06 Proceedings of the 32nd international conference on Very large data bases

XSKETCH synopses for XML data graphs
ACM Transactions on Database Systems (TODS)

Query biased snippet generation in XML search
Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Temporal XML: modeling, indexing, and query processing
The VLDB Journal — The International Journal on Very Large Data Bases

XSelMark: A Micro-benchmark for Selectivity Estimation Approaches of XML Queries
DEXA '08 Proceedings of the 19th international conference on Database and Expert Systems Applications

TuG synopses for approximate query answering
ACM Transactions on Database Systems (TODS)

Improving XML search by generating and utilizing informative result snippets
ACM Transactions on Database Systems (TODS)

Exploring XML web collections with DescribeX
ACM Transactions on the Web (TWEB)

Towards a comprehensive assessment for selectivity estimation approaches of XML queries
International Journal of Web Engineering and Technology

Generation of synthetic XML for evaluation of hybrid XML systems
DASFAA'10 Proceedings of the 15th international conference on Database systems for advanced applications

Optimizing incremental maintenance of minimal bisimulation of cyclic graphs
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I

Index vs. navigation in XPath evaluation
XSym'06 Proceedings of the 4th international conference on Database and XML Technologies

Using Bayesian networks theory for aggregated search to XML retrieval
Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics

Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Foundations and Trends in Databases

Possibilistic model for aggregated search in XML documents
International Journal of Intelligent Information and Database Systems

Fast answering of XPath query workloads on web collections
XSym'07 Proceedings of the 5th international conference on Database and XML Technologies

Locating and ranking XML documents based on content and structure synopses
DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications

A gossip-based approach for Internet-scale cardinality estimation of XPath queries over distributed semistructured data
The VLDB Journal — The International Journal on Very Large Data Bases

Abstract

We tackle the difficult problem of summarizing the path/branching structure and value content of an XML database that comprises both numeric and textual values. We introduce a novel XML-summarization model, termed XCLUSTERs, that enables accurate selectivity estimates for the class of twig queries with numeric-range, substring, and textual IR predicates over the content of XML elements. In a nutshell, an XCLUSTER synopsis represents an effective clustering of XML elements based on both their structural and value-based characteristics. By leveraging techniques for summarizing XML-document structure as well as numeric and textual data distributions, our XCLUSTER model provides the first known unified framework for handling path/branching structure and different types of element values. We detail the XCLUSTER model, and develop a systematic framework for the construction of effective XCLUSTER summaries within a specified storage budget. Experimental results on synthetic and real-life data verify the effectiveness of our XCLUSTER synopses, clearly demonstrating their ability to accurately summarize XML databases with mixed-value content. To the best of our knowledge, ours is the first work to address the summarization problem for structured XML content in its full generality.
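
To make the idea of a combined structure-and-value synopsis concrete, the following is a minimal toy sketch and not the XCLUSTER model itself. It assumes a simplified design that the abstract does not specify: each cluster groups elements by tag, each edge stores an average child count per parent element, and numeric values are summarized by an equi-width histogram with a uniformity assumption inside buckets. All names (Histogram, ClusterNode, estimate_path) and the sample data are hypothetical. Substring and textual IR predicates, branching twigs, and the budget-driven construction described in the abstract are not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Histogram:
    """Equi-width histogram over numeric values (illustrative value summary)."""
    lo: float
    hi: float
    counts: list

    def fraction_in_range(self, a, b):
        """Estimate the fraction of values in [a, b], assuming values are
        spread uniformly inside each bucket."""
        total = sum(self.counts)
        if total == 0 or b < a:
            return 0.0
        width = (self.hi - self.lo) / len(self.counts)
        covered = 0.0
        for i, c in enumerate(self.counts):
            b_lo = self.lo + i * width
            b_hi = b_lo + width
            overlap = max(0.0, min(b, b_hi) - max(a, b_lo))
            covered += c * overlap / width
        return covered / total


@dataclass
class ClusterNode:
    """A cluster of elements sharing a tag (hypothetical simplification)."""
    label: str
    count: int                      # number of elements in the cluster
    hist: Histogram = None          # optional numeric value summary
    # child label -> (child cluster, average children per parent element)
    children: dict = field(default_factory=dict)


def estimate_path(root, labels, value_range=None):
    """Estimate how many elements a simple child path reaches, with an
    optional numeric range predicate on the last step."""
    est = float(root.count)
    node = root
    for lab in labels:
        if lab not in node.children:
            return 0.0
        node, avg = node.children[lab]
        est *= avg
    if value_range is not None:
        if node.hist is None:
            return 0.0
        est *= node.hist.fraction_in_range(*value_range)
    return est


# Hypothetical data: 1 catalog, 100 books, one price per book.
price = ClusterNode("price", 100, Histogram(0.0, 100.0, [10, 40, 30, 20]))
book = ClusterNode("book", 100, children={"price": (price, 1.0)})
catalog = ClusterNode("catalog", 1, children={"book": (book, 100.0)})

# Roughly: /catalog/book/price[. >= 25 and . <= 50]
estimate = estimate_path(catalog, ["book", "price"], (25.0, 50.0))
print(estimate)
```

In this toy data, the second histogram bucket covers exactly the range 25 to 50 and holds 40 of the 100 price values, so the estimate is about 40 elements. A real synopsis of the kind the abstract describes would also have to trade accuracy against a storage budget, which this sketch ignores.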