A quantitative summary of XML structures

Authors:
Zi Lin;Bingsheng He;Byron Choi
Affiliations:
Nanyang Technological University;Hong Kong University of Science and Technology;Nanyang Technological University
Venue:
ER'06 Proceedings of the 25th international conference on Conceptual Modeling
Year:
2006

Abstract

Statistical summaries in relational databases mainly focus on the distribution of data values and have proved useful in applications such as query evaluation and data storage. As XML has become widely used, e.g., for online data exchange, the need for corresponding statistical summaries of XML has become evident. While relational techniques may be applicable to the data values in XML documents, novel techniques are required for summarizing the structures of XML documents. In this paper, we propose metrics for the major structural properties of XML documents, in particular nestings of entities and one-to-many relationships. Our technique differs from existing ones in that it generates a quantitative summary of an XML structure. Using our approach, we show that some popular real-world and synthetic XML benchmark datasets are highly skewed, hardly hierarchical, and contain few recursions. We hope this preliminary finding sheds light on improving the design of XML benchmarks and experiments.
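To make the notion of a quantitative structural summary concrete, the following is a minimal Python sketch that computes a few simple statistics for one XML document: nesting depth, the number of recursive elements, and same-tag fan-out as a rough proxy for one-to-many relationships. These statistics, the function name `structure_stats`, and the dictionary keys are illustrative assumptions; the abstract does not define the paper's metrics, and this sketch does not reproduce them.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def structure_stats(xml_text):
    """Return simple structural statistics for one XML document.

    Illustrative only: these are NOT the metrics proposed in the paper.
    """
    root = ET.fromstring(xml_text)
    max_depth = 0
    recursive = 0   # elements whose tag also appears among their ancestors
    fanouts = []    # per parent: number of children sharing one tag

    # Iterative traversal; each entry is (element, depth, tags on the path).
    stack = [(root, 1, (root.tag,))]
    while stack:
        node, depth, path = stack.pop()
        max_depth = max(max_depth, depth)

        # A tag repeated under one parent hints at a one-to-many relationship.
        fanouts.extend(Counter(child.tag for child in node).values())

        for child in node:
            if child.tag in path:
                recursive += 1
            stack.append((child, depth + 1, path + (child.tag,)))

    return {
        "max_depth": max_depth,
        "recursive_elements": recursive,
        "max_fanout": max(fanouts, default=0),
        "mean_fanout": sum(fanouts) / len(fanouts) if fanouts else 0.0,
    }


if __name__ == "__main__":
    sample = "<a><b><c/><c/><c/></b><b><a/></b></a>"
    print(structure_stats(sample))
```

On a dataset, a large gap between `max_fanout` and `mean_fanout` would be one crude sign of skew, and a small `max_depth` with zero `recursive_elements` would suggest a shallow, non-recursive structure.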