On mining XML structures based on statistics

Authors:
Hiroshi Ishikawa;Shohei Yokoyama;Manabu Ohta;Kaoru Katayama
Affiliations:
Graduate School of Engineering, Tokyo Metropolitan University (all authors)
Venue:
KES'05 Proceedings of the 9th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part I
Year:
2005

Abstract

We propose an approach that dynamically generates database schemas for well-formed XML data. The approach uses statistics of the XML data to control how many tables the data is divided into, so that the total cost of query processing is reduced. To reduce the number of tables, we devise schemas suited to complex data such as text formatting and child elements with a small maximum number of occurrences. To this end, we define three functions, NULL expectation, Large Leaf Fields, and Large Child Fields, that control how tables are divided. We ran typical XML queries over both the generated schemas and normalized schemas and compared their costs; the comparison validated our approach.
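
As a rough illustration of the idea in the abstract, the Python sketch below gathers occurrence statistics from an XML document and uses them to decide whether a child element could be stored as columns of its parent's table or should get a table of its own. The abstract does not give the definitions of NULL expectation, Large Leaf Fields, or Large Child Fields, so everything here is an assumption made for illustration: the `null_ratio` statistic (a hypothetical stand-in for a NULL-expectation-style measure), the thresholds `max_inline_occurs` and `max_null_ratio`, and the function names. It is not the authors' method.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def child_statistics(xml_text):
    """Return {(parent_tag, child_tag): (max_occurs, null_ratio)}.

    max_occurs: largest number of such children under a single parent.
    null_ratio: fraction of columns left NULL if max_occurs columns were
                reserved in every parent row (hypothetical statistic).
    """
    root = ET.fromstring(xml_text)
    parent_count = Counter()        # occurrences of each element tag
    max_occurs = defaultdict(int)   # per (parent, child) maximum
    total_occurs = Counter()        # per (parent, child) total

    for elem in root.iter():
        parent_count[elem.tag] += 1
        per_parent = Counter(child.tag for child in elem)
        for child_tag, n in per_parent.items():
            key = (elem.tag, child_tag)
            max_occurs[key] = max(max_occurs[key], n)
            total_occurs[key] += n

    stats = {}
    for key, max_n in max_occurs.items():
        reserved = parent_count[key[0]] * max_n
        stats[key] = (max_n, 1.0 - total_occurs[key] / reserved)
    return stats


def choose_layout(stats, max_inline_occurs=2, max_null_ratio=0.5):
    """Inline a child into its parent's table only when it occurs few
    times and would not leave too many NULL columns (assumed thresholds)."""
    layout = {}
    for key, (max_n, null_ratio) in stats.items():
        inline = max_n <= max_inline_occurs and null_ratio <= max_null_ratio
        layout[key] = "inline" if inline else "separate table"
    return layout


# Hypothetical sample document, used only to exercise the sketch.
SAMPLE = """
<library>
  <book><title>A</title><author>X</author><author>Y</author></book>
  <book><title>B</title><author>Z</author></book>
  <book><title>C</title>
    <review>r1</review><review>r2</review><review>r3</review><review>r4</review>
  </book>
</library>
"""

if __name__ == "__main__":
    for pair, decision in choose_layout(child_statistics(SAMPLE)).items():
        print(pair, decision)
```

In this sketch, a child that appears at most twice per parent and is rarely missing is folded into the parent's table, which is one plausible way that occurrence statistics could reduce the number of tables and the joins needed at query time.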