Efficient discovery of XML data redundancies

Authors: Cong Yu; H. V. Jagadish
Affiliations: Department of EECS, University of Michigan (both authors)
Venue: VLDB '06: Proceedings of the 32nd International Conference on Very Large Data Bases
Year: 2006

Abstract

As XML becomes widely used, dealing with redundancies in XML data has become an increasingly important issue. Redundantly stored information leads not only to higher storage costs but also to increased costs for data transfer and data manipulation. Furthermore, such data redundancies can lead to update anomalies that render the database inconsistent.

One way to avoid data redundancies is to employ good schema design based on known functional dependencies. In fact, several recent studies have focused on defining the notion of XML Functional Dependencies (XML FDs) to capture XML data redundancies. We observe further that XML databases are often "casually designed" and XML FDs may not be determined in advance. Under such circumstances, discovering XML data redundancies (in terms of FDs) from the data itself becomes necessary and is an integral part of the schema refinement process.

In this paper, we present the design and implementation of the first system, DiscoverXFD, for efficient discovery of XML data redundancies. It employs a novel XML data structure and introduces a new class of partition-based algorithms. DiscoverXFD can be used not only with the previous definitions of XML functional dependencies but also with a more comprehensive notion we develop in this paper, which can detect redundancies involving set elements while maintaining clear semantics. Experimental evaluations using real-life and benchmark datasets demonstrate that our system is practical and scales well with increasing data size.
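The abstract names partition-based algorithms without describing them. As a rough illustration only, here is a minimal Python sketch of the classic partition test from relational FD discovery (the TANE-style idea that X -> A holds exactly when grouping rows by X and by X plus A yields the same number of classes). It is applied to a hypothetical flattening of a small XML document in which each `book` element becomes one row carrying its parent store's `city`. The sample document and the names `flatten`, `partition`, `fd_holds`, and `discover_fds` are illustrative assumptions. This is not DiscoverXFD's data structure or algorithm, and it does not model set elements or the paper's XML FD semantics.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import combinations

# Hypothetical sample document (not from the paper): the same book can appear
# under several stores, so its title is stored more than once.
DOC = """
<catalog>
  <store city="Ann Arbor">
    <book isbn="111" title="XML Basics" price="30"/>
    <book isbn="222" title="Databases" price="45"/>
  </store>
  <store city="Detroit">
    <book isbn="111" title="XML Basics" price="35"/>
    <book isbn="333" title="Data Mining" price="50"/>
  </store>
</catalog>
"""


def flatten(xml_text):
    """Assumed flattening: one row per <book>, with its parent store's city copied in."""
    root = ET.fromstring(xml_text)
    rows = []
    for store in root.findall("store"):
        for book in store.findall("book"):
            row = {"city": store.get("city")}
            row.update(book.attrib)  # adds isbn, title, price
            rows.append(row)
    return rows


def partition(rows, attrs):
    """Equivalence classes of row indices that agree on every attribute in attrs."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def fd_holds(rows, lhs, rhs):
    """lhs -> rhs holds iff adding rhs splits no class of partition(rows, lhs)."""
    return len(partition(rows, lhs)) == len(partition(rows, lhs + (rhs,)))


def discover_fds(rows, attrs, max_lhs=2):
    """Level-wise search for minimal, non-trivial FDs with at most max_lhs LHS attributes."""
    found = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(attrs, size):
            for rhs in attrs:
                if rhs in lhs:
                    continue  # trivial FD, skip
                # Skip non-minimal candidates: an FD already found with a subset LHS implies this one.
                if any(r == rhs and set(l) <= set(lhs) for l, r in found):
                    continue
                if fd_holds(rows, lhs, rhs):
                    found.append((lhs, rhs))
    return found


if __name__ == "__main__":
    rows = flatten(DOC)
    for lhs, rhs in discover_fds(rows, ["city", "isbn", "price", "title"]):
        print(", ".join(lhs), "->", rhs)
```

On this four-row sample the sketch reports isbn -> title, a genuine redundancy: ISBN 111 appears under two stores, so "XML Basics" is stored twice. It also reports coincidental FDs such as price -> city, which hold only because every price in the tiny sample happens to be distinct. Discovery over small or skewed data has to contend with such accidental dependencies.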