Discovering XSD keys from XML data

Authors:
Marcelo Arenas;Jonny Daenen;Frank Neven;Martin Ugarte;Jan Van den Bussche;Stijn Vansummeren
Affiliations:
PUC Chile & University of Oxford, Santiago, Chile;Hasselt University & Transnational University of Limburg, Hasselt, Belgium;Hasselt University & Transnational University of Limburg, Hasselt, Belgium;PUC Chile, Santiago, Chile;Hasselt University & Transnational University of Limburg, Hasselt, Belgium;Université Libre de Bruxelles (ULB), Brussels, Belgium
Venue:
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Year:
2013

Abstract

A great deal of research into the learning of schemas from XML data has been conducted in recent years to enable the automatic discovery of XML Schemas from XML documents when no schema, or only a low-quality one, is available. Unfortunately, and in strong contrast to, for instance, the relational model, the automatic discovery of even the simplest of XML constraints, namely XML keys, has been left largely unexplored in this context. A major obstacle here is the unavailability of a theory on reasoning about XML keys in the presence of XML schemas, which is needed to validate the quality of candidate keys. The present paper embarks on a fundamental study of such a theory and classifies the complexity of several crucial properties concerning XML keys in the presence of an XSD, such as testing for consistency, boundedness, satisfiability, universality, and equivalence. Of independent interest, novel results are obtained related to cardinality estimation of XPath result sets. A mining algorithm is then developed within the framework of levelwise search. The algorithm leverages known discovery algorithms for functional dependencies in the relational model, but incorporates the above-mentioned properties to assess and refine the quality of derived keys. An experimental study on an extensive body of real-world XML data evaluating the effectiveness of the proposed algorithm is provided.
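
To make the idea of levelwise search concrete, the following is a minimal sketch of key discovery over a flat relation, the relational setting that the abstract says the algorithm builds on. It is not the paper's algorithm: it ignores XSDs, XPath selectors, and the consistency and boundedness checks, and the function names (is_key, minimal_keys) and the sample rows are assumptions made purely for illustration. Candidates are tested in order of increasing size, and any candidate that contains an already discovered key is pruned, so only minimal keys are reported.

```python
from itertools import combinations


def is_key(rows, attrs):
    """Return True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def minimal_keys(rows, attributes):
    """Levelwise search: test attribute sets of size 1, 2, ... in turn.

    A candidate that contains a key found at a lower level is skipped,
    so every reported key is minimal.
    """
    found = []
    for size in range(1, len(attributes) + 1):
        for candidate in combinations(sorted(attributes), size):
            if any(set(key) <= set(candidate) for key in found):
                continue  # superset of a known key, hence not minimal
            if is_key(rows, candidate):
                found.append(candidate)
    return found


# Hypothetical sample data, e.g. fields extracted from <book> elements.
rows = [
    {"isbn": "1", "title": "A", "year": 2001},
    {"isbn": "2", "title": "A", "year": 2002},
    {"isbn": "3", "title": "B", "year": 2002},
]
keys = minimal_keys(rows, ["isbn", "title", "year"])
print(keys)  # [('isbn',), ('title', 'year')]
```

On this sample, isbn alone is a key, and title together with year is a second minimal key; every other candidate either fails uniqueness or contains isbn and is pruned.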