A great deal of research into learning schemas from XML data has been conducted in recent years, with the aim of automatically discovering XML Schemas from XML documents when no schema, or only a low-quality one, is available. Unfortunately, and in strong contrast to, for instance, the relational model, the automatic discovery of even the simplest of XML constraints, namely XML keys, has been left largely unexplored in this context. A major obstacle is the lack of a theory for reasoning about XML keys in the presence of XML schemas, which is needed to validate the quality of candidate keys. The present paper embarks on a fundamental study of such a theory and classifies the complexity of several crucial properties of XML keys in the presence of an XSD, such as consistency, boundedness, satisfiability, universality, and equivalence. Of independent interest, novel results are obtained on cardinality estimation of XPath result sets. A mining algorithm is then developed within the framework of levelwise search. The algorithm leverages known discovery algorithms for functional dependencies in the relational model, but incorporates the above-mentioned properties to assess and refine the quality of derived keys. An experimental study evaluating the effectiveness of the proposed algorithm on an extensive body of real-world XML data is provided.
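To give a feel for the levelwise search framework mentioned above, the following is a minimal sketch in Python that finds minimal keys in a small relational table. It is an illustration under simplifying assumptions, not the paper's algorithm: it works on flat rows rather than XML, it ignores the schema-based quality checks, and it enumerates all candidates of each size instead of generating them from the previous level. The function names `is_key` and `minimal_keys` and the sample data are hypothetical.

```python
from itertools import combinations


def is_key(rows, attrs):
    """Return True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def minimal_keys(rows, attributes):
    """Levelwise search: test candidates of size 1, 2, ... in turn.

    A candidate that contains an already discovered key is skipped,
    since it cannot be minimal. This is the pruning step that makes
    the search levelwise.
    """
    found = []
    for size in range(1, len(attributes) + 1):
        for candidate in combinations(attributes, size):
            if any(set(key) <= set(candidate) for key in found):
                continue  # superset of a known key, so not minimal
            if is_key(rows, candidate):
                found.append(candidate)
    return found


# Hypothetical sample data, used only for illustration.
rows = [
    {"isbn": "1", "title": "A", "year": 2001},
    {"isbn": "2", "title": "A", "year": 2002},
    {"isbn": "3", "title": "B", "year": 2002},
]
keys = minimal_keys(rows, ["isbn", "title", "year"])
print(keys)
```

On this sample, `isbn` alone is a key, and so is the pair `title`, `year`; every other candidate either fails the uniqueness test or contains one of these two.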