Although a schema enables many optimizations for operations on XML documents, recent studies have shown that many XML documents in practice either do not refer to a schema or refer to a syntactically incorrect one. It is therefore important to provide tools and techniques that can automatically generate schemas from sets of sample documents. While previous work in this area has mostly focused on the inference of Document Type Definitions (DTDs), we consider the inference of XML Schema Definitions (XSDs), the increasingly popular schema formalism that is making DTDs obsolete. In a DTD, the content model of an element depends only on the element's name; in an XSD, it can also depend on the context in which the element is used. Hence, while the inference of DTDs essentially reduces to the inference of regular expressions from sets of sample strings, the inference of XSDs also entails identifying, from a corpus of sample documents, the contexts in which elements have different content models. Since a seminal result by Gold implies that no inference algorithm can learn the complete class of XSDs from positive examples only, we focus on a class of XSDs that captures most XSDs occurring in practice. For this class, we provide a theoretically complete algorithm that always infers the correct XSD when a sufficiently large corpus of XML documents is available. In addition, we present a variant of this algorithm that works well on real-world (and therefore incomplete) data sets.
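The difference between name-based and context-based content models can be made concrete with a small Python sketch. It is only an illustration, not the inference algorithm described above: it groups the observed child-name sequences of each element either by the element's name alone (the DTD view) or by the name together with its parent's name (one simple notion of context). The function name `child_sequences`, the parameter `k`, and the sample document are assumptions introduced here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def child_sequences(docs, k):
    """Group observed child-name sequences by the last k names on the
    root-to-element path (hypothetical helper, for illustration only).

    k=1 keys on the element name alone (DTD-like view);
    k=2 also includes the parent's name (a simple ancestor-based context).
    """
    samples = defaultdict(set)

    def visit(elem, path):
        path = path + (elem.tag,)
        key = path[-k:]
        # Record the sequence of child element names seen under this key.
        samples[key].add(tuple(child.tag for child in elem))
        for child in elem:
            visit(child, path)

    for doc in docs:
        visit(ET.fromstring(doc), ())
    return dict(samples)


# Made-up sample document: <title> is empty under <book> but has
# children under <dvd>.
DOC = (
    "<store>"
    "<book><title/><author/></book>"
    "<dvd><title><main/><sub/></title></dvd>"
    "</store>"
)

by_name = child_sequences([DOC], k=1)
by_context = child_sequences([DOC], k=2)

print(by_name[("title",)])
print(by_context[("book", "title")])
print(by_context[("dvd", "title")])
```

Keyed by name alone, `title` collects two unrelated child sequences that a single regular expression would have to cover. Keyed by parent and name, each context has its own sample set, from which a separate content model could be inferred. How much context to use, and how to turn each sample set into a regular expression, are exactly the questions this sketch leaves open.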