Inference of concise DTDs from XML data
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
We consider the problem of inferring a concise Document Type Definition (DTD) for a given set of XML documents, a problem that basically reduces to learning concise regular expressions from positive example strings. We identify two classes of concise regular expressions, the single occurrence regular expressions (SOREs) and the chain regular expressions (CHAREs), that capture the vast majority of expressions used in practical DTDs. For the inference of SOREs we present several algorithms that first infer an automaton for a given set of example strings and then translate that automaton to a corresponding SORE, possibly repairing the automaton when no equivalent SORE can be found. In the process, we introduce a novel automaton-to-regular-expression rewrite technique which is of independent interest. However, when only a very small amount of XML data is available (for instance, when the data is generated by Web service requests or by answers to queries), these algorithms produce regular expressions that are too specific. Therefore, we introduce a novel learning algorithm crx that directly infers CHAREs (which form a subclass of SOREs) without going through an automaton representation. We show that crx performs very well within its target class on very small datasets.
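To make the first step (inferring an automaton from example strings) concrete, here is a minimal sketch of one simple way to do it, under the assumption that each element name labels exactly one state and an edge is added for every pair of adjacent symbols seen in the examples. The names `infer_automaton`, `accepts`, `START`, and `END` are hypothetical and chosen for illustration only; this is not claimed to be the exact construction used in the paper, and the translation to a SORE and the repair step are not shown.

```python
from typing import Iterable, Sequence, Set, Tuple

# Hypothetical sentinel states marking the beginning and end of a word.
START, END = "<start>", "<end>"

Edge = Tuple[str, str]


def infer_automaton(samples: Iterable[Sequence[str]]) -> Set[Edge]:
    """Build an automaton with one state per symbol.

    An edge (x, y) is recorded whenever y directly follows x in some
    example; START and END bracket every example word.
    """
    edges: Set[Edge] = set()
    for word in samples:
        prev = START
        for sym in word:
            edges.add((prev, sym))
            prev = sym
        edges.add((prev, END))
    return edges


def accepts(edges: Set[Edge], word: Sequence[str]) -> bool:
    """Check whether the word follows a START-to-END path in the automaton."""
    prev = START
    for sym in word:
        if (prev, sym) not in edges:
            return False
        prev = sym
    return (prev, END) in edges


# Child sequences of some element, as they might appear in sample XML documents.
samples = [["a", "b", "c"], ["a", "c"], ["a", "b", "b", "c"]]
automaton = infer_automaton(samples)
```

For these three examples the inferred automaton also accepts unseen words such as `a b b b c`, and its language can be written as the expression `a b* c`, in which every symbol occurs only once.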