Semistructured data presents many challenges, mainly due to its lack of a strict schema. These challenges are magnified when large amounts of data are gathered from heterogeneous sources. We address this by investigating and developing methods to automatically infer structural information from example data. Using XML as a reference format, we approach the schema generation problem by applying inductive inference theory. In doing so, we review and extend results on the search spaces of grammatical inference. We then adapt a method from computational linguistics for evaluating the result of an inference process. Further, we combine several inference algorithms, including both new techniques that we introduce and techniques from previous work. Comprehensive experimentation shows our new hybrid method, based on recently developed optimisation techniques, to be the most effective.