Readings in object-oriented database systems
Shortening the OED: experience with a grammar-defined database
ACM Transactions on Information Systems (TOIS)
Concepts for modeling and querying list-structured data
Information Processing and Management: an International Journal
From structured documents to novel query facilities
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
A Grammar-Based Approach Towards Unifying Hierarchical Data Models
SIAM Journal on Computing
Integrating contents and structure in text retrieval
ACM SIGMOD Record
Foundations of Databases: The Logical Level
Transformation of Documents and Schemas by Patterns and Contextual Conditions
PODP '96 Proceedings of the Third International Workshop on Principles of Document Processing
Mind Your Grammar: a New Approach to Modelling Text
VLDB '87 Proceedings of the 13th International Conference on Very Large Data Bases
Querying and Updating the File
VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases
A Grammar Based Model for XML Schema Integration
BNCOD 17 Proceedings of the 17th British National Conference on Databases: Advances in Databases
CSL '02 Proceedings of the 16th International Workshop and 11th Annual Conference of the EACSL on Computer Science Logic
Modelling Semi-structured Documents with Hedges for Deduction and Induction
ILP '01 Proceedings of the 11th International Conference on Inductive Logic Programming
A bit-parallel tree matching algorithm for patterns with horizontal VLDC's
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
This paper presents a data model for transforming and assembling document information such as SGML or XML documents. Its biggest advantage over other data models is that it simultaneously provides (1) powerful patterns and contextual conditions, and (2) schema transformation. Patterns capture conditions on a component's subordinates, while contextual conditions capture conditions on its superiors, siblings, subordinates of siblings, etc.; both have been recognized in the document processing community as highly important mechanisms for identifying document components. Meanwhile, schema transformation has been recognized as crucial in the database community since the advent of relational databases. However, no data model has provided all three of patterns, contextual conditions, and schema transformation. This data model is based on the theory of forest-regular languages. A schema is a forest automaton, and an instance is a finite set of forests (sequences of trees). Since the set of parse trees of an extended context-free grammar is accepted by a forest automaton, this model is a generalization of Gonnet and Tompa's grammatical model. Patterns are captured as forest automata; contextual conditions are captured as pointed forest representations (a variation of Podelski's pointed tree representations). Controlled by patterns and contextual conditions, an operator creates an instance from an input instance and also creates a reasonably small schema from an input schema. Furthermore, the created schema is often minimally sufficient: any forest it permits can be generated from some input instance.
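To make the idea of a schema as a forest automaton concrete, the following is a minimal sketch, not the paper's formalization. It assumes a simplified deterministic bottom-up automaton in which each rule assigns a state to a node from its label and a regular expression over the states of its children, and a forest is accepted when the state sequence of its top-level trees matches a final regular expression. The names `Tree`, `RULES`, `FINAL`, `run`, and `accepts`, and the toy `doc`/`title`/`para` schema, are all hypothetical.

```python
import re
from typing import NamedTuple


class Tree(NamedTuple):
    """A node with a label and an ordered forest of children."""
    label: str
    children: tuple = ()


# Hypothetical schema. Each rule is (label, regex over child states, state).
# States are single characters so a sequence of child states is a string
# that can be checked with Python's `re` module.
RULES = [
    ("text",  r"",    "x"),   # a text leaf has no children
    ("title", r"x",   "t"),   # a title holds exactly one text
    ("para",  r"x*",  "p"),   # a paragraph holds any number of texts
    ("doc",   r"tp+", "d"),   # a doc is a title followed by paragraphs
]
FINAL = r"d+"  # an accepted forest is a non-empty sequence of docs


def state_of(tree):
    """Return the state assigned to `tree`, or None if no rule applies."""
    child_states = run(tree.children)
    if child_states is None:
        return None
    for label, pattern, state in RULES:
        if label == tree.label and re.fullmatch(pattern, child_states):
            return state
    return None


def run(forest):
    """Return the state sequence of `forest` as a string, or None on failure."""
    states = []
    for tree in forest:
        state = state_of(tree)
        if state is None:
            return None
        states.append(state)
    return "".join(states)


def accepts(forest):
    """Check whether the forest conforms to the schema."""
    sequence = run(forest)
    return sequence is not None and re.fullmatch(FINAL, sequence) is not None


good = (Tree("doc", (Tree("title", (Tree("text"),)),
                     Tree("para", (Tree("text"),)))),)
bad = (Tree("doc", (Tree("para", (Tree("text"),)),)),)  # missing title
```

Under these assumptions, `accepts(good)` holds and `accepts(bad)` does not. The same machinery could in principle serve as a pattern, since the abstract states that patterns are also captured as forest automata; contextual conditions and schema transformation are not modeled in this sketch.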