Data Model for Document Transformation and Assembly

Authors:
Makoto Murata
Venue:
PODDP '98 Proceedings of the 4th International Workshop on Principles of Digital Document Processing
Year:
1998

References:
Readings in object-oriented database systems.
Shortening the OED: experience with a grammar-defined database. ACM Transactions on Information Systems (TOIS).
Concepts for modeling and querying list-structured data. Information Processing and Management: an International Journal.
From structured documents to novel query facilities. SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data.
A Grammar-Based Approach Towards Unifying Hierarchical Data Models. SIAM Journal on Computing.
Integrating contents and structure in text retrieval. ACM SIGMOD Record.
Foundations of Databases: The Logical Level.
Transformation of Documents and Schemas by Patterns and Contextual Conditions. PODP '96 Proceedings of the Third International Workshop on Principles of Document Processing.
Mind Your Grammar: a New Approach to Modelling Text. VLDB '87 Proceedings of the 13th International Conference on Very Large Data Bases.
Querying and Updating the File. VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases.

Cited by:
A Grammar Based Model for XML Schema Integration. BNCOD 17 Proceedings of the 17th British National Conference on Databases: Advances in Databases.
Automata, Logic, and XML. CSL '02 Proceedings of the 16th International Workshop and 11th Annual Conference of the EACSL on Computer Science Logic.
Modelling Semi-structured Documents with Hedges for Deduction and Induction. ILP '01 Proceedings of the 11th International Conference on Inductive Logic Programming.
A bit-parallel tree matching algorithm for patterns with horizontal VLDC's. SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval.

Abstract

This paper presents a data model for transforming and assembling document information such as SGML or XML documents. Its main advantage over other data models is that it simultaneously provides (1) powerful patterns and contextual conditions, and (2) schema transformation. Patterns capture conditions on a component's subordinates, while contextual conditions capture conditions on its superiors, siblings, subordinates of siblings, and so on; both have been recognized in the document processing community as highly important mechanisms for identifying document components. Meanwhile, schema transformation has been recognized as crucial in the database community since the relational database (RDB). However, no previous data model has provided all three of patterns, contextual conditions, and schema transformation. This data model is based on forest-regular language theory. A schema is a forest automaton, and an instance is a finite set of forests (sequences of trees). Since the set of parse trees of an extended context-free grammar is accepted by a forest automaton, this model generalizes Gonnet and Tompa's grammatical model. Patterns are captured as forest automata; contextual conditions are captured as pointed forest representations (a variation of Podelski's pointed tree representations). Controlled by patterns and contextual conditions, an operator creates an instance from an input instance and also creates a reasonably small schema from an input schema. Furthermore, the created schema is often minimally sufficient: any forest permitted by it may be generated from some input instance.
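
To make the idea of "a schema is a forest automaton" more concrete, here is a minimal sketch of a bottom-up automaton that checks whether a forest (a sequence of trees) conforms to a schema. It is not the paper's construction. The names `Node`, `ForestAutomaton`, `state_of`, and `accepts` are hypothetical, and the sketch assumes that states are single characters, that horizontal constraints are ordinary regular expressions over those characters, and that the first matching rule wins (a deterministic simplification).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node: a label plus an ordered sequence of child trees."""
    label: str
    children: list = field(default_factory=list)


@dataclass
class ForestAutomaton:
    """Hypothetical, simplified forest automaton.

    rules: list of (label, horizontal_regex, state) triples. A node with the
           given label receives `state` when the word formed by its
           children's states fully matches `horizontal_regex`.
    final: regex that the word of top-level states must fully match.
    Assumption: every state is a single character.
    """
    rules: list
    final: str

    def state_of(self, node):
        """Return the state assigned to `node`, or None if no rule applies."""
        child_states = [self.state_of(child) for child in node.children]
        if None in child_states:
            return None
        word = "".join(child_states)
        for label, horizontal, state in self.rules:
            if label == node.label and re.fullmatch(horizontal, word):
                return state  # first matching rule wins (simplification)
        return None

    def accepts(self, forest):
        """Return True if the sequence of trees `forest` conforms to the schema."""
        states = [self.state_of(tree) for tree in forest]
        if None in states:
            return False
        return re.fullmatch(self.final, "".join(states)) is not None


# Illustrative schema: a forest is one or more `doc` trees; each `doc`
# holds one `title` followed by one or more `para`; both are leaves here.
schema = ForestAutomaton(
    rules=[("title", "", "t"), ("para", "", "p"), ("doc", "tp+", "d")],
    final="d+",
)

good = [Node("doc", [Node("title"), Node("para"), Node("para")])]
bad = [Node("doc", [Node("para"), Node("title")])]
```

With this illustrative schema, `schema.accepts(good)` holds and `schema.accepts(bad)` does not, because the children of the second `doc` appear in the wrong order. A faithful treatment would allow nondeterminism by tracking sets of states per node rather than a single state.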