We propose a systematic approach to reverse engineer arbitrary XML documents into their conceptual schema, expressed as DTD Graphs. The need arises because XML documents are frequently used to store structured data, yet their schemas, for example in Document Type Definition (DTD) format, are often missing, especially for historical XML documents. This makes the documents difficult for software developers and end users to use. Even when schemas exist, they are hard to read and do not make explicit the underlying relationships among the elements in the documents. It is therefore necessary to determine the data semantics from the XML documents themselves. If the DTDs of the XML documents exist and identify the ID/IDREF(S) type attributes, more data semantics can be derived. The determined data semantics can also be used to verify the linkages implemented by ID/IDREF(S): if an element refers to an incorrect XML element type, an extra data semantic is determined as a result, and this finding can be used for verification. Furthermore, the approaches proposed in this paper use the Simple API for XML (SAX), so the algorithms are applicable to XML documents ranging from small to very large.
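To illustrate why a SAX-based pass scales to large documents, the following is a minimal sketch in Python, not the algorithm of the paper. It streams through a document once and records which element types appear as children of which other element types, together with the attribute names seen on each type. The class name `EdgeCollector`, the function `collect_edges`, and the sample document are hypothetical and chosen only for illustration.

```python
import xml.sax
from collections import defaultdict


class EdgeCollector(xml.sax.ContentHandler):
    """Hypothetical handler: gathers parent/child element edges in one pass."""

    def __init__(self):
        super().__init__()
        self.stack = []                      # currently open elements
        self.children = defaultdict(set)     # parent type -> child types
        self.attributes = defaultdict(set)   # element type -> attribute names

    def startElement(self, name, attrs):
        # The element on top of the stack is the parent of the new element.
        if self.stack:
            self.children[self.stack[-1]].add(name)
        self.attributes[name].update(attrs.getNames())
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()


def collect_edges(xml_text):
    """Parse XML text with SAX and return the populated handler."""
    handler = EdgeCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler


if __name__ == "__main__":
    sample = "<order><item sku='a'/><item sku='b'/><note/></order>"
    result = collect_edges(sample)
    for parent in sorted(result.children):
        print(parent, "->", sorted(result.children[parent]))
```

Because the handler keeps only the stack of open elements and the sets of observed element types, its memory use grows with the depth and vocabulary of the document rather than with its total size. Deriving cardinalities or checking ID/IDREF(S) linkages would need additional bookkeeping that this sketch does not attempt.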