Recovering data semantics from XML documents into DTD graph with SAX

Authors:
Herbert Shiu;Joseph Fong;Robert P. Biuk-Aghai
Affiliations:
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong;Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong;Department of Computer and Information Science, University of Macau, Macau
Venue:
ACOS'06 Proceedings of the 5th WSEAS international conference on Applied computer science
Year:
2006

Abstract

We propose a systematic approach to reverse engineer arbitrary XML documents into their conceptual schema, represented as DTD Graphs. This is necessary because XML documents are frequently used to store structured data, yet their schemas, for example in Document Type Definition (DTD) format, are often missing, especially for existing historical XML documents. As a result, it is difficult for software developers or end users to make use of them. Even when the schemas exist, they are difficult to read and do not make explicit the underlying relationships among the elements in the documents. In view of this, it is necessary to determine the data semantics from the XML documents themselves. If the DTDs of the XML documents exist and identify the ID/IDREF(S) type attributes, then more data semantics can be derived. Another application of the determined data semantics is to verify the linkages implemented by ID/IDREF(S). If an element refers to an incorrect XML element type, an extra data semantic will be determined as a result, and such findings can be used for verification purposes. Furthermore, the approaches proposed in this paper use the Simple API for XML (SAX), which processes a document as a stream of events, so the algorithms are applicable to XML documents ranging from small to very large.
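
To illustrate why a SAX-based approach scales to large documents, the following is a minimal sketch, not the authors' algorithm. It makes a single streaming pass with Python's standard xml.sax module and records, for each element type, its child element types and the largest number of times each child occurs under one parent, plus the attribute names seen. The class name DTDGraphHandler, the function extract_graph, and the sample document are hypothetical names introduced only for this illustration; detecting ID/IDREF(S) linkages is not attempted here.

```python
import xml.sax
from collections import defaultdict


class DTDGraphHandler(xml.sax.ContentHandler):
    """Hypothetical handler: gathers parent -> child edges in one streaming pass."""

    def __init__(self):
        super().__init__()
        self.stack = []          # names of the currently open elements
        self.child_counts = []   # per open element: child name -> count so far
        # parent -> {child: max occurrences observed under a single parent}
        self.edges = defaultdict(dict)
        # element name -> set of attribute names observed on it
        self.attributes = defaultdict(set)

    def startElement(self, name, attrs):
        if self.stack:
            # Count this element as a child of the innermost open element.
            counts = self.child_counts[-1]
            counts[name] = counts.get(name, 0) + 1
        self.attributes[name].update(attrs.getNames())
        self.stack.append(name)
        self.child_counts.append({})

    def endElement(self, name):
        self.stack.pop()
        counts = self.child_counts.pop()
        # Keep the largest per-parent count seen for each child type.
        for child, n in counts.items():
            previous = self.edges[name].get(child, 0)
            self.edges[name][child] = max(previous, n)


def extract_graph(xml_text):
    """Parse an XML string and return the populated handler."""
    handler = DTDGraphHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler


# Hypothetical sample document used only for this sketch.
SAMPLE = (
    "<library>"
    "<book id='b1'><title>A</title><author>X</author><author>Y</author></book>"
    "<book id='b2'><title>B</title></book>"
    "</library>"
)

graph = extract_graph(SAMPLE)
for parent, children in graph.edges.items():
    print(parent, "->", children)
```

Only the stack of open elements and the accumulated edge table are kept in memory, so the working set grows with document depth and the number of distinct element types rather than with document size. A maximum count greater than one suggests a repeatable child, and a child that is absent under some parent instances would suggest optionality, although this sketch does not track the latter.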