The eXtensible Markup Language (XML) is becoming the standard format for data exchange on the Internet, providing interoperability among Web applications. Because XML documents are ubiquitous on the Web, efficient algorithms and tools for manipulating them are important. In this paper, we present a novel system for automating the transformation of XML documents based on structural mapping, under the restriction that the leaf text is exactly the same in the source and target documents. First, a tree edit distance algorithm is used to find the mapping between a pair of source and target documents; the introduction of tree partition significantly improves the efficiency of this tree matching algorithm. Second, template rules for transformation are inferred from the mapping using generalization. Third, a template matching component is used to process new documents. Experimental studies show that our methods are promising and can be applied to Web document cleaning, information filtering, and other applications.
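To make the first step more concrete, the following is a minimal sketch of an ordered tree edit distance between two small XML documents. It is not the system described above: it uses the textbook forest recursion with unit costs and memoization, it omits the tree partition optimization, and it returns only the distance rather than the node mapping. The names `to_tree`, `forest_distance`, and `tree_edit_distance` are hypothetical, as is the choice to model leaf text as a child node labeled `#text:...`.

```python
from functools import lru_cache
import xml.etree.ElementTree as ET


def to_tree(elem):
    """Convert an Element into a hashable (label, children) tuple.

    Assumption: leaf text becomes a child node labeled '#text:<text>'.
    Text in mixed content (text next to child elements) is ignored.
    """
    children = tuple(to_tree(child) for child in elem)
    text = (elem.text or "").strip()
    if text and not children:
        children = (("#text:" + text, ()),)
    return (elem.tag, children)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests (tuples of trees).

    Unit cost for insert, delete, and rename. The recursion always
    works on the rightmost root of each forest.
    """
    if not f and not g:
        return 0
    if not f:
        w = g[-1]
        # Insert the rightmost root of g; its children take its place.
        return forest_distance(f, g[:-1] + w[1]) + 1
    if not g:
        v = f[-1]
        # Delete the rightmost root of f; its children take its place.
        return forest_distance(f[:-1] + v[1], g) + 1
    v, w = f[-1], g[-1]
    delete = forest_distance(f[:-1] + v[1], g) + 1
    insert = forest_distance(f, g[:-1] + w[1]) + 1
    # Map v to w: match their children, and the remaining forests.
    rename = (
        forest_distance(v[1], w[1])
        + forest_distance(f[:-1], g[:-1])
        + (0 if v[0] == w[0] else 1)
    )
    return min(delete, insert, rename)


def tree_edit_distance(src_xml, tgt_xml):
    """Parse two XML strings and return their tree edit distance."""
    src = to_tree(ET.fromstring(src_xml))
    tgt = to_tree(ET.fromstring(tgt_xml))
    return forest_distance((src,), (tgt,))


if __name__ == "__main__":
    source = "<book><title>XML</title><author>Ann</author></book>"
    target = "<item><name>XML</name><by>Ann</by></item>"
    print(tree_edit_distance(source, target))
```

In this example the leaf text is identical in both documents, matching the restriction stated above, so only the three element labels differ and the edits are all renames. A practical implementation would also record which operation was chosen at each step, since it is the resulting node mapping, not the distance itself, that the rule inference step would consume.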