XML has become the standard format for web publishing and data exchange on the Internet, and much research has gone into providing efficient access to the relevant information that is ubiquitous on the Web. In this paper, we present an algorithm that finds a minimum-cost sequence of top-down edit operations transforming an XML document so that it conforms to a schema. The minimum cost is based on the tree edit distance with top-down edit operations. We show that the algorithm runs in O(p × log p × n) time, where p is the size of the schema (grammar) and n is the size of the XML document (tree). Experimental studies also show that the running time of our algorithm is linear in the size of the XML document when a normalized regular hedge grammar is used to specify the schema.
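To make the notion of top-down edit operations concrete, the following is a minimal sketch of a unit-cost top-down edit distance between two ordered trees, in the style of the classic tree-to-tree recurrence. It is not the algorithm of this paper, which compares a document against a grammar rather than against a single tree. The tuple-based tree representation, the function names `size` and `top_down_distance`, and the unit costs are all assumptions made for illustration.

```python
from functools import lru_cache

# Hypothetical representation: a tree is (label, children), where children
# is a tuple of trees. Nested tuples are hashable, so results can be cached.


@lru_cache(maxsize=None)
def size(tree):
    """Number of nodes in the tree."""
    _, children = tree
    return 1 + sum(size(child) for child in children)


@lru_cache(maxsize=None)
def top_down_distance(a, b):
    """Unit-cost top-down edit distance between ordered trees a and b.

    Assumed operations: relabel a node (cost 1), and insert or delete a
    whole subtree (cost = number of nodes in it). Roots are always mapped
    to each other, which is what makes the edits top-down.
    """
    (label_a, kids_a), (label_b, kids_b) = a, b
    m, n = len(kids_a), len(kids_b)

    # d[i][j]: cost of turning the root of a plus its first i subtrees
    # into the root of b plus its first j subtrees.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0 if label_a == label_b else 1
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(kids_a[i - 1])      # delete subtree
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(kids_b[j - 1])      # insert subtree
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(kids_a[i - 1]),       # delete subtree
                d[i][j - 1] + size(kids_b[j - 1]),       # insert subtree
                d[i - 1][j - 1]
                + top_down_distance(kids_a[i - 1], kids_b[j - 1]),
            )
    return d[m][n]


# Illustrative document tree and a target tree with one extra element.
doc = ("book", (("title", ()), ("author", ())))
target = ("book", (("title", ()), ("author", ()), ("year", ())))
print(top_down_distance(doc, target))
```

In this sketch the target is a single fixed tree. Conforming to a schema instead means minimizing this kind of cost over all trees that the grammar generates, which is the problem the algorithm in the paper addresses.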