Fast approximate matching between XML documents and schemata

Authors:
Guangming Xing
Affiliations:
Department of Computer Science, Western Kentucky University, Bowling Green, KY
Venue:
APWeb'06: Proceedings of the 8th Asia-Pacific Web Conference on Frontiers of WWW Research and Development
Year:
2006

Abstract

XML has become the standard format for web publishing and data exchange on the Internet, and much research has been devoted to providing efficient access to the relevant information that is ubiquitous on the Web. In this paper, we present an algorithm that finds a minimum-cost sequence of top-down edit operations that transforms an XML document so that it conforms to a schema. The cost is measured by the tree edit distance restricted to top-down edit operations. We show that the algorithm runs in O(p × log p × n) time, where p is the size of the schema (grammar) and n is the size of the XML document (tree). Experimental studies also show that the running time of our algorithm is linear in the size of the XML document when a normalized regular hedge grammar is used to specify the schema.
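To make the problem statement concrete, here is a minimal sketch of approximate tree-to-schema matching. It is not the paper's algorithm and does not attempt its O(p × log p × n) bound. Several assumptions are made for illustration only. The schema is DTD-like: each element label maps to a hypothetical `(start, accepting_states, transitions)` DFA over child labels, which is a restricted form of regular hedge grammar. Relabeling a node costs 1. Deleting or inserting a whole subtree costs its node count, in the style of Selkow's top-down tree edit distance. The names `Node`, `min_tree_sizes`, `close_under_inserts`, and `distance_to_schema` are hypothetical. Each element's child sequence is aligned against its content-model DFA with a per-position dynamic program, and Dijkstra's algorithm propagates the cost of inserted subtrees between DFA states.

```python
import heapq
import math
from dataclasses import dataclass, field

INF = math.inf


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can key the memo
class Node:
    """An XML element reduced to its tag and ordered children."""
    label: str
    children: list = field(default_factory=list)


def size(t):
    """Node count of the subtree at t (its top-down deletion cost here)."""
    return 1 + sum(size(c) for c in t.children)


def close_under_inserts(dist, delta, weight):
    """Dijkstra over DFA states: inserting a fresh child of type b
    moves q -b-> r at cost weight[b] (assumed minimum valid subtree size)."""
    best = dict(dist)
    heap = [(d, q) for q, d in best.items()]
    heapq.heapify(heap)
    while heap:
        d, q = heapq.heappop(heap)
        if d > best[q]:
            continue  # stale heap entry
        for (p, b), r in delta.items():
            if p == q:
                nd = d + weight.get(b, INF)
                if nd < best.get(r, INF):
                    best[r] = nd
                    heapq.heappush(heap, (nd, r))
    return best


def min_tree_sizes(schema):
    """Fixed point: fewest nodes in any schema-valid tree rooted at each label."""
    best = {a: INF for a in schema}
    changed = True
    while changed:
        changed = False
        for a, (start, accept, delta) in schema.items():
            reach = close_under_inserts({start: 0}, delta, best)
            cost = 1 + min((reach.get(q, INF) for q in accept), default=INF)
            if cost < best[a]:
                best[a] = cost
                changed = True
    return best


def distance_to_schema(tree, schema, root):
    """Minimum cost to turn `tree` into a valid tree rooted at `root`.
    Assumes every label used on a DFA transition is also a schema key."""
    ins = min_tree_sizes(schema)
    memo = {}

    def cost(t, a):
        if (t, a) in memo:
            return memo[(t, a)]
        start, accept, delta = schema[a]
        dist = {start: 0}  # DFA state -> cheapest cost after the children so far
        for child in t.children:
            dist = close_under_inserts(dist, delta, ins)
            nxt = {}
            for q, d in dist.items():
                # Option 1: delete the child's whole subtree.
                nxt[q] = min(nxt.get(q, INF), d + size(child))
                # Option 2: keep the child as type b along a DFA edge q -b-> r.
                for (p, b), r in delta.items():
                    if p == q:
                        nxt[r] = min(nxt.get(r, INF), d + cost(child, b))
            dist = nxt
        dist = close_under_inserts(dist, delta, ins)
        relabel = 0 if t.label == a else 1
        memo[(t, a)] = relabel + min((dist.get(q, INF) for q in accept), default=INF)
        return memo[(t, a)]

    return cost(tree, root)
```

For example, with a schema in which `book` must contain one `title` followed by one or more `author` elements, `book(title)` is one edit away (insert an `author`). `book(author, title)` is two edits away. The DFA view keeps each element's content model separate. This is why the work per document node stays bounded by the schema size, although this sketch makes no attempt to reproduce the paper's complexity.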