Computing edit distances between an XML document and a schema and its application in document classification

  • Authors:
  • Guangming Xing; Chaitanya R. Malla; Zhonghang Xia; Snigdha Dantala Venkata

  • Affiliations:
  • Western Kentucky University, Bowling Green, KY (all four authors)

  • Venue:
  • Proceedings of the 2006 ACM Symposium on Applied Computing
  • Year:
  • 2006

Abstract

In this paper, we present an algorithm that finds a minimum-cost sequence of top-down edit operations transforming an XML document so that it conforms to a schema. We show that the algorithm runs in O(p × log p × n) time, where p is the size of the schema (grammar) and n is the size of the XML document (tree). We also show that the edit distance under restricted top-down edit operations can be computed in the same way, and how these edit distances can be used in document classification. Experimental studies show that our methods are effective for structure-oriented classification on both real and synthesized data sets.
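
To make the idea of classifying by edit distance concrete, the following is a minimal Python sketch and not the algorithm from the paper. It makes several assumptions that do not come from the abstract: each class is represented by a single example tree rather than a schema (grammar), the distance is a Selkow-style top-down tree-to-tree edit distance with unit costs, and a document is assigned to the class with the smallest distance. The names Node, size, top_down_distance, and classify are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """An ordered, labeled tree node (a simplified view of an XML element)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def top_down_distance(a: Node, b: Node) -> int:
    """Selkow-style top-down edit distance with unit costs.

    Assumption: this tree-to-tree distance is a stand-in for the paper's
    document-to-schema distance. Allowed operations are relabeling a node
    and inserting or deleting a whole subtree (cost = subtree size).
    """
    relabel = 0 if a.label == b.label else 1
    m, n = len(a.children), len(b.children)
    # d[i][j]: cost of turning the first i child subtrees of a
    # into the first j child subtrees of b.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(a.children[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            ca, cb = a.children[i - 1], b.children[j - 1]
            d[i][j] = min(
                d[i - 1][j] + size(ca),                   # delete subtree ca
                d[i][j - 1] + size(cb),                   # insert subtree cb
                d[i - 1][j - 1] + top_down_distance(ca, cb),  # match ca to cb
            )
    return relabel + d[m][n]


def classify(doc: Node, representatives: Dict[str, Node]) -> str:
    """Return the class whose representative tree is closest to doc."""
    return min(representatives,
               key=lambda name: top_down_distance(doc, representatives[name]))


# Hypothetical example data: one representative tree per class.
representatives = {
    "book": Node("book", [Node("title"), Node("author")]),
    "article": Node("article", [Node("title"), Node("abstract")]),
}
doc = Node("book", [Node("title"), Node("author"), Node("author")])
print(classify(doc, representatives))
```

In this toy setup the document differs from the "book" representative by one deleted author subtree, whereas reaching the "article" representative also requires relabeling the root and one child, so the nearest class is "book". A schema-based distance, as described in the abstract, would instead measure the cost of making the document conform to a grammar, which can accept many trees at once.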